├── .DS_Store
├── .ipynb_checkpoints
│   ├── 06장_1절_데이터셋_설명-checkpoint.ipynb
│   ├── 06장_2절_결측치처리_클래스레이블_원핫-checkpoint.ipynb
│   ├── 07장_3절_파이프라인-checkpoint.ipynb
│   ├── 07장_4절_그리드서치-checkpoint.ipynb
│   ├── 07장_모형평가_6절_분류_회귀_군집-checkpoint.ipynb
│   ├── 08장_지도학습_3절_k최근접이웃-checkpoint.ipynb
│   ├── 08장_지도학습_4절_선형회귀분석-checkpoint.ipynb
│   ├── 08장_지도학습_5절_로지스틱회귀분석-checkpoint.ipynb
│   ├── 08장_지도학습_6절_나이브베이즈-checkpoint.ipynb
│   ├── 08장_지도학습_7절_의사결정나무-checkpoint.ipynb
│   ├── 08장_지도학습_8절_서포트벡터머신-checkpoint.ipynb
│   ├── 08장_지도학습_9절_크로스밸리데이션-checkpoint.ipynb
│   ├── 09장_앙상블_2절_보팅-checkpoint.ipynb
│   ├── 09장_앙상블_3절_1_랜덤포레스트-checkpoint.ipynb
│   ├── 09장_앙상블_3절_2_배깅-checkpoint.ipynb
│   ├── 09장_앙상블_4절_1_adaboost-checkpoint.ipynb
│   ├── 09장_앙상블_4절_2_gradient_boost-checkpoint.ipynb
│   ├── 09장_앙상블_5절_스태킹-checkpoint.ipynb
│   ├── 10장_차원축소_2절_주성분분석(PCA)-checkpoint.ipynb
│   ├── 10장_차원축소_3절_커널_PCA-checkpoint.ipynb
│   ├── 10장_차원축소_4절_LDA-checkpoint.ipynb
│   ├── 10장_차원축소_5절_LLE(locally_linear_embedding)-checkpoint.ipynb
│   ├── 10장_차원축소_6절_비음수행렬분해(NMF)-checkpoint.ipynb
│   ├── 11장_비지도학습_2절_k 평균 클러스터링 -checkpoint.ipynb
│   ├── 11장_비지도학습_3절_계층군집-checkpoint.ipynb
│   ├── 11장_비지도학습_4절_DBSCAN-checkpoint.ipynb
│   ├── 11장_비지도학습_5절_가우시안_혼합_모델-checkpoint.ipynb
│   ├── 12장_딥러닝_2절_1_퍼셉트론-checkpoint.ipynb
│   ├── 12장_딥러닝_3절_7_텐서플로_소개-checkpoint.ipynb
│   ├── 12장_딥러닝_3절_8_신경망_(1)분류문제-checkpoint.ipynb
│   ├── 12장_딥러닝_3절_9_신경망_(2)회귀문제-checkpoint.ipynb
│   ├── 12장_딥러닝_4절_CNN-checkpoint.ipynb
│   ├── 12장_딥러닝_5절_RNN-checkpoint.ipynb
│   ├── 12장_딥러닝_6절_오토인코더1(시퀀스형)-checkpoint.ipynb
│   ├── 12장_딥러닝_6절_오토인코더2(함수형)-checkpoint.ipynb
│   ├── 12장_딥러닝_7절_1_자연어처리-checkpoint.ipynb
│   ├── 12장_딥러닝_7절_2_seq2seq-checkpoint.ipynb
│   └── 12장_딥러닝_8절_GAN-checkpoint.ipynb
├── 06장_1절_데이터셋_설명.ipynb
├── 06장_2절_결측치처리_클래스레이블_원핫.ipynb
├── 07장_3절_파이프라인.ipynb
├── 07장_4절_그리드서치.ipynb
├── 07장_모형평가_6절_분류_회귀_군집.ipynb
├── 08장_지도학습_3절_k최근접이웃.ipynb
├── 08장_지도학습_4절_선형회귀분석.ipynb
├── 08장_지도학습_5절_로지스틱회귀분석.ipynb
├── 08장_지도학습_6절_나이브베이즈.ipynb
├── 08장_지도학습_7절_의사결정나무.ipynb
├── 08장_지도학습_8절_서포트벡터머신.ipynb
├── 08장_지도학습_9절_크로스밸리데이션.ipynb
├── 09장_앙상블_2절_보팅.ipynb
├── 09장_앙상블_3절_1_랜덤포레스트.ipynb
├── 09장_앙상블_3절_2_배깅.ipynb
├── 09장_앙상블_4절_1_adaboost.ipynb
├── 09장_앙상블_4절_2_gradient_boost.ipynb
├── 09장_앙상블_5절_스태킹.ipynb
├── 10장_차원축소_2절_주성분분석(PCA).ipynb
├── 10장_차원축소_3절_커널_PCA.ipynb
├── 10장_차원축소_4절_LDA.ipynb
├── 10장_차원축소_5절_LLE(locally_linear_embedding).ipynb
├── 10장_차원축소_6절_비음수행렬분해(NMF).ipynb
├── 11장_비지도학습_2절_k 평균 클러스터링 .ipynb
├── 11장_비지도학습_3절_계층군집.ipynb
├── 11장_비지도학습_4절_DBSCAN.ipynb
├── 11장_비지도학습_5절_가우시안_혼합_모델.ipynb
├── 12장_딥러닝_2절_1_퍼셉트론.ipynb
├── 12장_딥러닝_3절_7_텐서플로_소개.ipynb
├── 12장_딥러닝_3절_8_신경망_(1)분류문제.ipynb
├── 12장_딥러닝_3절_9_신경망_(2)회귀문제.ipynb
├── 12장_딥러닝_4절_CNN.ipynb
├── 12장_딥러닝_5절_RNN.ipynb
├── 12장_딥러닝_6절_오토인코더1(시퀀스형).ipynb
├── 12장_딥러닝_6절_오토인코더2(함수형).ipynb
├── 12장_딥러닝_7절_1_자연어처리.ipynb
├── 12장_딥러닝_7절_2_seq2seq.ipynb
├── 12장_딥러닝_8절_GAN.ipynb
├── README.md
├── data
│   ├── .DS_Store
│   └── eng-fra
│       ├── _about.txt
│       └── fra.txt
└── 소스코드전체.zip
/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bjpublic/MachineLearning/f6e04995802b49c5841a917a178dbf43f58a9f44/.DS_Store
--------------------------------------------------------------------------------
/.ipynb_checkpoints/07장_3절_파이프라인-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Pipeline"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 8,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "from sklearn import datasets\n",
17 | "from sklearn.pipeline import Pipeline\n",
18 | "from sklearn.preprocessing import StandardScaler\n",
19 | "from sklearn.linear_model import LinearRegression\n",
20 | "from sklearn.model_selection import train_test_split\n",
21 | "from sklearn.metrics import mean_squared_error"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 9,
27 | "metadata": {},
28 | "outputs": [
29 | {
30 | "data": {
31 | "text/plain": [
32 | "29.515137790197567"
33 | ]
34 | },
35 | "execution_count": 9,
36 | "metadata": {},
37 | "output_type": "execute_result"
38 | }
39 | ],
40 | "source": [
41 | "raw_boston = datasets.load_boston()\n",
42 | "\n",
43 | "X = raw_boston.data\n",
44 | "y = raw_boston.target\n",
45 | "\n",
46 | "# Split into training / test data\n",
47 | "X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=7)\n",
48 | "\n",
49 | "# Standardization scaling\n",
50 | "std_scale = StandardScaler()\n",
51 | "X_tn_std = std_scale.fit_transform(X_tn)\n",
52 | "X_te_std = std_scale.transform(X_te)\n",
53 | "\n",
54 | "# Training\n",
55 | "clf_linear = LinearRegression()\n",
56 | "clf_linear.fit(X_tn_std, y_tn)\n",
57 | "\n",
58 | "# Prediction\n",
59 | "pred_linear = clf_linear.predict(X_te_std)\n",
60 | "\n",
61 | "# Evaluation\n",
62 | "mean_squared_error(y_te, pred_linear)"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 10,
68 | "metadata": {},
69 | "outputs": [
70 | {
71 | "data": {
72 | "text/plain": [
73 | "29.515137790197567"
74 | ]
75 | },
76 | "execution_count": 10,
77 | "metadata": {},
78 | "output_type": "execute_result"
79 | }
80 | ],
81 | "source": [
82 | "# Split into training / test data\n",
83 | "X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=7)\n",
84 | "\n",
85 | "# Pipeline\n",
86 | "linear_pipline = Pipeline([\n",
87 | "    ('scaler',StandardScaler()),\n",
88 | "    ('linear_regression', LinearRegression())\n",
89 | "])\n",
90 | "\n",
91 | "# Training\n",
92 | "linear_pipline.fit(X_tn, y_tn)\n",
93 | "\n",
94 | "# Prediction\n",
95 | "pred_linear = linear_pipline.predict(X_te)\n",
96 | "\n",
97 | "# Evaluation\n",
98 | "mean_squared_error(y_te, pred_linear)"
99 | ]
100 | }
101 | ],
102 | "metadata": {
103 | "kernelspec": {
104 | "display_name": "Python 3",
105 | "language": "python",
106 | "name": "python3"
107 | },
108 | "language_info": {
109 | "codemirror_mode": {
110 | "name": "ipython",
111 | "version": 3
112 | },
113 | "file_extension": ".py",
114 | "mimetype": "text/x-python",
115 | "name": "python",
116 | "nbconvert_exporter": "python",
117 | "pygments_lexer": "ipython3",
118 | "version": "3.7.6"
119 | }
120 | },
121 | "nbformat": 4,
122 | "nbformat_minor": 4
123 | }
124 |
--------------------------------------------------------------------------------
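The two code cells in the pipeline notebook above report the same MSE whether scaling and regression are run by hand or wrapped in a `Pipeline`. A minimal sketch of that equivalence check follows. It assumes synthetic data from `make_regression` as a stand-in, since `load_boston` is not available in scikit-learn 1.2 and later; the sample count, feature count, and noise level are illustrative choices, not values from the notebook.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic regression data replaces the Boston housing data.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=7)
X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=7)

# Manual route: fit the scaler on training data only, then reuse it on test data.
scaler = StandardScaler()
X_tn_std = scaler.fit_transform(X_tn)
X_te_std = scaler.transform(X_te)
manual_model = LinearRegression().fit(X_tn_std, y_tn)
mse_manual = mean_squared_error(y_te, manual_model.predict(X_te_std))

# Pipeline route: the same two steps, chained behind one fit/predict interface.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_regression", LinearRegression()),
])
pipeline.fit(X_tn, y_tn)
mse_pipeline = mean_squared_error(y_te, pipeline.predict(X_te))

print(mse_manual, mse_pipeline)
```

Both routes fit the scaler on the training split only, which is why the two errors agree up to floating-point rounding.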
/.ipynb_checkpoints/07장_4절_그리드서치-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Grid search"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 7,
13 | "metadata": {
14 | "scrolled": true
15 | },
16 | "outputs": [
17 | {
18 | "name": "stdout",
19 | "output_type": "stream",
20 | "text": [
21 | "{'k': 3}\n",
22 | "0.9736842105263158\n"
23 | ]
24 | }
25 | ],
26 | "source": [
27 | "from sklearn import datasets\n",
28 | "from sklearn.preprocessing import StandardScaler\n",
29 | "from sklearn.neighbors import KNeighborsClassifier\n",
30 | "from sklearn.model_selection import train_test_split\n",
31 | "\n",
32 | "from sklearn.metrics import accuracy_score\n",
33 | "from sklearn.metrics import confusion_matrix\n",
34 | "from sklearn.metrics import classification_report\n",
35 | "\n",
36 | "# Load the iris flower data\n",
37 | "raw_iris = datasets.load_iris()\n",
38 | "\n",
39 | "# Features / target\n",
40 | "X = raw_iris.data\n",
41 | "y = raw_iris.target\n",
42 | "\n",
43 | "# Split into training / test data\n",
44 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
45 | "\n",
46 | "# Standardization scaling\n",
47 | "std_scale = StandardScaler()\n",
48 | "std_scale.fit(X_tn)\n",
49 | "X_tn_std = std_scale.transform(X_tn)\n",
50 | "X_te_std = std_scale.transform(X_te)\n",
51 | "\n",
52 | "best_accuracy = 0\n",
53 | "\n",
54 | "for k in [1,2,3,4,5,6,7,8,9,10]:\n",
55 | "    clf_knn = KNeighborsClassifier(n_neighbors=k)\n",
56 | "    clf_knn.fit(X_tn_std, y_tn)\n",
57 | "    knn_pred = clf_knn.predict(X_te_std)\n",
58 | "    accuracy = accuracy_score(y_te, knn_pred)\n",
59 | "    if accuracy > best_accuracy:\n",
60 | "        best_accuracy = accuracy\n",
61 | "        final_k = {'k': k}\n",
62 | " \n",
63 | "print(final_k)\n",
64 | "print(best_accuracy)"
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": null,
70 | "metadata": {},
71 | "outputs": [],
72 | "source": []
73 | }
74 | ],
75 | "metadata": {
76 | "kernelspec": {
77 | "display_name": "Python 3",
78 | "language": "python",
79 | "name": "python3"
80 | },
81 | "language_info": {
82 | "codemirror_mode": {
83 | "name": "ipython",
84 | "version": 3
85 | },
86 | "file_extension": ".py",
87 | "mimetype": "text/x-python",
88 | "name": "python",
89 | "nbconvert_exporter": "python",
90 | "pygments_lexer": "ipython3",
91 | "version": "3.7.6"
92 | }
93 | },
94 | "nbformat": 4,
95 | "nbformat_minor": 4
96 | }
97 |
--------------------------------------------------------------------------------
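The loop in the grid search notebook above picks `k` by comparing accuracy on the test split, so the test data also takes part in model selection. One common alternative is sketched below with scikit-learn's `GridSearchCV`, which scores each candidate by cross-validation on the training split only. The step names `scaler` and `knn`, the 5-fold setting, and the variable names are assumptions made for illustration; they do not come from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)

# Scaling lives inside the pipeline so it is re-fit on each CV training fold.
knn_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Same candidate values as the manual loop: k = 1..10.
param_grid = {"knn__n_neighbors": list(range(1, 11))}

search = GridSearchCV(knn_pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_tn, y_tn)

best_k = search.best_params_["knn__n_neighbors"]
cv_accuracy = search.best_score_          # mean cross-validated accuracy
test_accuracy = search.score(X_te, y_te)  # accuracy of the refit best model

print(best_k, cv_accuracy, test_accuracy)
```

The selected `k` may differ from the one the manual loop reports, because the selection criterion is cross-validated training accuracy rather than test accuracy.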
/.ipynb_checkpoints/07장_모형평가_6절_분류_회귀_군집-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Classification"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "## Accuracy"
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 3,
22 | "metadata": {},
23 | "outputs": [
24 | {
25 | "name": "stdout",
26 | "output_type": "stream",
27 | "text": [
28 | "0.5\n",
29 | "2\n"
30 | ]
31 | }
32 | ],
33 | "source": [
34 | "#import numpy as np\n",
35 | "from sklearn.metrics import accuracy_score\n",
36 | "y_pred = [0, 2, 1, 3]\n",
37 | "y_true = [0, 1, 2, 3]\n",
38 | "print(accuracy_score(y_true, y_pred))\n",
39 | "print(accuracy_score(y_true, y_pred, normalize=False))"
40 | ]
41 | },
42 | {
43 | "cell_type": "code",
44 | "execution_count": 3,
45 | "metadata": {},
46 | "outputs": [],
47 | "source": [
48 | "## confusion matrix"
49 | ]
50 | },
51 | {
52 | "cell_type": "code",
53 | "execution_count": 4,
54 | "metadata": {},
55 | "outputs": [
56 | {
57 | "data": {
58 | "text/plain": [
59 | "array([[2, 0, 0],\n",
60 | "       [0, 0, 1],\n",
61 | "       [1, 0, 2]])"
62 | ]
63 | },
64 | "execution_count": 4,
65 | "metadata": {},
66 | "output_type": "execute_result"
67 | }
68 | ],
69 | "source": [
70 | "from sklearn.metrics import confusion_matrix\n",
71 | "y_true = [2, 0, 2, 2, 0, 1]\n",
72 | "y_pred = [0, 0, 2, 2, 0, 2]\n",
73 | "confusion_matrix(y_true, y_pred)"
74 | ]
75 | },
76 | {
77 | "cell_type": "code",
78 | "execution_count": 5,
79 | "metadata": {},
80 | "outputs": [],
81 | "source": [
82 | "## classification report "
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": 6,
88 | "metadata": {},
89 | "outputs": [
90 | {
91 | "name": "stdout",
92 | "output_type": "stream",
93 | "text": [
94 | "              precision    recall  f1-score   support\n",
95 | "\n",
96 | "     class 0       0.67      1.00      0.80         2\n",
97 | "     class 1       0.00      0.00      0.00         1\n",
98 | "     class 2       1.00      0.50      0.67         2\n",
99 | "\n",
100 | "    accuracy                           0.60         5\n",
101 | "   macro avg       0.56      0.50      0.49         5\n",
102 | "weighted avg       0.67      0.60      0.59         5\n",
103 | "\n"
104 | ]
105 | }
106 | ],
107 | "source": [
108 | "from sklearn.metrics import classification_report\n",
109 | "y_true = [0, 1, 2, 2, 0]\n",
110 | "y_pred = [0, 0, 2, 1, 0]\n",
111 | "target_names = ['class 0', 'class 1', 'class 2']\n",
112 | "print(classification_report(y_true, y_pred, target_names=target_names))"
113 | ]
114 | },
115 | {
116 | "cell_type": "markdown",
117 | "metadata": {},
118 | "source": [
119 | "# Regression"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": null,
125 | "metadata": {},
126 | "outputs": [],
127 | "source": [
128 | "# mean absolute error"
129 | ]
130 | },
131 | {
132 | "cell_type": "code",
133 | "execution_count": 5,
134 | "metadata": {},
135 | "outputs": [
136 | {
137 | "name": "stdout",
138 | "output_type": "stream",
139 | "text": [
140 | "0.5\n"
141 | ]
142 | }
143 | ],
144 | "source": [
145 | "from sklearn.metrics import mean_absolute_error\n",
146 | "y_true = [3, -0.5, 2, 7]\n",
147 | "y_pred = [2.5, 0.0, 2, 8]\n",
148 | "\n",
149 | "print(mean_absolute_error(y_true, y_pred))"
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "execution_count": 6,
155 | "metadata": {},
156 | "outputs": [],
157 | "source": [
158 | "# mean squared error"
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": 7,
164 | "metadata": {},
165 | "outputs": [
166 | {
167 | "data": {
168 | "text/plain": [
169 | "0.375"
170 | ]
171 | },
172 | "execution_count": 7,
173 | "metadata": {},
174 | "output_type": "execute_result"
175 | }
176 | ],
177 | "source": [
178 | "from sklearn.metrics import mean_squared_error\n",
179 | "y_true = [3, -0.5, 2, 7]\n",
180 | "y_pred = [2.5, 0.0, 2, 8]\n",
181 | "print(mean_squared_error(y_true, y_pred))"
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": 8,
187 | "metadata": {},
188 | "outputs": [],
189 | "source": [
190 | "# R2"
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": 10,
196 | "metadata": {},
197 | "outputs": [
198 | {
199 | "name": "stdout",
200 | "output_type": "stream",
201 | "text": [
202 | "0.9486081370449679\n"
203 | ]
204 | }
205 | ],
206 | "source": [
207 | "from sklearn.metrics import r2_score\n",
208 | "y_true = [3, -0.5, 2, 7]\n",
209 | "y_pred = [2.5, 0.0, 2, 8]\n",
210 | "print(r2_score(y_true, y_pred))"
211 | ]
212 | },
213 | {
214 | "cell_type": "markdown",
215 | "metadata": {},
216 | "source": [
217 | "# Clustering"
218 | ]
219 | },
220 | {
221 | "cell_type": "code",
222 | "execution_count": null,
223 | "metadata": {},
224 | "outputs": [],
225 | "source": [
226 | "# adjusted rand index"
227 | ]
228 | },
229 | {
230 | "cell_type": "code",
231 | "execution_count": 2,
232 | "metadata": {},
233 | "outputs": [
234 | {
235 | "name": "stdout",
236 | "output_type": "stream",
237 | "text": [
238 | "0.24242424242424246\n"
239 | ]
240 | }
241 | ],
242 | "source": [
243 | "from sklearn.metrics import adjusted_rand_score\n",
244 | "labels_true = [0, 0, 0, 1, 1, 1]\n",
245 | "labels_pred = [0, 0, 1, 1, 2, 2]\n",
246 | "\n",
247 | "print(adjusted_rand_score(labels_true, labels_pred))"
248 | ]
249 | },
250 | {
251 | "cell_type": "code",
252 | "execution_count": 3,
253 | "metadata": {},
254 | "outputs": [],
255 | "source": [
256 | "# silhouette score"
257 | ]
258 | },
259 | {
260 | "cell_type": "code",
261 | "execution_count": 2,
262 | "metadata": {},
263 | "outputs": [
264 | {
265 | "name": "stdout",
266 | "output_type": "stream",
267 | "text": [
268 | "0.5789497702625118\n"
269 | ]
270 | }
271 | ],
272 | "source": [
273 | "from sklearn.metrics import silhouette_score\n",
274 | "X = [[1, 2], [4, 5], [2, 1], [6, 7], [2, 3]]\n",
275 | "labels = [0, 1, 0, 1, 0] \n",
276 | "sil_score = silhouette_score(X, labels)\n",
277 | "print(sil_score)"
278 | ]
279 | },
280 | {
281 | "cell_type": "code",
282 | "execution_count": null,
283 | "metadata": {},
284 | "outputs": [],
285 | "source": []
286 | }
287 | ],
288 | "metadata": {
289 | "kernelspec": {
290 | "display_name": "Python 3",
291 | "language": "python",
292 | "name": "python3"
293 | },
294 | "language_info": {
295 | "codemirror_mode": {
296 | "name": "ipython",
297 | "version": 3
298 | },
299 | "file_extension": ".py",
300 | "mimetype": "text/x-python",
301 | "name": "python",
302 | "nbconvert_exporter": "python",
303 | "pygments_lexer": "ipython3",
304 | "version": "3.7.6"
305 | }
306 | },
307 | "nbformat": 4,
308 | "nbformat_minor": 4
309 | }
310 |
--------------------------------------------------------------------------------
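The regression cells in the evaluation notebook above call `mean_absolute_error`, `mean_squared_error`, and `r2_score` on a four-point toy example. As a rough sketch of what those numbers mean, the same quantities can be computed directly with NumPy from their textbook definitions. The variable names below are illustrative; only the toy data comes from the notebook.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy data taken from the notebook's regression cells.
y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

residuals = y_true - y_pred

# MAE: mean of absolute residuals.
mae = np.mean(np.abs(residuals))

# MSE: mean of squared residuals.
mse = np.mean(residuals ** 2)

# R^2: one minus (residual sum of squares / total sum of squares).
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mae, mse, r2)

# Cross-check against the scikit-learn functions used in the notebook.
print(mean_absolute_error(y_true, y_pred),
      mean_squared_error(y_true, y_pred),
      r2_score(y_true, y_pred))
```

The hand-computed values agree with the notebook's recorded outputs of 0.5, 0.375, and roughly 0.9486.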
/.ipynb_checkpoints/08장_지도학습_3절_k최근접이웃-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Step-by-step code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_iris = datasets.load_iris()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Features/target\n",
28 | "X = raw_iris.data\n",
29 | "y = raw_iris.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training/test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 5,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
66 | "                     metric_params=None, n_jobs=None, n_neighbors=2, p=2,\n",
67 | "                     weights='uniform')"
68 | ]
69 | },
70 | "execution_count": 5,
71 | "metadata": {},
72 | "output_type": "execute_result"
73 | }
74 | ],
75 | "source": [
76 | "# Training\n",
77 | "from sklearn.neighbors import KNeighborsClassifier\n",
78 | "clf_knn = KNeighborsClassifier(n_neighbors=2)\n",
79 | "clf_knn.fit(X_tn_std, y_tn)"
80 | ]
81 | },
82 | {
83 | "cell_type": "code",
84 | "execution_count": 8,
85 | "metadata": {},
86 | "outputs": [
87 | {
88 | "name": "stdout",
89 | "output_type": "stream",
90 | "text": [
91 | "[2 1 0 2 0 2 0 1 1 1 1 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0\n",
92 | " 2]\n"
93 | ]
94 | }
95 | ],
96 | "source": [
97 | "# Prediction\n",
98 | "knn_pred = clf_knn.predict(X_te_std)\n",
99 | "print(knn_pred)"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": 9,
105 | "metadata": {},
106 | "outputs": [
107 | {
108 | "name": "stdout",
109 | "output_type": "stream",
110 | "text": [
111 | "0.9473684210526315\n"
112 | ]
113 | }
114 | ],
115 | "source": [
116 | "# Accuracy\n",
117 | "from sklearn.metrics import accuracy_score\n",
118 | "accuracy = accuracy_score(y_te, knn_pred)\n",
119 | "print(accuracy)"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": 10,
125 | "metadata": {},
126 | "outputs": [
127 | {
128 | "name": "stdout",
129 | "output_type": "stream",
130 | "text": [
131 | "[[13  0  0]\n",
132 | " [ 0 15  1]\n",
133 | " [ 0  1  8]]\n"
134 | ]
135 | }
136 | ],
137 | "source": [
138 | "# Check the confusion matrix\n",
139 | "from sklearn.metrics import confusion_matrix\n",
140 | "conf_matrix = confusion_matrix(y_te, knn_pred)\n",
141 | "print(conf_matrix)"
142 | ]
143 | },
144 | {
145 | "cell_type": "code",
146 | "execution_count": 22,
147 | "metadata": {},
148 | "outputs": [
149 | {
150 | "name": "stdout",
151 | "output_type": "stream",
152 | "text": [
153 | "              precision    recall  f1-score   support\n",
154 | "\n",
155 | "           0       1.00      1.00      1.00        13\n",
156 | "           1       0.94      0.94      0.94        16\n",
157 | "           2       0.89      0.89      0.89         9\n",
158 | "\n",
159 | "    accuracy                           0.95        38\n",
160 | "   macro avg       0.94      0.94      0.94        38\n",
161 | "weighted avg       0.95      0.95      0.95        38\n",
162 | "\n"
163 | ]
164 | }
165 | ],
166 | "source": [
167 | "# Check the classification report\n",
168 | "from sklearn.metrics import classification_report\n",
169 | "class_report = classification_report(y_te, knn_pred)\n",
170 | "print(class_report)"
171 | ]
172 | },
173 | {
174 | "cell_type": "markdown",
175 | "metadata": {},
176 | "source": [
177 | "# Combined code"
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": 11,
183 | "metadata": {},
184 | "outputs": [
185 | {
186 | "name": "stdout",
187 | "output_type": "stream",
188 | "text": [
189 | "[2 1 0 2 0 2 0 1 1 1 1 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0\n",
190 | " 2]\n",
191 | "0.9473684210526315\n",
192 | "[[13  0  0]\n",
193 | " [ 0 15  1]\n",
194 | " [ 0  1  8]]\n",
195 | "              precision    recall  f1-score   support\n",
196 | "\n",
197 | "           0       1.00      1.00      1.00        13\n",
198 | "           1       0.94      0.94      0.94        16\n",
199 | "           2       0.89      0.89      0.89         9\n",
200 | "\n",
201 | "    accuracy                           0.95        38\n",
202 | "   macro avg       0.94      0.94      0.94        38\n",
203 | "weighted avg       0.95      0.95      0.95        38\n",
204 | "\n"
205 | ]
206 | }
207 | ],
208 | "source": [
209 | "from sklearn import datasets\n",
210 | "from sklearn.preprocessing import StandardScaler\n",
211 | "from sklearn.neighbors import KNeighborsClassifier\n",
212 | "from sklearn.model_selection import train_test_split\n",
213 | "\n",
214 | "from sklearn.metrics import accuracy_score\n",
215 | "from sklearn.metrics import confusion_matrix\n",
216 | "from sklearn.metrics import classification_report\n",
217 | "\n",
218 | "# Load the iris flower data\n",
219 | "raw_iris = datasets.load_iris()\n",
220 | "\n",
221 | "# Features / target\n",
222 | "X = raw_iris.data\n",
223 | "y = raw_iris.target\n",
224 | "\n",
225 | "# Split into training / test data\n",
226 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
227 | "\n",
228 | "\n",
229 | "# Standardization scaling\n",
230 | "std_scale = StandardScaler()\n",
231 | "std_scale.fit(X_tn)\n",
232 | "X_tn_std = std_scale.transform(X_tn)\n",
233 | "X_te_std = std_scale.transform(X_te)\n",
234 | "\n",
235 | "# Training\n",
236 | "clf_knn = KNeighborsClassifier(n_neighbors=2)\n",
237 | "clf_knn.fit(X_tn_std, y_tn)\n",
238 | "\n",
239 | "# Prediction\n",
240 | "knn_pred = clf_knn.predict(X_te_std)\n",
241 | "print(knn_pred)\n",
242 | "\n",
243 | "# Accuracy\n",
244 | "accuracy = accuracy_score(y_te, knn_pred)\n",
245 | "print(accuracy)\n",
246 | "\n",
247 | "# Check the confusion matrix\n",
248 | "conf_matrix = confusion_matrix(y_te, knn_pred)\n",
249 | "print(conf_matrix)\n",
250 | "\n",
251 | "# Check the classification report\n",
252 | "class_report = classification_report(y_te, knn_pred)\n",
253 | "print(class_report)"
254 | ]
255 | },
256 | {
257 | "cell_type": "code",
258 | "execution_count": null,
259 | "metadata": {},
260 | "outputs": [],
261 | "source": []
262 | }
263 | ],
264 | "metadata": {
265 | "kernelspec": {
266 | "display_name": "Python 3",
267 | "language": "python",
268 | "name": "python3"
269 | },
270 | "language_info": {
271 | "codemirror_mode": {
272 | "name": "ipython",
273 | "version": 3
274 | },
275 | "file_extension": ".py",
276 | "mimetype": "text/x-python",
277 | "name": "python",
278 | "nbconvert_exporter": "python",
279 | "pygments_lexer": "ipython3",
280 | "version": "3.7.6"
281 | }
282 | },
283 | "nbformat": 4,
284 | "nbformat_minor": 4
285 | }
286 |
--------------------------------------------------------------------------------
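The k-nearest-neighbors notebook above relies on `KNeighborsClassifier` for both fitting and prediction. To make the prediction rule itself visible, here is a minimal from-scratch sketch using NumPy: Euclidean distance, the `k` closest training points, then a majority vote. The function name `knn_predict` and the tiny dataset are hypothetical, and the sketch assumes non-negative integer class labels. It is not scikit-learn's implementation, and its tie-breaking may differ.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Predict labels for X_query by majority vote among the k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)

    # Pairwise Euclidean distances, shape (n_query, n_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))

    # Indices of the k closest training points for every query point.
    nearest = np.argsort(distances, axis=1)[:, :k]
    neighbor_labels = y_train[nearest]

    # Majority vote; np.bincount requires non-negative integer labels,
    # and argmax picks the smallest label when counts are tied.
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])


# Hypothetical toy data: two well-separated groups.
X_train = [[0, 0], [0, 1], [5, 5], [5, 6]]
y_train = [0, 0, 1, 1]
X_query = [[0, 0.5], [5, 5.5]]

predictions = knn_predict(X_train, y_train, X_query, k=3)
print(predictions)
```

With `k=3`, each query point has two neighbors from its own group and one from the other, so the vote assigns the first point to class 0 and the second to class 1.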
/.ipynb_checkpoints/08장_지도학습_6절_나이브베이즈-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Step-by-step code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_wine = datasets.load_wine()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Specify feature and target data\n",
28 | "X = raw_wine.data\n",
29 | "y = raw_wine.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training/test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 5,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "GaussianNB(priors=None, var_smoothing=1e-09)"
66 | ]
67 | },
68 | "execution_count": 5,
69 | "metadata": {},
70 | "output_type": "execute_result"
71 | }
72 | ],
73 | "source": [
74 | "# Train naive Bayes\n",
75 | "from sklearn.naive_bayes import GaussianNB\n",
76 | "clf_gnb = GaussianNB()\n",
77 | "clf_gnb.fit(X_tn_std, y_tn)"
78 | ]
79 | },
80 | {
81 | "cell_type": "code",
82 | "execution_count": 12,
83 | "metadata": {},
84 | "outputs": [
85 | {
86 | "name": "stdout",
87 | "output_type": "stream",
88 | "text": [
89 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 0 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
90 | " 1 1 2 0 0 1 1 1]\n"
91 | ]
92 | }
93 | ],
94 | "source": [
95 | "# Prediction\n",
96 | "pred_gnb = clf_gnb.predict(X_te_std)\n",
97 | "print(pred_gnb)"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": 15,
103 | "metadata": {},
104 | "outputs": [
105 | {
106 | "name": "stdout",
107 | "output_type": "stream",
108 | "text": [
109 | "0.9523809523809524\n"
110 | ]
111 | }
112 | ],
113 | "source": [
114 | "# Recall\n",
115 | "from sklearn.metrics import recall_score\n",
116 | "recall = recall_score(y_te, pred_gnb, average='macro')\n",
117 | "print(recall)"
118 | ]
119 | },
120 | {
121 | "cell_type": "code",
122 | "execution_count": 26,
123 | "metadata": {},
124 | "outputs": [
125 | {
126 | "name": "stdout",
127 | "output_type": "stream",
128 | "text": [
129 | "[[16  0  0]\n",
130 | " [ 2 18  1]\n",
131 | " [ 0  0  8]]\n"
132 | ]
133 | }
134 | ],
135 | "source": [
136 | "# Check the confusion matrix\n",
137 | "from sklearn.metrics import confusion_matrix\n",
138 | "conf_matrix = confusion_matrix(y_te, pred_gnb)\n",
139 | "print(conf_matrix)"
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": 27,
145 | "metadata": {},
146 | "outputs": [
147 | {
148 | "name": "stdout",
149 | "output_type": "stream",
150 | "text": [
151 | "              precision    recall  f1-score   support\n",
152 | "\n",
153 | "           0       0.89      1.00      0.94        16\n",
154 | "           1       1.00      0.86      0.92        21\n",
155 | "           2       0.89      1.00      0.94         8\n",
156 | "\n",
157 | "    accuracy                           0.93        45\n",
158 | "   macro avg       0.93      0.95      0.94        45\n",
159 | "weighted avg       0.94      0.93      0.93        45\n",
160 | "\n"
161 | ]
162 | }
163 | ],
164 | "source": [
165 | "# Check the classification report\n",
166 | "from sklearn.metrics import classification_report\n",
167 | "class_report = classification_report(y_te, pred_gnb)\n",
168 | "print(class_report)"
169 | ]
170 | },
171 | {
172 | "cell_type": "markdown",
173 | "metadata": {},
174 | "source": [
175 | "# Combined code"
176 | ]
177 | },
178 | {
179 | "cell_type": "code",
180 | "execution_count": 1,
181 | "metadata": {},
182 | "outputs": [
183 | {
184 | "name": "stdout",
185 | "output_type": "stream",
186 | "text": [
187 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 0 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
188 | " 1 1 2 0 0 1 1 1]\n",
189 | "0.9523809523809524\n",
190 | "[[16  0  0]\n",
191 | " [ 2 18  1]\n",
192 | " [ 0  0  8]]\n",
193 | "              precision    recall  f1-score   support\n",
194 | "\n",
195 | "           0       0.89      1.00      0.94        16\n",
196 | "           1       1.00      0.86      0.92        21\n",
197 | "           2       0.89      1.00      0.94         8\n",
198 | "\n",
199 | "    accuracy                           0.93        45\n",
200 | "   macro avg       0.93      0.95      0.94        45\n",
201 | "weighted avg       0.94      0.93      0.93        45\n",
202 | "\n"
203 | ]
204 | }
205 | ],
206 | "source": [
207 | "from sklearn import datasets\n",
208 | "from sklearn.preprocessing import StandardScaler\n",
209 | "from sklearn.model_selection import train_test_split\n",
210 | "\n",
211 | "from sklearn.naive_bayes import GaussianNB\n",
212 | "\n",
213 | "from sklearn.metrics import recall_score\n",
214 | "from sklearn.metrics import confusion_matrix\n",
215 | "from sklearn.metrics import classification_report\n",
216 | "\n",
217 | "\n",
218 | "# Load the data\n",
219 | "raw_wine = datasets.load_wine()\n",
220 | "\n",
221 | "# Specify feature and target data\n",
222 | "X = raw_wine.data\n",
223 | "y = raw_wine.target\n",
224 | "\n",
225 | "# Split into training/test data\n",
226 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
227 | "\n",
228 | "# Standardize the data\n",
229 | "std_scale = StandardScaler()\n",
230 | "std_scale.fit(X_tn)\n",
231 | "X_tn_std = std_scale.transform(X_tn)\n",
232 | "X_te_std = std_scale.transform(X_te)\n",
233 | "\n",
234 | "# Train naive Bayes\n",
235 | "clf_gnb = GaussianNB()\n",
236 | "clf_gnb.fit(X_tn_std, y_tn)\n",
237 | "\n",
238 | "# Prediction\n",
239 | "pred_gnb = clf_gnb.predict(X_te_std)\n",
240 | "print(pred_gnb)\n",
241 | "\n",
242 | "# Recall\n",
243 | "recall = recall_score(y_te, pred_gnb, average='macro')\n",
244 | "print(recall)\n",
245 | "\n",
246 | "# Check the confusion matrix\n",
247 | "conf_matrix = confusion_matrix(y_te, pred_gnb)\n",
248 | "print(conf_matrix)\n",
249 | "\n",
250 | "# Check the classification report\n",
251 | "class_report = classification_report(y_te, pred_gnb)\n",
252 | "print(class_report)"
253 | ]
254 | },
255 | {
256 | "cell_type": "code",
257 | "execution_count": null,
258 | "metadata": {},
259 | "outputs": [],
260 | "source": []
261 | }
262 | ],
263 | "metadata": {
264 | "kernelspec": {
265 | "display_name": "Python 3",
266 | "language": "python",
267 | "name": "python3"
268 | },
269 | "language_info": {
270 | "codemirror_mode": {
271 | "name": "ipython",
272 | "version": 3
273 | },
274 | "file_extension": ".py",
275 | "mimetype": "text/x-python",
276 | "name": "python",
277 | "nbconvert_exporter": "python",
278 | "pygments_lexer": "ipython3",
279 | "version": "3.7.6"
280 | }
281 | },
282 | "nbformat": 4,
283 | "nbformat_minor": 4
284 | }
285 |
--------------------------------------------------------------------------------
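The naive Bayes notebook above prints a macro recall, a confusion matrix, and a classification report for the same predictions. The sketch below shows one way those three outputs relate, by deriving per-class recall and precision from the confusion matrix alone. The matrix values are copied from the notebook's recorded output; the variable names are illustrative. It relies on scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix copied from the notebook output
# (rows: true class, columns: predicted class).
conf_matrix = np.array([
    [16, 0, 0],
    [2, 18, 1],
    [0, 0, 8],
])

true_positives = np.diag(conf_matrix)

# Recall: correct predictions divided by the number of true samples per class (row sums).
recall_per_class = true_positives / conf_matrix.sum(axis=1)

# Precision: correct predictions divided by the number of predictions per class (column sums).
precision_per_class = true_positives / conf_matrix.sum(axis=0)

# Macro average: unweighted mean over classes.
macro_recall = recall_per_class.mean()

print(recall_per_class)
print(precision_per_class)
print(macro_recall)
```

The macro recall comes out at about 0.952, in line with the `recall_score(..., average='macro')` value recorded in the notebook, and the per-class values round to the figures in the classification report.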
/.ipynb_checkpoints/08장_지도학습_7절_의사결정나무-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Step-by-step code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_wine = datasets.load_wine()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Specify feature and target data\n",
28 | "X = raw_wine.data\n",
29 | "y = raw_wine.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training/test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 5,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
66 | "                       max_depth=None, max_features=None, max_leaf_nodes=None,\n",
67 | "                       min_impurity_decrease=0.0, min_impurity_split=None,\n",
68 | "                       min_samples_leaf=1, min_samples_split=2,\n",
69 | "                       min_weight_fraction_leaf=0.0, presort='deprecated',\n",
70 | "                       random_state=0, splitter='best')"
71 | ]
72 | },
73 | "execution_count": 5,
74 | "metadata": {},
75 | "output_type": "execute_result"
76 | }
77 | ],
78 | "source": [
79 | "# Train the decision tree\n",
80 | "from sklearn import tree \n",
81 | "clf_tree = tree.DecisionTreeClassifier(random_state=0)\n",
82 | "clf_tree.fit(X_tn_std, y_tn)"
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": 6,
88 | "metadata": {},
89 | "outputs": [
90 | {
91 | "name": "stdout",
92 | "output_type": "stream",
93 | "text": [
94 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 1 0 1 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
95 | " 1 1 2 1 0 1 1 1]\n"
96 | ]
97 | }
98 | ],
99 | "source": [
100 | "# Predict\n",
101 | "pred_tree = clf_tree.predict(X_te_std)\n",
102 | "print(pred_tree)"
103 | ]
104 | },
105 | {
106 | "cell_type": "code",
107 | "execution_count": 7,
108 | "metadata": {},
109 | "outputs": [
110 | {
111 | "name": "stdout",
112 | "output_type": "stream",
113 | "text": [
114 | "0.9349141206870346\n"
115 | ]
116 | }
117 | ],
118 | "source": [
119 | "# f1 score\n",
120 | "from sklearn.metrics import f1_score\n",
121 | "f1 = f1_score(y_te, pred_tree, average='macro')\n",
122 | "print(f1)"
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": 8,
128 | "metadata": {},
129 | "outputs": [
130 | {
131 | "name": "stdout",
132 | "output_type": "stream",
133 | "text": [
134 | "[[14 2 0]\n",
135 | " [ 0 20 1]\n",
136 | " [ 0 0 8]]\n"
137 | ]
138 | }
139 | ],
140 | "source": [
141 | "# Check confusion matrix\n",
142 | "from sklearn.metrics import confusion_matrix\n",
143 | "conf_matrix = confusion_matrix(y_te, pred_tree)\n",
144 | "print(conf_matrix)"
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": 9,
150 | "metadata": {},
151 | "outputs": [
152 | {
153 | "name": "stdout",
154 | "output_type": "stream",
155 | "text": [
156 | " precision recall f1-score support\n",
157 | "\n",
158 | " 0 1.00 0.88 0.93 16\n",
159 | " 1 0.91 0.95 0.93 21\n",
160 | " 2 0.89 1.00 0.94 8\n",
161 | "\n",
162 | " accuracy 0.93 45\n",
163 | " macro avg 0.93 0.94 0.93 45\n",
164 | "weighted avg 0.94 0.93 0.93 45\n",
165 | "\n"
166 | ]
167 | }
168 | ],
169 | "source": [
170 | "# Check classification report\n",
171 | "from sklearn.metrics import classification_report\n",
172 | "class_report = classification_report(y_te, pred_tree)\n",
173 | "print(class_report)"
174 | ]
175 | },
176 | {
177 | "cell_type": "markdown",
178 | "metadata": {},
179 | "source": [
180 | "# Combined code"
181 | ]
182 | },
183 | {
184 | "cell_type": "code",
185 | "execution_count": 10,
186 | "metadata": {},
187 | "outputs": [
188 | {
189 | "name": "stdout",
190 | "output_type": "stream",
191 | "text": [
192 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 1 0 1 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
193 | " 1 1 2 1 0 1 1 1]\n",
194 | "0.9349141206870346\n",
195 | "[[14 2 0]\n",
196 | " [ 0 20 1]\n",
197 | " [ 0 0 8]]\n",
198 | " precision recall f1-score support\n",
199 | "\n",
200 | " 0 1.00 0.88 0.93 16\n",
201 | " 1 0.91 0.95 0.93 21\n",
202 | " 2 0.89 1.00 0.94 8\n",
203 | "\n",
204 | " accuracy 0.93 45\n",
205 | " macro avg 0.93 0.94 0.93 45\n",
206 | "weighted avg 0.94 0.93 0.93 45\n",
207 | "\n"
208 | ]
209 | }
210 | ],
211 | "source": [
212 | "from sklearn import datasets\n",
213 | "from sklearn.preprocessing import StandardScaler\n",
214 | "from sklearn.model_selection import train_test_split\n",
215 | "\n",
216 | "from sklearn import tree \n",
217 | "\n",
218 | "from sklearn.metrics import f1_score\n",
219 | "from sklearn.metrics import confusion_matrix\n",
220 | "from sklearn.metrics import classification_report\n",
221 | "\n",
222 | "\n",
223 | "# Load data\n",
224 | "raw_wine = datasets.load_wine()\n",
225 | "\n",
226 | "# Assign feature and target data\n",
227 | "X = raw_wine.data\n",
228 | "y = raw_wine.target\n",
229 | "\n",
230 | "# Split into training and test data\n",
231 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)\n",
232 | "\n",
233 | "# Standardize data\n",
234 | "std_scale = StandardScaler()\n",
235 | "std_scale.fit(X_tn)\n",
236 | "X_tn_std = std_scale.transform(X_tn)\n",
237 | "X_te_std = std_scale.transform(X_te)\n",
238 | "\n",
239 | "# Train decision tree\n",
240 | "clf_tree = tree.DecisionTreeClassifier(random_state=0)\n",
241 | "clf_tree.fit(X_tn_std, y_tn)\n",
242 | "\n",
243 | "# Predict\n",
244 | "pred_tree = clf_tree.predict(X_te_std)\n",
245 | "print(pred_tree)\n",
246 | "\n",
247 | "# f1 score\n",
248 | "from sklearn.metrics import f1_score\n",
249 | "f1 = f1_score(y_te, pred_tree, average='macro')\n",
250 | "print(f1)\n",
251 | "\n",
252 | "# Check confusion matrix\n",
253 | "conf_matrix = confusion_matrix(y_te, pred_tree)\n",
254 | "print(conf_matrix)\n",
255 | "\n",
256 | "# Check classification report\n",
257 | "class_report = classification_report(y_te, pred_tree)\n",
258 | "print(class_report)"
259 | ]
260 | },
261 | {
262 | "cell_type": "code",
263 | "execution_count": null,
264 | "metadata": {},
265 | "outputs": [],
266 | "source": []
267 | },
268 | {
269 | "cell_type": "code",
270 | "execution_count": null,
271 | "metadata": {},
272 | "outputs": [],
273 | "source": []
274 | },
275 | {
276 | "cell_type": "code",
277 | "execution_count": null,
278 | "metadata": {},
279 | "outputs": [],
280 | "source": []
281 | }
282 | ],
283 | "metadata": {
284 | "kernelspec": {
285 | "display_name": "Python 3",
286 | "language": "python",
287 | "name": "python3"
288 | },
289 | "language_info": {
290 | "codemirror_mode": {
291 | "name": "ipython",
292 | "version": 3
293 | },
294 | "file_extension": ".py",
295 | "mimetype": "text/x-python",
296 | "name": "python",
297 | "nbconvert_exporter": "python",
298 | "pygments_lexer": "ipython3",
299 | "version": "3.7.6"
300 | }
301 | },
302 | "nbformat": 4,
303 | "nbformat_minor": 4
304 | }
305 |
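The macro F1 value printed by the decision tree notebook above can be checked by hand from its confusion matrix. The following is a minimal standard-library sketch, not the scikit-learn implementation; the helper name `per_class_f1` is hypothetical, and the matrix is copied from the notebook output.

```python
# Confusion matrix copied from the notebook output
# (rows = true class, columns = predicted class).
conf_matrix = [
    [14, 2, 0],
    [0, 20, 1],
    [0, 0, 8],
]


def per_class_f1(matrix):
    """Return the F1 score of each class from a square confusion matrix."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                       # true k, predicted as another class
        fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, truly another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


f1_by_class = per_class_f1(conf_matrix)
# average='macro' corresponds to the unweighted mean over classes
macro_f1 = sum(f1_by_class) / len(f1_by_class)
print(macro_f1)
```

Because the macro average weights every class equally, the 8-sample class 2 counts as much as the 21-sample class 1 here.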
--------------------------------------------------------------------------------
/.ipynb_checkpoints/08장_지도학습_8절_서포트벡터머신-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Individual code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load data\n",
17 | "from sklearn import datasets\n",
18 | "raw_wine = datasets.load_wine()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Assign feature and target data\n",
28 | "X = raw_wine.data\n",
29 | "y = raw_wine.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training and test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 5,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,\n",
66 | " decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',\n",
67 | " max_iter=-1, probability=False, random_state=0, shrinking=True, tol=0.001,\n",
68 | " verbose=False)"
69 | ]
70 | },
71 | "execution_count": 5,
72 | "metadata": {},
73 | "output_type": "execute_result"
74 | }
75 | ],
76 | "source": [
77 | "# Train support vector machine\n",
78 | "from sklearn import svm \n",
79 | "clf_svm_lr = svm.SVC(kernel='linear', random_state=0)\n",
80 | "clf_svm_lr.fit(X_tn_std, y_tn)"
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": 6,
86 | "metadata": {},
87 | "outputs": [
88 | {
89 | "name": "stdout",
90 | "output_type": "stream",
91 | "text": [
92 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 1 0 1 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
93 | " 1 1 2 0 0 1 1 1]\n"
94 | ]
95 | }
96 | ],
97 | "source": [
98 | "# Predict\n",
99 | "pred_svm = clf_svm_lr.predict(X_te_std)\n",
100 | "print(pred_svm)"
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": 7,
106 | "metadata": {},
107 | "outputs": [
108 | {
109 | "name": "stdout",
110 | "output_type": "stream",
111 | "text": [
112 | "1.0\n"
113 | ]
114 | }
115 | ],
116 | "source": [
117 | "# Accuracy\n",
118 | "from sklearn.metrics import accuracy_score\n",
119 | "accuracy = accuracy_score(y_te, pred_svm)\n",
120 | "print(accuracy)"
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": 8,
126 | "metadata": {},
127 | "outputs": [
128 | {
129 | "name": "stdout",
130 | "output_type": "stream",
131 | "text": [
132 | "[[16 0 0]\n",
133 | " [ 0 21 0]\n",
134 | " [ 0 0 8]]\n"
135 | ]
136 | }
137 | ],
138 | "source": [
139 | "# Check confusion matrix\n",
140 | "from sklearn.metrics import confusion_matrix\n",
141 | "conf_matrix = confusion_matrix(y_te, pred_svm)\n",
142 | "print(conf_matrix)"
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": 9,
148 | "metadata": {},
149 | "outputs": [
150 | {
151 | "name": "stdout",
152 | "output_type": "stream",
153 | "text": [
154 | " precision recall f1-score support\n",
155 | "\n",
156 | " 0 1.00 1.00 1.00 16\n",
157 | " 1 1.00 1.00 1.00 21\n",
158 | " 2 1.00 1.00 1.00 8\n",
159 | "\n",
160 | " accuracy 1.00 45\n",
161 | " macro avg 1.00 1.00 1.00 45\n",
162 | "weighted avg 1.00 1.00 1.00 45\n",
163 | "\n"
164 | ]
165 | }
166 | ],
167 | "source": [
168 | "# Check classification report\n",
169 | "from sklearn.metrics import classification_report\n",
170 | "class_report = classification_report(y_te, pred_svm)\n",
171 | "print(class_report)"
172 | ]
173 | },
174 | {
175 | "cell_type": "markdown",
176 | "metadata": {},
177 | "source": [
178 | "# Combined code"
179 | ]
180 | },
181 | {
182 | "cell_type": "code",
183 | "execution_count": 1,
184 | "metadata": {},
185 | "outputs": [
186 | {
187 | "name": "stdout",
188 | "output_type": "stream",
189 | "text": [
190 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 1 0 1 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
191 | " 1 1 2 0 0 1 1 1]\n",
192 | "1.0\n",
193 | "[[16 0 0]\n",
194 | " [ 0 21 0]\n",
195 | " [ 0 0 8]]\n",
196 | " precision recall f1-score support\n",
197 | "\n",
198 | " 0 1.00 1.00 1.00 16\n",
199 | " 1 1.00 1.00 1.00 21\n",
200 | " 2 1.00 1.00 1.00 8\n",
201 | "\n",
202 | " accuracy 1.00 45\n",
203 | " macro avg 1.00 1.00 1.00 45\n",
204 | "weighted avg 1.00 1.00 1.00 45\n",
205 | "\n"
206 | ]
207 | }
208 | ],
209 | "source": [
210 | "from sklearn import datasets\n",
211 | "from sklearn.preprocessing import StandardScaler\n",
212 | "from sklearn.model_selection import train_test_split\n",
213 | "\n",
214 | "from sklearn import svm \n",
215 | "\n",
216 | "from sklearn.metrics import accuracy_score\n",
217 | "from sklearn.metrics import confusion_matrix\n",
218 | "from sklearn.metrics import classification_report\n",
219 | "\n",
220 | "# Load data\n",
221 | "raw_wine = datasets.load_wine()\n",
222 | "\n",
223 | "# Assign feature and target data\n",
224 | "X = raw_wine.data\n",
225 | "y = raw_wine.target\n",
226 | "\n",
227 | "# Split into training and test data\n",
228 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)\n",
229 | "\n",
230 | "# Standardize data\n",
231 | "std_scale = StandardScaler()\n",
232 | "std_scale.fit(X_tn)\n",
233 | "X_tn_std = std_scale.transform(X_tn)\n",
234 | "X_te_std = std_scale.transform(X_te)\n",
235 | "\n",
236 | "# Train support vector machine\n",
237 | "clf_svm_lr = svm.SVC(kernel='linear', random_state=0)\n",
238 | "clf_svm_lr.fit(X_tn_std, y_tn)\n",
239 | "\n",
240 | "# Predict\n",
241 | "pred_svm = clf_svm_lr.predict(X_te_std)\n",
242 | "print(pred_svm)\n",
243 | "\n",
244 | "# Accuracy\n",
245 | "accuracy = accuracy_score(y_te, pred_svm)\n",
246 | "print(accuracy)\n",
247 | "\n",
248 | "# Check confusion matrix\n",
249 | "conf_matrix = confusion_matrix(y_te, pred_svm)\n",
250 | "print(conf_matrix)\n",
251 | "\n",
252 | "# Check classification report\n",
253 | "class_report = classification_report(y_te, pred_svm)\n",
254 | "print(class_report)"
255 | ]
256 | },
257 | {
258 | "cell_type": "code",
259 | "execution_count": null,
260 | "metadata": {},
261 | "outputs": [],
262 | "source": []
263 | }
264 | ],
265 | "metadata": {
266 | "kernelspec": {
267 | "display_name": "Python 3",
268 | "language": "python",
269 | "name": "python3"
270 | },
271 | "language_info": {
272 | "codemirror_mode": {
273 | "name": "ipython",
274 | "version": 3
275 | },
276 | "file_extension": ".py",
277 | "mimetype": "text/x-python",
278 | "name": "python",
279 | "nbconvert_exporter": "python",
280 | "pygments_lexer": "ipython3",
281 | "version": "3.7.6"
282 | }
283 | },
284 | "nbformat": 4,
285 | "nbformat_minor": 4
286 | }
287 |
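Each notebook fits the scaler on the training split only and then applies the same statistics to the test split. The sketch below illustrates that pattern with the standard library; it assumes per-column population standard deviation, the helper names `fit_standardizer` and `standardize` are hypothetical, and the toy values are not taken from the wine dataset.

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Compute the per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def standardize(rows, means, stds):
    """Apply (x - mean) / std column by column."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy data (hypothetical values).
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
test = [[2.0, 30.0]]

means, stds = fit_standardizer(train)       # statistics come from the training rows only
train_std = standardize(train, means, stds)
test_std = standardize(test, means, stds)   # test rows reuse the training statistics
print(test_std)
```

Reusing the training statistics keeps information about the test rows out of the preprocessing step, which is why the notebooks call `fit` once and `transform` twice.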
--------------------------------------------------------------------------------
/.ipynb_checkpoints/09장_앙상블_3절_1_랜덤포레스트-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Individual code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 2,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load data\n",
17 | "from sklearn import datasets\n",
18 | "raw_wine = datasets.load_wine()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 3,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Assign feature and target data\n",
28 | "X = raw_wine.data\n",
29 | "y = raw_wine.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 4,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training and test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 5,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 6,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,\n",
66 | " criterion='gini', max_depth=2, max_features='auto',\n",
67 | " max_leaf_nodes=None, max_samples=None,\n",
68 | " min_impurity_decrease=0.0, min_impurity_split=None,\n",
69 | " min_samples_leaf=1, min_samples_split=2,\n",
70 | " min_weight_fraction_leaf=0.0, n_estimators=100,\n",
71 | " n_jobs=None, oob_score=False, random_state=0, verbose=0,\n",
72 | " warm_start=False)"
73 | ]
74 | },
75 | "execution_count": 6,
76 | "metadata": {},
77 | "output_type": "execute_result"
78 | }
79 | ],
80 | "source": [
81 | "from sklearn.ensemble import RandomForestClassifier\n",
82 | "clf_rf = RandomForestClassifier(max_depth=2, \n",
83 | " random_state=0)\n",
84 | "clf_rf.fit(X_tn_std, y_tn)"
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": 7,
90 | "metadata": {},
91 | "outputs": [
92 | {
93 | "name": "stdout",
94 | "output_type": "stream",
95 | "text": [
96 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
97 | " 1 1 2 0 0 1 1 1]\n"
98 | ]
99 | }
100 | ],
101 | "source": [
102 | "# Predict\n",
103 | "pred_rf = clf_rf.predict(X_te_std)\n",
104 | "print(pred_rf)"
105 | ]
106 | },
107 | {
108 | "cell_type": "code",
109 | "execution_count": 8,
110 | "metadata": {},
111 | "outputs": [
112 | {
113 | "name": "stdout",
114 | "output_type": "stream",
115 | "text": [
116 | "0.9555555555555556\n"
117 | ]
118 | }
119 | ],
120 | "source": [
121 | "# Accuracy\n",
122 | "from sklearn.metrics import accuracy_score\n",
123 | "accuracy = accuracy_score(y_te, pred_rf)\n",
124 | "print(accuracy)"
125 | ]
126 | },
127 | {
128 | "cell_type": "code",
129 | "execution_count": 9,
130 | "metadata": {},
131 | "outputs": [
132 | {
133 | "name": "stdout",
134 | "output_type": "stream",
135 | "text": [
136 | "[[16 0 0]\n",
137 | " [ 1 19 1]\n",
138 | " [ 0 0 8]]\n"
139 | ]
140 | }
141 | ],
142 | "source": [
143 | "# Check confusion matrix\n",
144 | "from sklearn.metrics import confusion_matrix\n",
145 | "conf_matrix = confusion_matrix(y_te, pred_rf)\n",
146 | "print(conf_matrix)"
147 | ]
148 | },
149 | {
150 | "cell_type": "code",
151 | "execution_count": 10,
152 | "metadata": {},
153 | "outputs": [
154 | {
155 | "name": "stdout",
156 | "output_type": "stream",
157 | "text": [
158 | " precision recall f1-score support\n",
159 | "\n",
160 | " 0 0.94 1.00 0.97 16\n",
161 | " 1 1.00 0.90 0.95 21\n",
162 | " 2 0.89 1.00 0.94 8\n",
163 | "\n",
164 | " accuracy 0.96 45\n",
165 | " macro avg 0.94 0.97 0.95 45\n",
166 | "weighted avg 0.96 0.96 0.96 45\n",
167 | "\n"
168 | ]
169 | }
170 | ],
171 | "source": [
172 | "# Check classification report\n",
173 | "from sklearn.metrics import classification_report\n",
174 | "class_report = classification_report(y_te, pred_rf)\n",
175 | "print(class_report)"
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "metadata": {},
181 | "source": [
182 | "# Combined code"
183 | ]
184 | },
185 | {
186 | "cell_type": "code",
187 | "execution_count": 11,
188 | "metadata": {},
189 | "outputs": [
190 | {
191 | "name": "stdout",
192 | "output_type": "stream",
193 | "text": [
194 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
195 | " 1 1 2 0 0 1 1 1]\n",
196 | "0.9555555555555556\n",
197 | "[[16 0 0]\n",
198 | " [ 1 19 1]\n",
199 | " [ 0 0 8]]\n",
200 | " precision recall f1-score support\n",
201 | "\n",
202 | " 0 0.94 1.00 0.97 16\n",
203 | " 1 1.00 0.90 0.95 21\n",
204 | " 2 0.89 1.00 0.94 8\n",
205 | "\n",
206 | " accuracy 0.96 45\n",
207 | " macro avg 0.94 0.97 0.95 45\n",
208 | "weighted avg 0.96 0.96 0.96 45\n",
209 | "\n"
210 | ]
211 | }
212 | ],
213 | "source": [
214 | "from sklearn import datasets\n",
215 | "from sklearn.model_selection import train_test_split\n",
216 | "from sklearn.preprocessing import StandardScaler\n",
217 | "\n",
218 | "from sklearn.ensemble import RandomForestClassifier\n",
219 | "\n",
220 | "from sklearn.metrics import accuracy_score\n",
221 | "from sklearn.metrics import confusion_matrix\n",
222 | "from sklearn.metrics import classification_report\n",
223 | "\n",
224 | "# Load data\n",
225 | "raw_wine = datasets.load_wine()\n",
226 | "\n",
227 | "# Assign feature and target data\n",
228 | "X = raw_wine.data\n",
229 | "y = raw_wine.target\n",
230 | "\n",
231 | "# Split into training and test data\n",
232 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)\n",
233 | "\n",
234 | "# Standardize data\n",
235 | "std_scale = StandardScaler()\n",
236 | "std_scale.fit(X_tn)\n",
237 | "X_tn_std = std_scale.transform(X_tn)\n",
238 | "X_te_std = std_scale.transform(X_te)\n",
239 | "\n",
240 | "# Train random forest\n",
241 | "clf_rf = RandomForestClassifier(max_depth=2, \n",
242 | " random_state=0)\n",
243 | "clf_rf.fit(X_tn_std, y_tn)\n",
244 | "\n",
245 | "# Predict\n",
246 | "pred_rf = clf_rf.predict(X_te_std)\n",
247 | "print(pred_rf)\n",
248 | "\n",
249 | "# Accuracy\n",
250 | "accuracy = accuracy_score(y_te, pred_rf)\n",
251 | "print(accuracy)\n",
252 | "\n",
253 | "# Check confusion matrix\n",
254 | "conf_matrix = confusion_matrix(y_te, pred_rf)\n",
255 | "print(conf_matrix)\n",
256 | "\n",
257 | "# Check classification report\n",
258 | "class_report = classification_report(y_te, pred_rf)\n",
259 | "print(class_report)"
260 | ]
261 | },
262 | {
263 | "cell_type": "code",
264 | "execution_count": null,
265 | "metadata": {},
266 | "outputs": [],
267 | "source": []
268 | }
269 | ],
270 | "metadata": {
271 | "kernelspec": {
272 | "display_name": "Python 3",
273 | "language": "python",
274 | "name": "python3"
275 | },
276 | "language_info": {
277 | "codemirror_mode": {
278 | "name": "ipython",
279 | "version": 3
280 | },
281 | "file_extension": ".py",
282 | "mimetype": "text/x-python",
283 | "name": "python",
284 | "nbconvert_exporter": "python",
285 | "pygments_lexer": "ipython3",
286 | "version": "3.7.6"
287 | }
288 | },
289 | "nbformat": 4,
290 | "nbformat_minor": 4
291 | }
292 |
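The accuracy and the per-class precision and recall in the random forest report above follow directly from the confusion matrix. This is a small standard-library sketch of that arithmetic, with the matrix copied from the notebook output; it is an illustration, not how scikit-learn computes the report internally.

```python
# Confusion matrix copied from the notebook output
# (rows = true class, columns = predicted class).
conf_matrix = [
    [16, 0, 0],
    [1, 19, 1],
    [0, 0, 8],
]

n_classes = len(conf_matrix)
total = sum(sum(row) for row in conf_matrix)
correct = sum(conf_matrix[k][k] for k in range(n_classes))

# Accuracy: share of samples on the diagonal.
accuracy = correct / total

# Recall: diagonal entry divided by its row sum (all samples truly in class k).
recall = [conf_matrix[k][k] / sum(conf_matrix[k]) for k in range(n_classes)]

# Precision: diagonal entry divided by its column sum (all samples predicted as class k).
precision = [
    conf_matrix[k][k] / sum(row[k] for row in conf_matrix)
    for k in range(n_classes)
]

print(accuracy)
print([round(p, 2) for p in precision])
print([round(r, 2) for r in recall])
```

Reading the matrix this way shows that both errors involve true class 1 samples, which is why class 1 has the lowest recall while its precision stays at 1.00.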
--------------------------------------------------------------------------------
/.ipynb_checkpoints/09장_앙상블_3절_2_배깅-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Individual code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load data\n",
17 | "from sklearn import datasets\n",
18 | "raw_wine = datasets.load_wine()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Assign feature and target data\n",
28 | "X = raw_wine.data\n",
29 | "y = raw_wine.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training and test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 16,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "BaggingClassifier(base_estimator=GaussianNB(priors=None, var_smoothing=1e-09),\n",
66 | " bootstrap=True, bootstrap_features=False, max_features=1.0,\n",
67 | " max_samples=1.0, n_estimators=10, n_jobs=None,\n",
68 | " oob_score=False, random_state=0, verbose=0, warm_start=False)"
69 | ]
70 | },
71 | "execution_count": 16,
72 | "metadata": {},
73 | "output_type": "execute_result"
74 | }
75 | ],
76 | "source": [
77 | "# Train bagging classifier\n",
78 | "from sklearn.naive_bayes import GaussianNB\n",
79 | "from sklearn.ensemble import BaggingClassifier\n",
80 | "clf_bagging = BaggingClassifier(base_estimator=GaussianNB(),\n",
81 | " n_estimators=10, \n",
82 | " random_state=0)\n",
83 | "clf_bagging.fit(X_tn_std, y_tn)"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 17,
89 | "metadata": {},
90 | "outputs": [
91 | {
92 | "name": "stdout",
93 | "output_type": "stream",
94 | "text": [
95 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
96 | " 1 1 2 0 0 1 1 1]\n"
97 | ]
98 | }
99 | ],
100 | "source": [
101 | "# Predict\n",
102 | "pred_bagging = clf_bagging.predict(X_te_std)\n",
103 | "print(pred_bagging)"
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": 18,
109 | "metadata": {},
110 | "outputs": [
111 | {
112 | "name": "stdout",
113 | "output_type": "stream",
114 | "text": [
115 | "0.9555555555555556\n"
116 | ]
117 | }
118 | ],
119 | "source": [
120 | "# Accuracy\n",
121 | "from sklearn.metrics import accuracy_score\n",
122 | "accuracy = accuracy_score(y_te, pred_bagging)\n",
123 | "print(accuracy)"
124 | ]
125 | },
126 | {
127 | "cell_type": "code",
128 | "execution_count": 19,
129 | "metadata": {},
130 | "outputs": [
131 | {
132 | "name": "stdout",
133 | "output_type": "stream",
134 | "text": [
135 | "[[16 0 0]\n",
136 | " [ 1 19 1]\n",
137 | " [ 0 0 8]]\n"
138 | ]
139 | }
140 | ],
141 | "source": [
142 | "# Check confusion matrix\n",
143 | "from sklearn.metrics import confusion_matrix\n",
144 | "conf_matrix = confusion_matrix(y_te, pred_bagging)\n",
145 | "print(conf_matrix)"
146 | ]
147 | },
148 | {
149 | "cell_type": "code",
150 | "execution_count": 20,
151 | "metadata": {},
152 | "outputs": [
153 | {
154 | "name": "stdout",
155 | "output_type": "stream",
156 | "text": [
157 | " precision recall f1-score support\n",
158 | "\n",
159 | " 0 0.94 1.00 0.97 16\n",
160 | " 1 1.00 0.90 0.95 21\n",
161 | " 2 0.89 1.00 0.94 8\n",
162 | "\n",
163 | " accuracy 0.96 45\n",
164 | " macro avg 0.94 0.97 0.95 45\n",
165 | "weighted avg 0.96 0.96 0.96 45\n",
166 | "\n"
167 | ]
168 | }
169 | ],
170 | "source": [
171 | "# Check classification report\n",
172 | "from sklearn.metrics import classification_report\n",
173 | "class_report = classification_report(y_te, pred_bagging)\n",
174 | "print(class_report)"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
181 | "# Combined code"
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": 21,
187 | "metadata": {},
188 | "outputs": [
189 | {
190 | "name": "stdout",
191 | "output_type": "stream",
192 | "text": [
193 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
194 | " 1 1 2 0 0 1 1 1]\n",
195 | "0.9555555555555556\n",
196 | "[[16 0 0]\n",
197 | " [ 1 19 1]\n",
198 | " [ 0 0 8]]\n",
199 | " precision recall f1-score support\n",
200 | "\n",
201 | " 0 0.94 1.00 0.97 16\n",
202 | " 1 1.00 0.90 0.95 21\n",
203 | " 2 0.89 1.00 0.94 8\n",
204 | "\n",
205 | " accuracy 0.96 45\n",
206 | " macro avg 0.94 0.97 0.95 45\n",
207 | "weighted avg 0.96 0.96 0.96 45\n",
208 | "\n"
209 | ]
210 | }
211 | ],
212 | "source": [
213 | "from sklearn import datasets\n",
214 | "from sklearn.model_selection import train_test_split\n",
215 | "from sklearn.preprocessing import StandardScaler\n",
216 | "\n",
217 | "from sklearn.naive_bayes import GaussianNB\n",
218 | "from sklearn.ensemble import BaggingClassifier\n",
219 | "\n",
220 | "from sklearn.metrics import accuracy_score\n",
221 | "from sklearn.metrics import confusion_matrix\n",
222 | "from sklearn.metrics import classification_report\n",
223 | "\n",
224 | "# Load data\n",
225 | "raw_wine = datasets.load_wine()\n",
226 | "\n",
227 | "# Assign feature and target data\n",
228 | "X = raw_wine.data\n",
229 | "y = raw_wine.target\n",
230 | "\n",
231 | "# Split into training and test data\n",
232 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)\n",
233 | "\n",
234 | "# Standardize data\n",
235 | "std_scale = StandardScaler()\n",
236 | "std_scale.fit(X_tn)\n",
237 | "X_tn_std = std_scale.transform(X_tn)\n",
238 | "X_te_std = std_scale.transform(X_te)\n",
239 | "\n",
240 | "# Train bagging classifier\n",
241 | "clf_bagging = BaggingClassifier(base_estimator=GaussianNB(),\n",
242 | " n_estimators=10, \n",
243 | " random_state=0)\n",
244 | "clf_bagging.fit(X_tn_std, y_tn)\n",
245 | "\n",
246 | "# Predict\n",
247 | "pred_bagging = clf_bagging.predict(X_te_std)\n",
248 | "print(pred_bagging)\n",
249 | "\n",
250 | "# Accuracy\n",
251 | "accuracy = accuracy_score(y_te, pred_bagging)\n",
252 | "print(accuracy)\n",
253 | "\n",
254 | "# Check confusion matrix\n",
255 | "conf_matrix = confusion_matrix(y_te, pred_bagging)\n",
256 | "print(conf_matrix)\n",
257 | "\n",
258 | "# Check classification report\n",
259 | "class_report = classification_report(y_te, pred_bagging)\n",
260 | "print(class_report)"
261 | ]
262 | },
263 | {
264 | "cell_type": "code",
265 | "execution_count": null,
266 | "metadata": {},
267 | "outputs": [],
268 | "source": []
269 | }
270 | ],
271 | "metadata": {
272 | "kernelspec": {
273 | "display_name": "Python 3",
274 | "language": "python",
275 | "name": "python3"
276 | },
277 | "language_info": {
278 | "codemirror_mode": {
279 | "name": "ipython",
280 | "version": 3
281 | },
282 | "file_extension": ".py",
283 | "mimetype": "text/x-python",
284 | "name": "python",
285 | "nbconvert_exporter": "python",
286 | "pygments_lexer": "ipython3",
287 | "version": "3.7.6"
288 | }
289 | },
290 | "nbformat": 4,
291 | "nbformat_minor": 4
292 | }
293 |
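The bagging notebook above trains 10 copies of a base estimator on bootstrap resamples and combines their predictions. The sketch below shows that idea with the standard library only. It is a simplification under stated assumptions: the base learner is a hypothetical one-feature nearest-mean classifier rather than `GaussianNB`, the predictions are combined by plain majority vote, and the toy data and all function names are illustrative.

```python
import random
from collections import Counter


def fit_nearest_mean(samples):
    """Hypothetical base learner: store the mean feature value of each class."""
    sums, counts = {}, Counter()
    for x, label in samples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_nearest_mean(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def fit_bagging(samples, n_estimators=10, seed=0):
    """Train one base learner per bootstrap resample of the training data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        bootstrap = [rng.choice(samples) for _ in samples]  # draw with replacement
        models.append(fit_nearest_mean(bootstrap))
    return models


def predict_bagging(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_nearest_mean(model, x) for model in models)
    return votes.most_common(1)[0][0]


# Toy one-feature data as (value, label) pairs; the values are hypothetical.
data = [(0.1, 0), (0.4, 0), (0.5, 0), (0.9, 0),
        (9.0, 1), (9.3, 1), (9.6, 1), (9.9, 1)]

models = fit_bagging(data)
print(predict_bagging(models, 0.3), predict_bagging(models, 9.5))
```

Each resample has the same size as the training set, so some rows appear several times and others are left out; that variation between the base learners is what the vote averages over.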
--------------------------------------------------------------------------------
/.ipynb_checkpoints/09장_앙상블_4절_1_adaboost-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Individual code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 11,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load data\n",
17 | "from sklearn import datasets\n",
18 | "raw_breast_cancer = datasets.load_breast_cancer()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 12,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Assign feature and target data\n",
28 | "X = raw_breast_cancer.data\n",
29 | "y = raw_breast_cancer.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 13,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training and test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 14,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 15,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=1.0,\n",
66 | " n_estimators=50, random_state=0)"
67 | ]
68 | },
69 | "execution_count": 15,
70 | "metadata": {},
71 | "output_type": "execute_result"
72 | }
73 | ],
74 | "source": [
75 | "# Train AdaBoost\n",
76 | "from sklearn.ensemble import AdaBoostClassifier\n",
77 | "clf_ada = AdaBoostClassifier(random_state=0)\n",
78 | "clf_ada.fit(X_tn_std, y_tn)"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": 16,
84 | "metadata": {},
85 | "outputs": [
86 | {
87 | "name": "stdout",
88 | "output_type": "stream",
89 | "text": [
90 | "[0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n",
91 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n",
92 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n",
93 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 0]\n"
94 | ]
95 | }
96 | ],
97 | "source": [
98 | "# Predict\n",
99 | "pred_ada = clf_ada.predict(X_te_std)\n",
100 | "print(pred_ada)"
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": 17,
106 | "metadata": {},
107 | "outputs": [
108 | {
109 | "name": "stdout",
110 | "output_type": "stream",
111 | "text": [
112 | "0.9790209790209791\n"
113 | ]
114 | }
115 | ],
116 | "source": [
117 | "# Accuracy\n",
118 | "from sklearn.metrics import accuracy_score\n",
119 | "accuracy = accuracy_score(y_te, pred_ada)\n",
120 | "print(accuracy)"
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": 18,
126 | "metadata": {},
127 | "outputs": [
128 | {
129 | "name": "stdout",
130 | "output_type": "stream",
131 | "text": [
132 | "[[52 1]\n",
133 | " [ 2 88]]\n"
134 | ]
135 | }
136 | ],
137 | "source": [
138 | "# Check confusion matrix\n",
139 | "from sklearn.metrics import confusion_matrix\n",
140 | "conf_matrix = confusion_matrix(y_te, pred_ada)\n",
141 | "print(conf_matrix)"
142 | ]
143 | },
144 | {
145 | "cell_type": "code",
146 | "execution_count": 19,
147 | "metadata": {},
148 | "outputs": [
149 | {
150 | "name": "stdout",
151 | "output_type": "stream",
152 | "text": [
153 | " precision recall f1-score support\n",
154 | "\n",
155 | " 0 0.96 0.98 0.97 53\n",
156 | " 1 0.99 0.98 0.98 90\n",
157 | "\n",
158 | " accuracy 0.98 143\n",
159 | " macro avg 0.98 0.98 0.98 143\n",
160 | "weighted avg 0.98 0.98 0.98 143\n",
161 | "\n"
162 | ]
163 | }
164 | ],
165 | "source": [
166 | "# Check classification report\n",
167 | "from sklearn.metrics import classification_report\n",
168 | "class_report = classification_report(y_te, pred_ada)\n",
169 | "print(class_report)"
170 | ]
171 | },
172 | {
173 | "cell_type": "markdown",
174 | "metadata": {},
175 | "source": [
176 | "# Combined code"
177 | ]
178 | },
179 | {
180 | "cell_type": "code",
181 | "execution_count": 20,
182 | "metadata": {},
183 | "outputs": [
184 | {
185 | "name": "stdout",
186 | "output_type": "stream",
187 | "text": [
188 | "[0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n",
189 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n",
190 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n",
191 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 0]\n",
192 | "0.9790209790209791\n",
193 | "[[52 1]\n",
194 | " [ 2 88]]\n",
195 | " precision recall f1-score support\n",
196 | "\n",
197 | " 0 0.96 0.98 0.97 53\n",
198 | " 1 0.99 0.98 0.98 90\n",
199 | "\n",
200 | " accuracy 0.98 143\n",
201 | " macro avg 0.98 0.98 0.98 143\n",
202 | "weighted avg 0.98 0.98 0.98 143\n",
203 | "\n"
204 | ]
205 | }
206 | ],
207 | "source": [
208 | "from sklearn import datasets\n",
209 | "from sklearn.model_selection import train_test_split\n",
210 | "from sklearn.preprocessing import StandardScaler\n",
211 | "\n",
212 | "from sklearn.ensemble import AdaBoostClassifier\n",
213 | "\n",
214 | "from sklearn.metrics import accuracy_score\n",
215 | "from sklearn.metrics import confusion_matrix\n",
216 | "from sklearn.metrics import classification_report\n",
217 | "\n",
218 | "# Load the data\n",
219 | "raw_breast_cancer = datasets.load_breast_cancer()\n",
220 | "\n",
221 | "# Set the feature and target data\n",
222 | "X = raw_breast_cancer.data\n",
223 | "y = raw_breast_cancer.target\n",
224 | "\n",
225 | "# Split into training/test data\n",
226 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
227 | "\n",
228 | "# Standardize the data\n",
229 | "std_scale = StandardScaler()\n",
230 | "std_scale.fit(X_tn)\n",
231 | "X_tn_std = std_scale.transform(X_tn)\n",
232 | "X_te_std = std_scale.transform(X_te)\n",
233 | "\n",
234 | "# Train AdaBoost\n",
235 | "clf_ada = AdaBoostClassifier(random_state=0)\n",
236 | "clf_ada.fit(X_tn_std, y_tn)\n",
237 | "\n",
238 | "# Predict\n",
239 | "pred_ada = clf_ada.predict(X_te_std)\n",
240 | "print(pred_ada)\n",
241 | "\n",
242 | "# Accuracy\n",
243 | "accuracy = accuracy_score(y_te, pred_ada)\n",
244 | "print(accuracy)\n",
245 | "\n",
246 | "# Check the confusion matrix\n",
247 | "conf_matrix = confusion_matrix(y_te, pred_ada)\n",
248 | "print(conf_matrix)\n",
249 | "\n",
250 | "# Check the classification report\n",
251 | "class_report = classification_report(y_te, pred_ada)\n",
252 | "print(class_report)"
253 | ]
254 | },
255 | {
256 | "cell_type": "code",
257 | "execution_count": null,
258 | "metadata": {},
259 | "outputs": [],
260 | "source": []
261 | }
262 | ],
263 | "metadata": {
264 | "kernelspec": {
265 | "display_name": "Python 3",
266 | "language": "python",
267 | "name": "python3"
268 | },
269 | "language_info": {
270 | "codemirror_mode": {
271 | "name": "ipython",
272 | "version": 3
273 | },
274 | "file_extension": ".py",
275 | "mimetype": "text/x-python",
276 | "name": "python",
277 | "nbconvert_exporter": "python",
278 | "pygments_lexer": "ipython3",
279 | "version": "3.7.6"
280 | }
281 | },
282 | "nbformat": 4,
283 | "nbformat_minor": 4
284 | }
285 |
--------------------------------------------------------------------------------
/.ipynb_checkpoints/09장_앙상블_4절_2_gradient_boost-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Step-by-step code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_breast_cancer = datasets.load_breast_cancer()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Set the feature and target data\n",
28 | "X = raw_breast_cancer.data\n",
29 | "y = raw_breast_cancer.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training/test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 5,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,\n",
66 | " learning_rate=0.01, loss='deviance', max_depth=2,\n",
67 | " max_features=None, max_leaf_nodes=None,\n",
68 | " min_impurity_decrease=0.0, min_impurity_split=None,\n",
69 | " min_samples_leaf=1, min_samples_split=2,\n",
70 | " min_weight_fraction_leaf=0.0, n_estimators=100,\n",
71 | " n_iter_no_change=None, presort='deprecated',\n",
72 | " random_state=0, subsample=1.0, tol=0.0001,\n",
73 | " validation_fraction=0.1, verbose=0,\n",
74 | " warm_start=False)"
75 | ]
76 | },
77 | "execution_count": 5,
78 | "metadata": {},
79 | "output_type": "execute_result"
80 | }
81 | ],
82 | "source": [
83 | "# Train Gradient Boosting\n",
84 | "from sklearn.ensemble import GradientBoostingClassifier\n",
85 | "clf_gbt = GradientBoostingClassifier(max_depth=2, \n",
86 | " learning_rate=0.01,\n",
87 | " random_state=0)\n",
88 | "clf_gbt.fit(X_tn_std, y_tn)"
89 | ]
90 | },
91 | {
92 | "cell_type": "code",
93 | "execution_count": 6,
94 | "metadata": {},
95 | "outputs": [
96 | {
97 | "name": "stdout",
98 | "output_type": "stream",
99 | "text": [
100 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n",
101 | " 0 1 0 1 1 1 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 1\n",
102 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n",
103 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n"
104 | ]
105 | }
106 | ],
107 | "source": [
108 | "# Predict\n",
109 | "pred_gboost = clf_gbt.predict(X_te_std)\n",
110 | "print(pred_gboost)"
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": 7,
116 | "metadata": {},
117 | "outputs": [
118 | {
119 | "name": "stdout",
120 | "output_type": "stream",
121 | "text": [
122 | "0.965034965034965\n"
123 | ]
124 | }
125 | ],
126 | "source": [
127 | "# Accuracy\n",
128 | "from sklearn.metrics import accuracy_score\n",
129 | "accuracy = accuracy_score(y_te, pred_gboost)\n",
130 | "print(accuracy)"
131 | ]
132 | },
133 | {
134 | "cell_type": "code",
135 | "execution_count": 8,
136 | "metadata": {},
137 | "outputs": [
138 | {
139 | "name": "stdout",
140 | "output_type": "stream",
141 | "text": [
142 | "[[49 4]\n",
143 | " [ 1 89]]\n"
144 | ]
145 | }
146 | ],
147 | "source": [
148 | "# Check the confusion matrix\n",
149 | "from sklearn.metrics import confusion_matrix\n",
150 | "conf_matrix = confusion_matrix(y_te, pred_gboost)\n",
151 | "print(conf_matrix)"
152 | ]
153 | },
154 | {
155 | "cell_type": "code",
156 | "execution_count": 9,
157 | "metadata": {},
158 | "outputs": [
159 | {
160 | "name": "stdout",
161 | "output_type": "stream",
162 | "text": [
163 | " precision recall f1-score support\n",
164 | "\n",
165 | " 0 0.98 0.92 0.95 53\n",
166 | " 1 0.96 0.99 0.97 90\n",
167 | "\n",
168 | " accuracy 0.97 143\n",
169 | " macro avg 0.97 0.96 0.96 143\n",
170 | "weighted avg 0.97 0.97 0.96 143\n",
171 | "\n"
172 | ]
173 | }
174 | ],
175 | "source": [
176 | "# Check the classification report\n",
177 | "from sklearn.metrics import classification_report\n",
178 | "class_report = classification_report(y_te, pred_gboost)\n",
179 | "print(class_report)"
180 | ]
181 | },
182 | {
183 | "cell_type": "markdown",
184 | "metadata": {},
185 | "source": [
186 | "# Combined code"
187 | ]
188 | },
189 | {
190 | "cell_type": "code",
191 | "execution_count": 1,
192 | "metadata": {},
193 | "outputs": [
194 | {
195 | "name": "stdout",
196 | "output_type": "stream",
197 | "text": [
198 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n",
199 | " 0 1 0 1 1 1 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 1\n",
200 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n",
201 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n",
202 | "0.965034965034965\n",
203 | "[[49 4]\n",
204 | " [ 1 89]]\n",
205 | " precision recall f1-score support\n",
206 | "\n",
207 | " 0 0.98 0.92 0.95 53\n",
208 | " 1 0.96 0.99 0.97 90\n",
209 | "\n",
210 | " accuracy 0.97 143\n",
211 | " macro avg 0.97 0.96 0.96 143\n",
212 | "weighted avg 0.97 0.97 0.96 143\n",
213 | "\n"
214 | ]
215 | }
216 | ],
217 | "source": [
218 | "from sklearn import datasets\n",
219 | "from sklearn.model_selection import train_test_split\n",
220 | "from sklearn.preprocessing import StandardScaler\n",
221 | "\n",
222 | "from sklearn.ensemble import GradientBoostingClassifier\n",
223 | "\n",
224 | "from sklearn.metrics import accuracy_score\n",
225 | "from sklearn.metrics import confusion_matrix\n",
226 | "from sklearn.metrics import classification_report\n",
227 | "\n",
228 | "# Load the data\n",
229 | "raw_breast_cancer = datasets.load_breast_cancer()\n",
230 | "\n",
231 | "# Set the feature and target data\n",
232 | "X = raw_breast_cancer.data\n",
233 | "y = raw_breast_cancer.target\n",
234 | "\n",
235 | "# Split into training/test data\n",
236 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
237 | "\n",
238 | "# Standardize the data\n",
239 | "std_scale = StandardScaler()\n",
240 | "std_scale.fit(X_tn)\n",
241 | "X_tn_std = std_scale.transform(X_tn)\n",
242 | "X_te_std = std_scale.transform(X_te)\n",
243 | "\n",
244 | "# Train Gradient Boosting\n",
245 | "clf_gbt = GradientBoostingClassifier(max_depth=2, \n",
246 | " learning_rate=0.01,\n",
247 | " random_state=0)\n",
248 | "clf_gbt.fit(X_tn_std, y_tn)\n",
249 | "\n",
250 | "# Predict\n",
251 | "pred_gboost = clf_gbt.predict(X_te_std)\n",
252 | "print(pred_gboost)\n",
253 | "\n",
254 | "# Accuracy\n",
255 | "accuracy = accuracy_score(y_te, pred_gboost)\n",
256 | "print(accuracy)\n",
257 | "\n",
258 | "# Check the confusion matrix\n",
259 | "conf_matrix = confusion_matrix(y_te, pred_gboost)\n",
260 | "print(conf_matrix)\n",
261 | "\n",
262 | "# Check the classification report\n",
263 | "class_report = classification_report(y_te, pred_gboost)\n",
264 | "print(class_report)"
265 | ]
266 | },
267 | {
268 | "cell_type": "code",
269 | "execution_count": null,
270 | "metadata": {},
271 | "outputs": [],
272 | "source": []
273 | }
274 | ],
275 | "metadata": {
276 | "kernelspec": {
277 | "display_name": "Python 3",
278 | "language": "python",
279 | "name": "python3"
280 | },
281 | "language_info": {
282 | "codemirror_mode": {
283 | "name": "ipython",
284 | "version": 3
285 | },
286 | "file_extension": ".py",
287 | "mimetype": "text/x-python",
288 | "name": "python",
289 | "nbconvert_exporter": "python",
290 | "pygments_lexer": "ipython3",
291 | "version": "3.7.6"
292 | }
293 | },
294 | "nbformat": 4,
295 | "nbformat_minor": 4
296 | }
297 |
--------------------------------------------------------------------------------
/.ipynb_checkpoints/09장_앙상블_5절_스태킹-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Step-by-step code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_breast_cancer = datasets.load_breast_cancer()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Set the feature and target data\n",
28 | "X = raw_breast_cancer.data\n",
29 | "y = raw_breast_cancer.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training/test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 5,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "StackingClassifier(cv=None,\n",
66 | " estimators=[('svm',\n",
67 | " SVC(C=1.0, break_ties=False, cache_size=200,\n",
68 | " class_weight=None, coef0=0.0,\n",
69 | " decision_function_shape='ovr', degree=3,\n",
70 | " gamma='scale', kernel='linear', max_iter=-1,\n",
71 | " probability=False, random_state=1,\n",
72 | " shrinking=True, tol=0.001, verbose=False)),\n",
73 | " ('gnb',\n",
74 | " GaussianNB(priors=None, var_smoothing=1e-09))],\n",
75 | " final_estimator=LogisticRegression(C=1.0, class_weight=None,\n",
76 | " dual=False,\n",
77 | " fit_intercept=True,\n",
78 | " intercept_scaling=1,\n",
79 | " l1_ratio=None,\n",
80 | " max_iter=100,\n",
81 | " multi_class='auto',\n",
82 | " n_jobs=None, penalty='l2',\n",
83 | " random_state=None,\n",
84 | " solver='lbfgs',\n",
85 | " tol=0.0001, verbose=0,\n",
86 | " warm_start=False),\n",
87 | " n_jobs=None, passthrough=False, stack_method='auto',\n",
88 | " verbose=0)"
89 | ]
90 | },
91 | "execution_count": 5,
92 | "metadata": {},
93 | "output_type": "execute_result"
94 | }
95 | ],
96 | "source": [
97 | "# Train the stacking classifier\n",
98 | "from sklearn import svm\n",
99 | "from sklearn.naive_bayes import GaussianNB\n",
100 | "from sklearn.linear_model import LogisticRegression\n",
101 | "from sklearn.ensemble import StackingClassifier\n",
102 | "\n",
103 | "clf1 = svm.SVC(kernel='linear', random_state=1) \n",
104 | "clf2 = GaussianNB()\n",
105 | "\n",
106 | "clf_stkg = StackingClassifier(\n",
107 | " estimators=[\n",
108 | " ('svm', clf1), \n",
109 | " ('gnb', clf2)\n",
110 | " ],\n",
111 | " final_estimator=LogisticRegression())\n",
112 | "clf_stkg.fit(X_tn_std, y_tn)"
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": 7,
118 | "metadata": {},
119 | "outputs": [
120 | {
121 | "name": "stdout",
122 | "output_type": "stream",
123 | "text": [
124 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n",
125 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n",
126 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 1 0 1\n",
127 | " 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n"
128 | ]
129 | }
130 | ],
131 | "source": [
132 | "# Predict\n",
133 | "pred_stkg = clf_stkg.predict(X_te_std)\n",
134 | "print(pred_stkg)"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 8,
140 | "metadata": {},
141 | "outputs": [
142 | {
143 | "name": "stdout",
144 | "output_type": "stream",
145 | "text": [
146 | "0.965034965034965\n"
147 | ]
148 | }
149 | ],
150 | "source": [
151 | "# Accuracy\n",
152 | "from sklearn.metrics import accuracy_score\n",
153 | "accuracy = accuracy_score(y_te, pred_stkg)\n",
154 | "print(accuracy)"
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": 9,
160 | "metadata": {},
161 | "outputs": [
162 | {
163 | "name": "stdout",
164 | "output_type": "stream",
165 | "text": [
166 | "[[50 3]\n",
167 | " [ 2 88]]\n"
168 | ]
169 | }
170 | ],
171 | "source": [
172 | "# Check the confusion matrix\n",
173 | "from sklearn.metrics import confusion_matrix\n",
174 | "conf_matrix = confusion_matrix(y_te, pred_stkg)\n",
175 | "print(conf_matrix)"
176 | ]
177 | },
178 | {
179 | "cell_type": "code",
180 | "execution_count": 10,
181 | "metadata": {},
182 | "outputs": [
183 | {
184 | "name": "stdout",
185 | "output_type": "stream",
186 | "text": [
187 | " precision recall f1-score support\n",
188 | "\n",
189 | " 0 0.96 0.94 0.95 53\n",
190 | " 1 0.97 0.98 0.97 90\n",
191 | "\n",
192 | " accuracy 0.97 143\n",
193 | " macro avg 0.96 0.96 0.96 143\n",
194 | "weighted avg 0.96 0.97 0.96 143\n",
195 | "\n"
196 | ]
197 | }
198 | ],
199 | "source": [
200 | "# Check the classification report\n",
201 | "from sklearn.metrics import classification_report\n",
202 | "class_report = classification_report(y_te, pred_stkg)\n",
203 | "print(class_report)"
204 | ]
205 | },
206 | {
207 | "cell_type": "markdown",
208 | "metadata": {},
209 | "source": [
210 | "# Combined code"
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": 11,
216 | "metadata": {},
217 | "outputs": [
218 | {
219 | "name": "stdout",
220 | "output_type": "stream",
221 | "text": [
222 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n",
223 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n",
224 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 1 0 1\n",
225 | " 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n",
226 | "0.965034965034965\n",
227 | "[[50 3]\n",
228 | " [ 2 88]]\n",
229 | " precision recall f1-score support\n",
230 | "\n",
231 | " 0 0.96 0.94 0.95 53\n",
232 | " 1 0.97 0.98 0.97 90\n",
233 | "\n",
234 | " accuracy 0.97 143\n",
235 | " macro avg 0.96 0.96 0.96 143\n",
236 | "weighted avg 0.96 0.97 0.96 143\n",
237 | "\n"
238 | ]
239 | }
240 | ],
241 | "source": [
242 | "from sklearn import datasets\n",
243 | "from sklearn.model_selection import train_test_split\n",
244 | "from sklearn.preprocessing import StandardScaler\n",
245 | "\n",
246 | "from sklearn import svm\n",
247 | "from sklearn.naive_bayes import GaussianNB\n",
248 | "from sklearn.linear_model import LogisticRegression\n",
249 | "from sklearn.ensemble import StackingClassifier\n",
250 | "\n",
251 | "from sklearn.metrics import accuracy_score\n",
252 | "from sklearn.metrics import confusion_matrix\n",
253 | "from sklearn.metrics import classification_report\n",
254 | "\n",
255 | "\n",
256 | "# Load the data\n",
257 | "raw_breast_cancer = datasets.load_breast_cancer()\n",
258 | "\n",
259 | "# Set the feature and target data\n",
260 | "X = raw_breast_cancer.data\n",
261 | "y = raw_breast_cancer.target\n",
262 | "\n",
263 | "# Split into training/test data\n",
264 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
265 | "\n",
266 | "# Standardize the data\n",
267 | "std_scale = StandardScaler()\n",
268 | "std_scale.fit(X_tn)\n",
269 | "X_tn_std = std_scale.transform(X_tn)\n",
270 | "X_te_std = std_scale.transform(X_te)\n",
271 | "\n",
272 | "# Train the stacking classifier\n",
273 | "clf1 = svm.SVC(kernel='linear', random_state=1) \n",
274 | "clf2 = GaussianNB()\n",
275 | "\n",
276 | "clf_stkg = StackingClassifier(\n",
277 | " estimators=[\n",
278 | " ('svm', clf1), \n",
279 | " ('gnb', clf2)\n",
280 | " ],\n",
281 | " final_estimator=LogisticRegression())\n",
282 | "clf_stkg.fit(X_tn_std, y_tn)\n",
283 | "\n",
284 | "# Predict\n",
285 | "pred_stkg = clf_stkg.predict(X_te_std)\n",
286 | "print(pred_stkg)\n",
287 | "\n",
288 | "# Accuracy\n",
289 | "accuracy = accuracy_score(y_te, pred_stkg)\n",
290 | "print(accuracy)\n",
291 | "\n",
292 | "# Check the confusion matrix\n",
293 | "conf_matrix = confusion_matrix(y_te, pred_stkg)\n",
294 | "print(conf_matrix)\n",
295 | "\n",
296 | "# Check the classification report\n",
297 | "class_report = classification_report(y_te, pred_stkg)\n",
298 | "print(class_report)"
299 | ]
300 | },
301 | {
302 | "cell_type": "code",
303 | "execution_count": null,
304 | "metadata": {},
305 | "outputs": [],
306 | "source": []
307 | }
308 | ],
309 | "metadata": {
310 | "kernelspec": {
311 | "display_name": "Python 3",
312 | "language": "python",
313 | "name": "python3"
314 | },
315 | "language_info": {
316 | "codemirror_mode": {
317 | "name": "ipython",
318 | "version": 3
319 | },
320 | "file_extension": ".py",
321 | "mimetype": "text/x-python",
322 | "name": "python",
323 | "nbconvert_exporter": "python",
324 | "pygments_lexer": "ipython3",
325 | "version": "3.7.6"
326 | }
327 | },
328 | "nbformat": 4,
329 | "nbformat_minor": 4
330 | }
331 |
--------------------------------------------------------------------------------
/.ipynb_checkpoints/12장_딥러닝_2절_1_퍼셉트론-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Step-by-step code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [
15 | {
16 | "name": "stdout",
17 | "output_type": "stream",
18 | "text": [
19 | "[[2 3]\n",
20 | " [5 1]]\n",
21 | "[2 3 5 1]\n"
22 | ]
23 | }
24 | ],
25 | "source": [
26 | "import numpy as np\n",
27 | "\n",
28 | "# Input layer\n",
29 | "input_data = np.array([[2,3], [5,1]])\n",
30 | "print(input_data)\n",
31 | "x = input_data.reshape(-1)\n",
32 | "print(x)"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": 2,
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "# Weights and biases\n",
42 | "w1 = np.array([2,1,-3,3])\n",
43 | "w2 = np.array([1,-3,1,3])\n",
44 | "b1 = 3\n",
45 | "b2 = 3"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "metadata": {},
52 | "outputs": [
53 | {
54 | "name": "stdout",
55 | "output_type": "stream",
56 | "text": [
57 | "[[ 2 1 -3 3]\n",
58 | " [ 1 -3 1 3]]\n",
59 | "[3 3]\n",
60 | "[-2 4]\n"
61 | ]
62 | }
63 | ],
64 | "source": [
65 | "# Weighted sum\n",
66 | "W = np.array([w1, w2])\n",
67 | "print(W)\n",
68 | "b = np.array([b1, b2])\n",
69 | "print(b)\n",
70 | "weight_sum = np.dot(W, x) + b\n",
71 | "print(weight_sum)"
72 | ]
73 | },
74 | {
75 | "cell_type": "code",
76 | "execution_count": 5,
77 | "metadata": {},
78 | "outputs": [
79 | {
80 | "name": "stdout",
81 | "output_type": "stream",
82 | "text": [
83 | "[0.11920292 0.98201379]\n"
84 | ]
85 | }
86 | ],
87 | "source": [
88 | "# Output layer (sigmoid activation)\n",
89 | "res = 1/(1+np.exp(-weight_sum))\n",
90 | "print(res)"
91 | ]
92 | },
93 | {
94 | "cell_type": "markdown",
95 | "metadata": {},
96 | "source": [
97 | "# Combined code"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": 6,
103 | "metadata": {},
104 | "outputs": [],
105 | "source": [
106 | "import numpy as np\n",
107 | "\n",
108 | "# Input layer\n",
109 | "input_data = np.array([[2,3], [5,1]])\n",
110 | "x = input_data.reshape(-1)\n",
111 | "\n",
112 | "# Weights and biases\n",
113 | "w1 = np.array([2,1,-3,3])\n",
114 | "w2 = np.array([1,-3,1,3])\n",
115 | "b1 = 3\n",
116 | "b2 = 3\n",
117 | "\n",
118 | "# Weighted sum\n",
119 | "W = np.array([w1, w2])\n",
120 | "b = np.array([b1, b2])\n",
121 | "weight_sum = np.dot(W, x) + b\n",
122 | "\n",
123 | "# Output layer (sigmoid activation)\n",
124 | "res = 1/(1+np.exp(-weight_sum))"
125 | ]
126 | },
127 | {
128 | "cell_type": "code",
129 | "execution_count": null,
130 | "metadata": {},
131 | "outputs": [],
132 | "source": []
133 | }
134 | ],
135 | "metadata": {
136 | "kernelspec": {
137 | "display_name": "Python 3",
138 | "language": "python",
139 | "name": "python3"
140 | },
141 | "language_info": {
142 | "codemirror_mode": {
143 | "name": "ipython",
144 | "version": 3
145 | },
146 | "file_extension": ".py",
147 | "mimetype": "text/x-python",
148 | "name": "python",
149 | "nbconvert_exporter": "python",
150 | "pygments_lexer": "ipython3",
151 | "version": "3.7.6"
152 | }
153 | },
154 | "nbformat": 4,
155 | "nbformat_minor": 4
156 | }
157 |
--------------------------------------------------------------------------------
/.ipynb_checkpoints/12장_딥러닝_3절_7_텐서플로_소개-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Building a deep learning model with the Sequential API"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "from tensorflow.keras.models import Sequential\n",
17 | "from tensorflow.keras.layers import Dense\n",
18 | "\n",
19 | "model = Sequential()\n",
20 | "model.add(Dense(100, activation='relu', \n",
21 | " input_shape=(32,32,1)))\n",
22 | "model.add(Dense(50, activation='relu'))\n",
23 | "model.add(Dense(5, activation='softmax'))"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": 2,
29 | "metadata": {},
30 | "outputs": [
31 | {
32 | "name": "stdout",
33 | "output_type": "stream",
34 | "text": [
35 | "Model: \"sequential\"\n",
36 | "_________________________________________________________________\n",
37 | "Layer (type) Output Shape Param # \n",
38 | "=================================================================\n",
39 | "dense (Dense) (None, 32, 32, 100) 200 \n",
40 | "_________________________________________________________________\n",
41 | "dense_1 (Dense) (None, 32, 32, 50) 5050 \n",
42 | "_________________________________________________________________\n",
43 | "dense_2 (Dense) (None, 32, 32, 5) 255 \n",
44 | "=================================================================\n",
45 | "Total params: 5,505\n",
46 | "Trainable params: 5,505\n",
47 | "Non-trainable params: 0\n",
48 | "_________________________________________________________________\n"
49 | ]
50 | }
51 | ],
52 | "source": [
53 | "model.summary()"
54 | ]
55 | },
56 | {
57 | "cell_type": "markdown",
58 | "metadata": {},
59 | "source": [
60 | "# Building a deep learning model with the functional API"
61 | ]
62 | },
63 | {
64 | "cell_type": "code",
65 | "execution_count": 3,
66 | "metadata": {},
67 | "outputs": [],
68 | "source": [
69 | "from tensorflow.keras.layers import Input, Dense\n",
70 | "from tensorflow.keras.models import Model\n",
71 | "\n",
72 | "input_layer = Input(shape=(32,32,1))\n",
73 | "\n",
74 | "x = Dense(units=100, activation = 'relu')(input_layer)\n",
75 | "x = Dense(units=50, activation = 'relu')(x)\n",
76 | "\n",
77 | "output_layer = Dense(units=5, activation='softmax')(x)\n",
78 | "\n",
79 | "model2 = Model(input_layer, output_layer)\n"
80 | ]
81 | },
82 | {
83 | "cell_type": "code",
84 | "execution_count": 4,
85 | "metadata": {},
86 | "outputs": [
87 | {
88 | "name": "stdout",
89 | "output_type": "stream",
90 | "text": [
91 | "Model: \"model\"\n",
92 | "_________________________________________________________________\n",
93 | "Layer (type) Output Shape Param # \n",
94 | "=================================================================\n",
95 | "input_1 (InputLayer) [(None, 32, 32, 1)] 0 \n",
96 | "_________________________________________________________________\n",
97 | "dense_3 (Dense) (None, 32, 32, 100) 200 \n",
98 | "_________________________________________________________________\n",
99 | "dense_4 (Dense) (None, 32, 32, 50) 5050 \n",
100 | "_________________________________________________________________\n",
101 | "dense_5 (Dense) (None, 32, 32, 5) 255 \n",
102 | "=================================================================\n",
103 | "Total params: 5,505\n",
104 | "Trainable params: 5,505\n",
105 | "Non-trainable params: 0\n",
106 | "_________________________________________________________________\n"
107 | ]
108 | }
109 | ],
110 | "source": [
111 | "model2.summary()"
112 | ]
113 | },
114 | {
115 | "cell_type": "markdown",
116 | "metadata": {},
117 | "source": [
118 | "# Using activation functions"
119 | ]
120 | },
121 | {
122 | "cell_type": "code",
123 | "execution_count": null,
124 | "metadata": {},
125 | "outputs": [],
126 | "source": [
127 | "from tensorflow.keras.layers import Activation\nx = Dense(units=100)(x)\n",
128 | "x = Activation('relu')(x)"
129 | ]
130 | },
131 | {
132 | "cell_type": "code",
133 | "execution_count": null,
134 | "metadata": {},
135 | "outputs": [],
136 | "source": [
137 | "x = Dense(units=100, activation='relu')(x)"
138 | ]
139 | }
140 | ],
141 | "metadata": {
142 | "kernelspec": {
143 | "display_name": "Python 3",
144 | "language": "python",
145 | "name": "python3"
146 | },
147 | "language_info": {
148 | "codemirror_mode": {
149 | "name": "ipython",
150 | "version": 3
151 | },
152 | "file_extension": ".py",
153 | "mimetype": "text/x-python",
154 | "name": "python",
155 | "nbconvert_exporter": "python",
156 | "pygments_lexer": "ipython3",
157 | "version": "3.7.6"
158 | }
159 | },
160 | "nbformat": 4,
161 | "nbformat_minor": 4
162 | }
163 |
--------------------------------------------------------------------------------
/.ipynb_checkpoints/12장_딥러닝_7절_1_자연어처리-checkpoint.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Word tokenization"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 55,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "from tensorflow.keras.preprocessing.text import Tokenizer\n",
17 | "\n",
18 | "paper = ['많은 것을 바꾸고 싶다면 많은 것을 받아들여라']\n",
19 | "\n",
20 | "tknz = Tokenizer()\n",
21 | "tknz.fit_on_texts(paper)"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 57,
27 | "metadata": {},
28 | "outputs": [
29 | {
30 | "name": "stdout",
31 | "output_type": "stream",
32 | "text": [
33 | "{'많은': 1, '것을': 2, '바꾸고': 3, '싶다면': 4, '받아들여라': 5}\n",
34 | "OrderedDict([('많은', 2), ('것을', 2), ('바꾸고', 1), ('싶다면', 1), ('받아들여라', 1)])\n"
35 | ]
36 | }
37 | ],
38 | "source": [
39 | "print(tknz.word_index)\n",
40 | "print(tknz.word_counts)"
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
47 | "# One-hot encoding"
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": 70,
53 | "metadata": {},
54 | "outputs": [],
55 | "source": [
56 | "from tensorflow.keras.utils import to_categorical\n",
57 | "from tensorflow.keras.preprocessing.text import Tokenizer\n",
58 | "\n",
59 | "\n",
60 | "paper = ['많은 것을 바꾸고 싶다면 많은 것을 받아들여라']\n",
61 | "tknz = Tokenizer()\n",
62 | "tknz.fit_on_texts(paper)\n",
63 | "\n",
64 | "idx_paper = tknz.texts_to_sequences(paper)\n",
65 | "n = len(tknz.word_index)+1\n",
66 | "idx_onehot = to_categorical(idx_paper, num_classes=n)"
67 | ]
68 | },
69 | {
70 | "cell_type": "code",
71 | "execution_count": 71,
72 | "metadata": {},
73 | "outputs": [
74 | {
75 | "name": "stdout",
76 | "output_type": "stream",
77 | "text": [
78 | "[[1, 2, 3, 4, 1, 2, 5]]\n",
79 | "6\n",
80 | "[[[0. 1. 0. 0. 0. 0.]\n",
81 | " [0. 0. 1. 0. 0. 0.]\n",
82 | " [0. 0. 0. 1. 0. 0.]\n",
83 | " [0. 0. 0. 0. 1. 0.]\n",
84 | " [0. 1. 0. 0. 0. 0.]\n",
85 | " [0. 0. 1. 0. 0. 0.]\n",
86 | " [0. 0. 0. 0. 0. 1.]]]\n"
87 | ]
88 | }
89 | ],
90 | "source": [
91 | "print(idx_paper)\n",
92 | "print(n)\n",
93 | "print(idx_onehot)"
94 | ]
95 | },
96 | {
97 | "cell_type": "markdown",
98 | "metadata": {},
99 | "source": [
100 | "# Word embedding"
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": 76,
106 | "metadata": {},
107 | "outputs": [],
108 | "source": [
109 | "from tensorflow.keras.models import Sequential\n",
110 | "from tensorflow.keras.layers import Embedding\n",
111 | "\n",
112 | "model = Sequential()\n",
113 | "model.add(Embedding(input_dim=n, output_dim=3))\n",
114 | "model.compile(optimizer='rmsprop', loss='mse')\n",
115 | "embedding = model.predict(idx_paper)"
116 | ]
117 | },
118 | {
119 | "cell_type": "code",
120 | "execution_count": 77,
121 | "metadata": {},
122 | "outputs": [
123 | {
124 | "name": "stdout",
125 | "output_type": "stream",
126 | "text": [
127 | "[[[-0.02796837 -0.03958071 -0.03936887]\n",
128 | " [-0.02087821 -0.02005102 0.0131931 ]\n",
129 | " [-0.00142742 -0.03759698 0.02437944]\n",
130 | " [ 0.01546348 -0.00769221 -0.01694027]\n",
131 | " [-0.02796837 -0.03958071 -0.03936887]\n",
132 | " [-0.02087821 -0.02005102 0.0131931 ]\n",
133 | " [ 0.024049 -0.03488786 0.02603838]]]\n"
134 | ]
135 | }
136 | ],
137 | "source": [
138 | "print(embedding)"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": null,
144 | "metadata": {},
145 | "outputs": [],
146 | "source": []
147 | }
148 | ],
149 | "metadata": {
150 | "kernelspec": {
151 | "display_name": "Python 3",
152 | "language": "python",
153 | "name": "python3"
154 | },
155 | "language_info": {
156 | "codemirror_mode": {
157 | "name": "ipython",
158 | "version": 3
159 | },
160 | "file_extension": ".py",
161 | "mimetype": "text/x-python",
162 | "name": "python",
163 | "nbconvert_exporter": "python",
164 | "pygments_lexer": "ipython3",
165 | "version": "3.7.6"
166 | }
167 | },
168 | "nbformat": 4,
169 | "nbformat_minor": 4
170 | }
171 |
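The embedding output printed in the notebook above repeats the same row wherever a word index repeats (indices 1 and 2 each appear twice in `idx_paper`). The sketch below illustrates why, assuming the usual view of an embedding layer as a row lookup, which matches multiplying a one-hot vector by the weight matrix. The `weights` values and the helper names `one_hot` and `matvec` are hypothetical and are not the values Keras initialized.

```python
# Sketch: an embedding lookup equals a one-hot vector times the weight matrix.
# The weight values below are hypothetical, not the ones Keras initialized.

def one_hot(index, size):
    """Return a one-hot list of length `size` with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def matvec(vector, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [sum(v * row[c] for v, row in zip(vector, matrix)) for c in range(n_cols)]

n = 6             # vocabulary size, as printed in the notebook
output_dim = 3    # embedding width used in the notebook

# Hypothetical weights: row i is the embedding vector of word index i.
weights = [[0.1 * i + 0.01 * j for j in range(output_dim)] for i in range(n)]

idx_paper = [1, 2, 3, 4, 1, 2, 5]   # word indices from the notebook

by_lookup = [weights[i] for i in idx_paper]                      # direct row lookup
by_matmul = [matvec(one_hot(i, n), weights) for i in idx_paper]  # one-hot product
```

Both lists hold the same seven vectors, and positions 0 and 4 are identical because they share word index 1.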
--------------------------------------------------------------------------------
/07장_3절_파이프라인.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Pipeline"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 8,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "from sklearn import datasets\n",
17 | "from sklearn.pipeline import Pipeline\n",
18 | "from sklearn.preprocessing import StandardScaler\n",
19 | "from sklearn.linear_model import LinearRegression\n",
20 | "from sklearn.model_selection import train_test_split\n",
21 | "from sklearn.metrics import mean_squared_error"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 9,
27 | "metadata": {},
28 | "outputs": [
29 | {
30 | "data": {
31 | "text/plain": [
32 | "29.515137790197567"
33 | ]
34 | },
35 | "execution_count": 9,
36 | "metadata": {},
37 | "output_type": "execute_result"
38 | }
39 | ],
40 | "source": [
41 | "raw_boston = datasets.load_boston()\n",
42 | "\n",
43 | "X = raw_boston.data\n",
44 | "y = raw_boston.target\n",
45 | "\n",
46 | "# Train/test split\n",
47 | "X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=7)\n",
48 | "\n",
49 | "# Standardization scaling\n",
50 | "std_scale = StandardScaler()\n",
51 | "X_tn_std = std_scale.fit_transform(X_tn)\n",
52 | "X_te_std = std_scale.transform(X_te)\n",
53 | "\n",
54 | "# Training\n",
55 | "clf_linear = LinearRegression()\n",
56 | "clf_linear.fit(X_tn_std, y_tn)\n",
57 | "\n",
58 | "# Prediction\n",
59 | "pred_linear = clf_linear.predict(X_te_std)\n",
60 | "\n",
61 | "# Evaluation\n",
62 | "mean_squared_error(y_te, pred_linear)"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": 10,
68 | "metadata": {},
69 | "outputs": [
70 | {
71 | "data": {
72 | "text/plain": [
73 | "29.515137790197567"
74 | ]
75 | },
76 | "execution_count": 10,
77 | "metadata": {},
78 | "output_type": "execute_result"
79 | }
80 | ],
81 | "source": [
82 | "# Train/test split\n",
83 | "X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=7)\n",
84 | "\n",
85 | "# Pipeline\n",
86 | "linear_pipeline = Pipeline([\n",
87 | "    ('scaler', StandardScaler()),\n",
88 | "    ('linear_regression', LinearRegression())\n",
89 | "])\n",
90 | "\n",
91 | "# Training\n",
92 | "linear_pipeline.fit(X_tn, y_tn)\n",
93 | "\n",
94 | "# Prediction\n",
95 | "pred_linear = linear_pipeline.predict(X_te)\n",
96 | "\n",
97 | "# Evaluation\n",
98 | "mean_squared_error(y_te, pred_linear)"
99 | ]
100 | }
101 | ],
102 | "metadata": {
103 | "kernelspec": {
104 | "display_name": "Python 3",
105 | "language": "python",
106 | "name": "python3"
107 | },
108 | "language_info": {
109 | "codemirror_mode": {
110 | "name": "ipython",
111 | "version": 3
112 | },
113 | "file_extension": ".py",
114 | "mimetype": "text/x-python",
115 | "name": "python",
116 | "nbconvert_exporter": "python",
117 | "pygments_lexer": "ipython3",
118 | "version": "3.7.6"
119 | }
120 | },
121 | "nbformat": 4,
122 | "nbformat_minor": 4
123 | }
124 |
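The two cells in the notebook above report the same mean squared error because the pipeline performs the same steps as the manual version: the scaler is fitted on the training data only, and its stored statistics are reused at prediction time. The following is a minimal pure-Python sketch of that chaining for a single feature. `TinyScaler`, `TinyLinear`, `TinyPipeline`, and the data are hypothetical illustrations, not scikit-learn classes or the Boston data.

```python
from statistics import mean, pstdev

class TinyScaler:
    """Standardize one feature with the training mean and population std."""
    def fit(self, xs, ys=None):
        self.mu, self.sigma = mean(xs), pstdev(xs)
        return self

    def transform(self, xs):
        return [(x - self.mu) / self.sigma for x in xs]

class TinyLinear:
    """One-feature least-squares line y = a * x + b."""
    def fit(self, xs, ys):
        mx, my = mean(xs), mean(ys)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        self.a = sxy / sxx
        self.b = my - self.a * mx
        return self

    def predict(self, xs):
        return [self.a * x + self.b for x in xs]

class TinyPipeline:
    """Chain transformers and a final model, like the notebook's Pipeline."""
    def __init__(self, transformers, model):
        self.transformers = transformers
        self.model = model

    def fit(self, xs, ys):
        for t in self.transformers:
            xs = t.fit(xs, ys).transform(xs)   # fit on training data only
        self.model.fit(xs, ys)
        return self

    def predict(self, xs):
        for t in self.transformers:
            xs = t.transform(xs)               # reuse training statistics, no refit
        return self.model.predict(xs)

# Hypothetical data following y = 2x + 1.
x_tn, y_tn = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
x_te = [5.0, 6.0]

pipe = TinyPipeline([TinyScaler()], TinyLinear()).fit(x_tn, y_tn)
pred = pipe.predict(x_te)
```

Because the test inputs pass through the scaler fitted on the training inputs, `pred` recovers the underlying line at the unseen points (about 11 and 13).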
--------------------------------------------------------------------------------
/07장_4절_그리드서치.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Grid Search"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 7,
13 | "metadata": {
14 | "scrolled": true
15 | },
16 | "outputs": [
17 | {
18 | "name": "stdout",
19 | "output_type": "stream",
20 | "text": [
21 | "{'k': 3}\n",
22 | "0.9736842105263158\n"
23 | ]
24 | }
25 | ],
26 | "source": [
27 | "from sklearn import datasets\n",
28 | "from sklearn.preprocessing import StandardScaler\n",
29 | "from sklearn.neighbors import KNeighborsClassifier\n",
30 | "from sklearn.model_selection import train_test_split\n",
31 | "\n",
32 | "from sklearn.metrics import accuracy_score\n",
33 | "from sklearn.metrics import confusion_matrix\n",
34 | "from sklearn.metrics import classification_report\n",
35 | "\n",
36 | "# Load the iris dataset\n",
37 | "raw_iris = datasets.load_iris()\n",
38 | "\n",
39 | "# Features / target\n",
40 | "X = raw_iris.data\n",
41 | "y = raw_iris.target\n",
42 | "\n",
43 | "# Train/test split\n",
44 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
45 | "\n",
46 | "# Standardization scaling\n",
47 | "std_scale = StandardScaler()\n",
48 | "std_scale.fit(X_tn)\n",
49 | "X_tn_std = std_scale.transform(X_tn)\n",
50 | "X_te_std = std_scale.transform(X_te)\n",
51 | "\n",
52 | "best_accuracy = 0\n",
53 | "\n",
54 | "for k in [1,2,3,4,5,6,7,8,9,10]:\n",
55 | "    clf_knn = KNeighborsClassifier(n_neighbors=k)\n",
56 | "    clf_knn.fit(X_tn_std, y_tn)\n",
57 | "    knn_pred = clf_knn.predict(X_te_std)\n",
58 | "    accuracy = accuracy_score(y_te, knn_pred)\n",
59 | "    if accuracy > best_accuracy:\n",
60 | "        best_accuracy = accuracy\n",
61 | "        final_k = {'k': k}\n",
62 | "\n",
63 | "print(final_k)\n",
64 | "print(best_accuracy)"
65 | ]
66 | },
67 | {
68 | "cell_type": "code",
69 | "execution_count": null,
70 | "metadata": {},
71 | "outputs": [],
72 | "source": []
73 | }
74 | ],
75 | "metadata": {
76 | "kernelspec": {
77 | "display_name": "Python 3",
78 | "language": "python",
79 | "name": "python3"
80 | },
81 | "language_info": {
82 | "codemirror_mode": {
83 | "name": "ipython",
84 | "version": 3
85 | },
86 | "file_extension": ".py",
87 | "mimetype": "text/x-python",
88 | "name": "python",
89 | "nbconvert_exporter": "python",
90 | "pygments_lexer": "ipython3",
91 | "version": "3.7.6"
92 | }
93 | },
94 | "nbformat": 4,
95 | "nbformat_minor": 4
96 | }
97 |
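In the search loop of the notebook above, `best_accuracy` holds the highest score seen so far, while `accuracy` holds only the score of the most recent `k`. The sketch below replays that bookkeeping with made-up scores to show that the two can differ once the loop ends. The values in `scores_by_k` are hypothetical and are not results from the iris experiment.

```python
# Hypothetical accuracies per k; these are not results from the notebook.
scores_by_k = {1: 0.90, 2: 0.92, 3: 0.97, 4: 0.95, 5: 0.93}

best_accuracy = 0
final_k = None

for k, accuracy in scores_by_k.items():
    # The strict ">" keeps the smallest k when several k values tie.
    if accuracy > best_accuracy:
        best_accuracy = accuracy
        final_k = {'k': k}

# After the loop, `accuracy` is the score of the last k tried,
# while `best_accuracy` belongs to the k stored in `final_k`.
last_accuracy = accuracy
```

Here `final_k` is `{'k': 3}` with `best_accuracy` 0.97, whereas `last_accuracy` is 0.93, the score for `k=5`.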
--------------------------------------------------------------------------------
/07장_모형평가_6절_분류_회귀_군집.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Classification"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "## Accuracy"
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": 3,
22 | "metadata": {},
23 | "outputs": [
24 | {
25 | "name": "stdout",
26 | "output_type": "stream",
27 | "text": [
28 | "0.5\n",
29 | "2\n"
30 | ]
31 | }
32 | ],
33 | "source": [
34 | "#import numpy as np\n",
35 | "from sklearn.metrics import accuracy_score\n",
36 | "y_pred = [0, 2, 1, 3]\n",
37 | "y_true = [0, 1, 2, 3]\n",
38 | "print(accuracy_score(y_true, y_pred))\n",
39 | "print(accuracy_score(y_true, y_pred, normalize=False))"
40 | ]
41 | },
42 | {
43 | "cell_type": "code",
44 | "execution_count": 3,
45 | "metadata": {},
46 | "outputs": [],
47 | "source": [
48 | "## confusion matrix"
49 | ]
50 | },
51 | {
52 | "cell_type": "code",
53 | "execution_count": 4,
54 | "metadata": {},
55 | "outputs": [
56 | {
57 | "data": {
58 | "text/plain": [
59 | "array([[2, 0, 0],\n",
60 | " [0, 0, 1],\n",
61 | " [1, 0, 2]])"
62 | ]
63 | },
64 | "execution_count": 4,
65 | "metadata": {},
66 | "output_type": "execute_result"
67 | }
68 | ],
69 | "source": [
70 | "from sklearn.metrics import confusion_matrix\n",
71 | "y_true = [2, 0, 2, 2, 0, 1]\n",
72 | "y_pred = [0, 0, 2, 2, 0, 2]\n",
73 | "confusion_matrix(y_true, y_pred)"
74 | ]
75 | },
76 | {
77 | "cell_type": "code",
78 | "execution_count": 5,
79 | "metadata": {},
80 | "outputs": [],
81 | "source": [
82 | "## classification report "
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": 6,
88 | "metadata": {},
89 | "outputs": [
90 | {
91 | "name": "stdout",
92 | "output_type": "stream",
93 | "text": [
94 | " precision recall f1-score support\n",
95 | "\n",
96 | " class 0 0.67 1.00 0.80 2\n",
97 | " class 1 0.00 0.00 0.00 1\n",
98 | " class 2 1.00 0.50 0.67 2\n",
99 | "\n",
100 | " accuracy 0.60 5\n",
101 | " macro avg 0.56 0.50 0.49 5\n",
102 | "weighted avg 0.67 0.60 0.59 5\n",
103 | "\n"
104 | ]
105 | }
106 | ],
107 | "source": [
108 | "from sklearn.metrics import classification_report\n",
109 | "y_true = [0, 1, 2, 2, 0]\n",
110 | "y_pred = [0, 0, 2, 1, 0]\n",
111 | "target_names = ['class 0', 'class 1', 'class 2']\n",
112 | "print(classification_report(y_true, y_pred, target_names=target_names))"
113 | ]
114 | },
115 | {
116 | "cell_type": "markdown",
117 | "metadata": {},
118 | "source": [
119 | "# Regression"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": null,
125 | "metadata": {},
126 | "outputs": [],
127 | "source": [
128 | "# mean absolute error"
129 | ]
130 | },
131 | {
132 | "cell_type": "code",
133 | "execution_count": 5,
134 | "metadata": {},
135 | "outputs": [
136 | {
137 | "name": "stdout",
138 | "output_type": "stream",
139 | "text": [
140 | "0.5\n"
141 | ]
142 | }
143 | ],
144 | "source": [
145 | "from sklearn.metrics import mean_absolute_error\n",
146 | "y_true = [3, -0.5, 2, 7]\n",
147 | "y_pred = [2.5, 0.0, 2, 8]\n",
148 | "\n",
149 | "print(mean_absolute_error(y_true, y_pred))"
150 | ]
151 | },
152 | {
153 | "cell_type": "code",
154 | "execution_count": 6,
155 | "metadata": {},
156 | "outputs": [],
157 | "source": [
158 | "# mean squared error"
159 | ]
160 | },
161 | {
162 | "cell_type": "code",
163 | "execution_count": 7,
164 | "metadata": {},
165 | "outputs": [
166 | {
167 | "data": {
168 | "text/plain": [
169 | "0.375"
170 | ]
171 | },
172 | "execution_count": 7,
173 | "metadata": {},
174 | "output_type": "execute_result"
175 | }
176 | ],
177 | "source": [
178 | "from sklearn.metrics import mean_squared_error\n",
179 | "y_true = [3, -0.5, 2, 7]\n",
180 | "y_pred = [2.5, 0.0, 2, 8]\n",
181 | "print(mean_squared_error(y_true, y_pred))"
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": 8,
187 | "metadata": {},
188 | "outputs": [],
189 | "source": [
190 | "# R2"
191 | ]
192 | },
193 | {
194 | "cell_type": "code",
195 | "execution_count": 10,
196 | "metadata": {},
197 | "outputs": [
198 | {
199 | "name": "stdout",
200 | "output_type": "stream",
201 | "text": [
202 | "0.9486081370449679\n"
203 | ]
204 | }
205 | ],
206 | "source": [
207 | "from sklearn.metrics import r2_score\n",
208 | "y_true = [3, -0.5, 2, 7]\n",
209 | "y_pred = [2.5, 0.0, 2, 8]\n",
210 | "print(r2_score(y_true, y_pred))"
211 | ]
212 | },
213 | {
214 | "cell_type": "markdown",
215 | "metadata": {},
216 | "source": [
217 | "# Clustering"
218 | ]
219 | },
220 | {
221 | "cell_type": "code",
222 | "execution_count": null,
223 | "metadata": {},
224 | "outputs": [],
225 | "source": [
226 | "# adjusted rand index"
227 | ]
228 | },
229 | {
230 | "cell_type": "code",
231 | "execution_count": 2,
232 | "metadata": {},
233 | "outputs": [
234 | {
235 | "name": "stdout",
236 | "output_type": "stream",
237 | "text": [
238 | "0.24242424242424246\n"
239 | ]
240 | }
241 | ],
242 | "source": [
243 | "from sklearn.metrics import adjusted_rand_score\n",
244 | "labels_true = [0, 0, 0, 1, 1, 1]\n",
245 | "labels_pred = [0, 0, 1, 1, 2, 2]\n",
246 | "\n",
247 | "print(adjusted_rand_score(labels_true, labels_pred))"
248 | ]
249 | },
250 | {
251 | "cell_type": "code",
252 | "execution_count": 3,
253 | "metadata": {},
254 | "outputs": [],
255 | "source": [
256 | "# silhouette score"
257 | ]
258 | },
259 | {
260 | "cell_type": "code",
261 | "execution_count": 2,
262 | "metadata": {},
263 | "outputs": [
264 | {
265 | "name": "stdout",
266 | "output_type": "stream",
267 | "text": [
268 | "0.5789497702625118\n"
269 | ]
270 | }
271 | ],
272 | "source": [
273 | "from sklearn.metrics import silhouette_score\n",
274 | "X = [[1, 2], [4, 5], [2, 1], [6, 7], [2, 3]]\n",
275 | "labels = [0, 1, 0, 1, 0] \n",
276 | "sil_score = silhouette_score(X, labels)\n",
277 | "print(sil_score)"
278 | ]
279 | },
280 | {
281 | "cell_type": "code",
282 | "execution_count": null,
283 | "metadata": {},
284 | "outputs": [],
285 | "source": []
286 | }
287 | ],
288 | "metadata": {
289 | "kernelspec": {
290 | "display_name": "Python 3",
291 | "language": "python",
292 | "name": "python3"
293 | },
294 | "language_info": {
295 | "codemirror_mode": {
296 | "name": "ipython",
297 | "version": 3
298 | },
299 | "file_extension": ".py",
300 | "mimetype": "text/x-python",
301 | "name": "python",
302 | "nbconvert_exporter": "python",
303 | "pygments_lexer": "ipython3",
304 | "version": "3.7.6"
305 | }
306 | },
307 | "nbformat": 4,
308 | "nbformat_minor": 4
309 | }
310 |
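The regression metrics in the notebook above can be reproduced by hand from their standard definitions. This sketch recomputes MAE, MSE, and R² for the same `y_true` and `y_pred` lists using only built-ins. The intermediate names `ss_res` and `ss_tot` are introduced here for illustration.

```python
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
n = len(y_true)

# Mean absolute error: average of |true - predicted|.
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

# Mean squared error: average of (true - predicted)^2.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

# R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean).
mean_true = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(mae, mse, r2)
```

The three values agree with the notebook outputs (0.5, 0.375, and roughly 0.9486).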
--------------------------------------------------------------------------------
/08장_지도학습_3절_k최근접이웃.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Step-by-Step Code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_iris = datasets.load_iris()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Features / target\n",
28 | "X = raw_iris.data\n",
29 | "y = raw_iris.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Train/test split\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 5,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
66 | " metric_params=None, n_jobs=None, n_neighbors=2, p=2,\n",
67 | " weights='uniform')"
68 | ]
69 | },
70 | "execution_count": 5,
71 | "metadata": {},
72 | "output_type": "execute_result"
73 | }
74 | ],
75 | "source": [
76 | "# Training\n",
77 | "from sklearn.neighbors import KNeighborsClassifier\n",
78 | "clf_knn = KNeighborsClassifier(n_neighbors=2)\n",
79 | "clf_knn.fit(X_tn_std, y_tn)"
80 | ]
81 | },
82 | {
83 | "cell_type": "code",
84 | "execution_count": 8,
85 | "metadata": {},
86 | "outputs": [
87 | {
88 | "name": "stdout",
89 | "output_type": "stream",
90 | "text": [
91 | "[2 1 0 2 0 2 0 1 1 1 1 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0\n",
92 | " 2]\n"
93 | ]
94 | }
95 | ],
96 | "source": [
97 | "# Prediction\n",
98 | "knn_pred = clf_knn.predict(X_te_std)\n",
99 | "print(knn_pred)"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": 9,
105 | "metadata": {},
106 | "outputs": [
107 | {
108 | "name": "stdout",
109 | "output_type": "stream",
110 | "text": [
111 | "0.9473684210526315\n"
112 | ]
113 | }
114 | ],
115 | "source": [
116 | "# Accuracy\n",
117 | "from sklearn.metrics import accuracy_score\n",
118 | "accuracy = accuracy_score(y_te, knn_pred)\n",
119 | "print(accuracy)"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": 10,
125 | "metadata": {},
126 | "outputs": [
127 | {
128 | "name": "stdout",
129 | "output_type": "stream",
130 | "text": [
131 | "[[13 0 0]\n",
132 | " [ 0 15 1]\n",
133 | " [ 0 1 8]]\n"
134 | ]
135 | }
136 | ],
137 | "source": [
138 | "# Check the confusion matrix\n",
139 | "from sklearn.metrics import confusion_matrix\n",
140 | "conf_matrix = confusion_matrix(y_te, knn_pred)\n",
141 | "print(conf_matrix)"
142 | ]
143 | },
144 | {
145 | "cell_type": "code",
146 | "execution_count": 22,
147 | "metadata": {},
148 | "outputs": [
149 | {
150 | "name": "stdout",
151 | "output_type": "stream",
152 | "text": [
153 | " precision recall f1-score support\n",
154 | "\n",
155 | " 0 1.00 1.00 1.00 13\n",
156 | " 1 0.94 0.94 0.94 16\n",
157 | " 2 0.89 0.89 0.89 9\n",
158 | "\n",
159 | " accuracy 0.95 38\n",
160 | " macro avg 0.94 0.94 0.94 38\n",
161 | "weighted avg 0.95 0.95 0.95 38\n",
162 | "\n"
163 | ]
164 | }
165 | ],
166 | "source": [
167 | "# Check the classification report\n",
168 | "from sklearn.metrics import classification_report\n",
169 | "class_report = classification_report(y_te, knn_pred)\n",
170 | "print(class_report)"
171 | ]
172 | },
173 | {
174 | "cell_type": "markdown",
175 | "metadata": {},
176 | "source": [
177 | "# Combined Code"
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": 11,
183 | "metadata": {},
184 | "outputs": [
185 | {
186 | "name": "stdout",
187 | "output_type": "stream",
188 | "text": [
189 | "[2 1 0 2 0 2 0 1 1 1 1 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0\n",
190 | " 2]\n",
191 | "0.9473684210526315\n",
192 | "[[13 0 0]\n",
193 | " [ 0 15 1]\n",
194 | " [ 0 1 8]]\n",
195 | " precision recall f1-score support\n",
196 | "\n",
197 | " 0 1.00 1.00 1.00 13\n",
198 | " 1 0.94 0.94 0.94 16\n",
199 | " 2 0.89 0.89 0.89 9\n",
200 | "\n",
201 | " accuracy 0.95 38\n",
202 | " macro avg 0.94 0.94 0.94 38\n",
203 | "weighted avg 0.95 0.95 0.95 38\n",
204 | "\n"
205 | ]
206 | }
207 | ],
208 | "source": [
209 | "from sklearn import datasets\n",
210 | "from sklearn.preprocessing import StandardScaler\n",
211 | "from sklearn.neighbors import KNeighborsClassifier\n",
212 | "from sklearn.model_selection import train_test_split\n",
213 | "\n",
214 | "from sklearn.metrics import accuracy_score\n",
215 | "from sklearn.metrics import confusion_matrix\n",
216 | "from sklearn.metrics import classification_report\n",
217 | "\n",
218 | "# Load the iris dataset\n",
219 | "raw_iris = datasets.load_iris()\n",
220 | "\n",
221 | "# Features / target\n",
222 | "X = raw_iris.data\n",
223 | "y = raw_iris.target\n",
224 | "\n",
225 | "# Train/test split\n",
226 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
227 | "\n",
228 | "\n",
229 | "# Standardization scaling\n",
230 | "std_scale = StandardScaler()\n",
231 | "std_scale.fit(X_tn)\n",
232 | "X_tn_std = std_scale.transform(X_tn)\n",
233 | "X_te_std = std_scale.transform(X_te)\n",
234 | "\n",
235 | "# Training\n",
236 | "clf_knn = KNeighborsClassifier(n_neighbors=2)\n",
237 | "clf_knn.fit(X_tn_std, y_tn)\n",
238 | "\n",
239 | "# Prediction\n",
240 | "knn_pred = clf_knn.predict(X_te_std)\n",
241 | "print(knn_pred)\n",
242 | "\n",
243 | "# Accuracy\n",
244 | "accuracy = accuracy_score(y_te, knn_pred)\n",
245 | "print(accuracy)\n",
246 | "\n",
247 | "# Check the confusion matrix\n",
248 | "conf_matrix = confusion_matrix(y_te, knn_pred)\n",
249 | "print(conf_matrix)\n",
250 | "\n",
251 | "# Check the classification report\n",
252 | "class_report = classification_report(y_te, knn_pred)\n",
253 | "print(class_report)"
254 | ]
255 | },
256 | {
257 | "cell_type": "code",
258 | "execution_count": null,
259 | "metadata": {},
260 | "outputs": [],
261 | "source": []
262 | }
263 | ],
264 | "metadata": {
265 | "kernelspec": {
266 | "display_name": "Python 3",
267 | "language": "python",
268 | "name": "python3"
269 | },
270 | "language_info": {
271 | "codemirror_mode": {
272 | "name": "ipython",
273 | "version": 3
274 | },
275 | "file_extension": ".py",
276 | "mimetype": "text/x-python",
277 | "name": "python",
278 | "nbconvert_exporter": "python",
279 | "pygments_lexer": "ipython3",
280 | "version": "3.7.6"
281 | }
282 | },
283 | "nbformat": 4,
284 | "nbformat_minor": 4
285 | }
286 |
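The notebook above relies on `KNeighborsClassifier` for the neighbor search and voting. As a rough illustration of the idea only, the sketch below classifies a point by majority vote among its `k` nearest training points under Euclidean distance. The helper names, the toy data, and the tie handling (whatever `Counter.most_common` returns) are assumptions for illustration and are not meant to mirror scikit-learn's internals.

```python
from collections import Counter

def squared_distance(a, b):
    """Squared Euclidean distance; it ranks neighbors the same as the distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: squared_distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-cluster toy data, not the iris dataset.
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = [0, 0, 0, 1, 1, 1]

pred_a = knn_predict(train_X, train_y, (0.5, 0.5), k=3)
pred_b = knn_predict(train_X, train_y, (5.5, 5.5), k=3)
```

With these points, `pred_a` comes out as class 0 and `pred_b` as class 1, since each query sits inside one cluster.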
--------------------------------------------------------------------------------
/08장_지도학습_6절_나이브베이즈.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Step-by-Step Code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_wine = datasets.load_wine()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Set the feature and target data\n",
28 | "X = raw_wine.data\n",
29 | "y = raw_wine.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Train/test split\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 5,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "GaussianNB(priors=None, var_smoothing=1e-09)"
66 | ]
67 | },
68 | "execution_count": 5,
69 | "metadata": {},
70 | "output_type": "execute_result"
71 | }
72 | ],
73 | "source": [
74 | "# Train naive Bayes\n",
75 | "from sklearn.naive_bayes import GaussianNB\n",
76 | "clf_gnb = GaussianNB()\n",
77 | "clf_gnb.fit(X_tn_std, y_tn)"
78 | ]
79 | },
80 | {
81 | "cell_type": "code",
82 | "execution_count": 12,
83 | "metadata": {},
84 | "outputs": [
85 | {
86 | "name": "stdout",
87 | "output_type": "stream",
88 | "text": [
89 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 0 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
90 | " 1 1 2 0 0 1 1 1]\n"
91 | ]
92 | }
93 | ],
94 | "source": [
95 | "# Prediction\n",
96 | "pred_gnb = clf_gnb.predict(X_te_std)\n",
97 | "print(pred_gnb)"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": 15,
103 | "metadata": {},
104 | "outputs": [
105 | {
106 | "name": "stdout",
107 | "output_type": "stream",
108 | "text": [
109 | "0.9523809523809524\n"
110 | ]
111 | }
112 | ],
113 | "source": [
114 | "# Recall\n",
115 | "from sklearn.metrics import recall_score\n",
116 | "recall = recall_score(y_te, pred_gnb, average='macro')\n",
117 | "print(recall)"
118 | ]
119 | },
120 | {
121 | "cell_type": "code",
122 | "execution_count": 26,
123 | "metadata": {},
124 | "outputs": [
125 | {
126 | "name": "stdout",
127 | "output_type": "stream",
128 | "text": [
129 | "[[16 0 0]\n",
130 | " [ 2 18 1]\n",
131 | " [ 0 0 8]]\n"
132 | ]
133 | }
134 | ],
135 | "source": [
136 | "# Check the confusion matrix\n",
137 | "from sklearn.metrics import confusion_matrix\n",
138 | "conf_matrix = confusion_matrix(y_te, pred_gnb)\n",
139 | "print(conf_matrix)"
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": 27,
145 | "metadata": {},
146 | "outputs": [
147 | {
148 | "name": "stdout",
149 | "output_type": "stream",
150 | "text": [
151 | " precision recall f1-score support\n",
152 | "\n",
153 | " 0 0.89 1.00 0.94 16\n",
154 | " 1 1.00 0.86 0.92 21\n",
155 | " 2 0.89 1.00 0.94 8\n",
156 | "\n",
157 | " accuracy 0.93 45\n",
158 | " macro avg 0.93 0.95 0.94 45\n",
159 | "weighted avg 0.94 0.93 0.93 45\n",
160 | "\n"
161 | ]
162 | }
163 | ],
164 | "source": [
165 | "# Check the classification report\n",
166 | "from sklearn.metrics import classification_report\n",
167 | "class_report = classification_report(y_te, pred_gnb)\n",
168 | "print(class_report)"
169 | ]
170 | },
171 | {
172 | "cell_type": "markdown",
173 | "metadata": {},
174 | "source": [
175 | "# Combined Code"
176 | ]
177 | },
178 | {
179 | "cell_type": "code",
180 | "execution_count": 1,
181 | "metadata": {},
182 | "outputs": [
183 | {
184 | "name": "stdout",
185 | "output_type": "stream",
186 | "text": [
187 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 0 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
188 | " 1 1 2 0 0 1 1 1]\n",
189 | "0.9523809523809524\n",
190 | "[[16 0 0]\n",
191 | " [ 2 18 1]\n",
192 | " [ 0 0 8]]\n",
193 | " precision recall f1-score support\n",
194 | "\n",
195 | " 0 0.89 1.00 0.94 16\n",
196 | " 1 1.00 0.86 0.92 21\n",
197 | " 2 0.89 1.00 0.94 8\n",
198 | "\n",
199 | " accuracy 0.93 45\n",
200 | " macro avg 0.93 0.95 0.94 45\n",
201 | "weighted avg 0.94 0.93 0.93 45\n",
202 | "\n"
203 | ]
204 | }
205 | ],
206 | "source": [
207 | "from sklearn import datasets\n",
208 | "from sklearn.preprocessing import StandardScaler\n",
209 | "from sklearn.model_selection import train_test_split\n",
210 | "\n",
211 | "from sklearn.naive_bayes import GaussianNB\n",
212 | "\n",
213 | "from sklearn.metrics import recall_score\n",
214 | "from sklearn.metrics import confusion_matrix\n",
215 | "from sklearn.metrics import classification_report\n",
216 | "\n",
217 | "\n",
218 | "# Load the data\n",
219 | "raw_wine = datasets.load_wine()\n",
220 | "\n",
221 | "# Set the feature and target data\n",
222 | "X = raw_wine.data\n",
223 | "y = raw_wine.target\n",
224 | "\n",
225 | "# Train/test split\n",
226 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
227 | "\n",
228 | "# Standardize the data\n",
229 | "std_scale = StandardScaler()\n",
230 | "std_scale.fit(X_tn)\n",
231 | "X_tn_std = std_scale.transform(X_tn)\n",
232 | "X_te_std = std_scale.transform(X_te)\n",
233 | "\n",
234 | "# Train naive Bayes\n",
235 | "clf_gnb = GaussianNB()\n",
236 | "clf_gnb.fit(X_tn_std, y_tn)\n",
237 | "\n",
238 | "# Prediction\n",
239 | "pred_gnb = clf_gnb.predict(X_te_std)\n",
240 | "print(pred_gnb)\n",
241 | "\n",
242 | "# Recall\n",
243 | "recall = recall_score(y_te, pred_gnb, average='macro')\n",
244 | "print(recall)\n",
245 | "\n",
246 | "# Check the confusion matrix\n",
247 | "conf_matrix = confusion_matrix(y_te, pred_gnb)\n",
248 | "print(conf_matrix)\n",
249 | "\n",
250 | "# Check the classification report\n",
251 | "class_report = classification_report(y_te, pred_gnb)\n",
252 | "print(class_report)"
253 | ]
254 | },
255 | {
256 | "cell_type": "code",
257 | "execution_count": null,
258 | "metadata": {},
259 | "outputs": [],
260 | "source": []
261 | }
262 | ],
263 | "metadata": {
264 | "kernelspec": {
265 | "display_name": "Python 3",
266 | "language": "python",
267 | "name": "python3"
268 | },
269 | "language_info": {
270 | "codemirror_mode": {
271 | "name": "ipython",
272 | "version": 3
273 | },
274 | "file_extension": ".py",
275 | "mimetype": "text/x-python",
276 | "name": "python",
277 | "nbconvert_exporter": "python",
278 | "pygments_lexer": "ipython3",
279 | "version": "3.7.6"
280 | }
281 | },
282 | "nbformat": 4,
283 | "nbformat_minor": 4
284 | }
285 |
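The macro-averaged recall printed in the notebook above follows directly from its confusion matrix, where rows are true classes and columns are predicted classes (the scikit-learn convention). The sketch below recomputes per-class recall, per-class precision, and the macro recall from the matrix values shown in the output. The variable names are introduced here for illustration.

```python
# Confusion matrix from the notebook output: rows = true class, columns = predicted.
conf_matrix = [[16, 0, 0],
               [2, 18, 1],
               [0, 0, 8]]
n_classes = len(conf_matrix)

# Recall of class i: correct predictions divided by the row total (true count).
recalls = [conf_matrix[i][i] / sum(conf_matrix[i]) for i in range(n_classes)]

# Precision of class i: correct predictions divided by the column total (predicted count).
precisions = [conf_matrix[i][i] / sum(row[i] for row in conf_matrix)
              for i in range(n_classes)]

# Macro averaging weights every class equally, regardless of its support.
macro_recall = sum(recalls) / n_classes

print(recalls, precisions, macro_recall)
```

The rounded per-class values match the classification report (for example, recall 0.86 for class 1 and precision 0.89 for class 0), and `macro_recall` matches the printed 0.9523809523809524.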
--------------------------------------------------------------------------------
/08장_지도학습_7절_의사결정나무.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Step-by-Step Code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_wine = datasets.load_wine()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Set the feature and target data\n",
28 | "X = raw_wine.data\n",
29 | "y = raw_wine.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Train/test split\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 5,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n",
66 | " max_depth=None, max_features=None, max_leaf_nodes=None,\n",
67 | " min_impurity_decrease=0.0, min_impurity_split=None,\n",
68 | " min_samples_leaf=1, min_samples_split=2,\n",
69 | " min_weight_fraction_leaf=0.0, presort='deprecated',\n",
70 | " random_state=0, splitter='best')"
71 | ]
72 | },
73 | "execution_count": 5,
74 | "metadata": {},
75 | "output_type": "execute_result"
76 | }
77 | ],
78 | "source": [
79 | "# Train the decision tree\n",
80 | "from sklearn import tree \n",
81 | "clf_tree = tree.DecisionTreeClassifier(random_state=0)\n",
82 | "clf_tree.fit(X_tn_std, y_tn)"
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": 6,
88 | "metadata": {},
89 | "outputs": [
90 | {
91 | "name": "stdout",
92 | "output_type": "stream",
93 | "text": [
94 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 1 0 1 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
95 | " 1 1 2 1 0 1 1 1]\n"
96 | ]
97 | }
98 | ],
99 | "source": [
100 | "# Prediction\n",
101 | "pred_tree = clf_tree.predict(X_te_std)\n",
102 | "print(pred_tree)"
103 | ]
104 | },
105 | {
106 | "cell_type": "code",
107 | "execution_count": 7,
108 | "metadata": {},
109 | "outputs": [
110 | {
111 | "name": "stdout",
112 | "output_type": "stream",
113 | "text": [
114 | "0.9349141206870346\n"
115 | ]
116 | }
117 | ],
118 | "source": [
119 | "# f1 score\n",
120 | "from sklearn.metrics import f1_score\n",
121 | "f1 = f1_score(y_te, pred_tree, average='macro')\n",
122 | "print(f1)"
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": 8,
128 | "metadata": {},
129 | "outputs": [
130 | {
131 | "name": "stdout",
132 | "output_type": "stream",
133 | "text": [
134 | "[[14 2 0]\n",
135 | " [ 0 20 1]\n",
136 | " [ 0 0 8]]\n"
137 | ]
138 | }
139 | ],
140 | "source": [
141 | "# Check the confusion matrix\n",
142 | "from sklearn.metrics import confusion_matrix\n",
143 | "conf_matrix = confusion_matrix(y_te, pred_tree)\n",
144 | "print(conf_matrix)"
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": 9,
150 | "metadata": {},
151 | "outputs": [
152 | {
153 | "name": "stdout",
154 | "output_type": "stream",
155 | "text": [
156 | " precision recall f1-score support\n",
157 | "\n",
158 | " 0 1.00 0.88 0.93 16\n",
159 | " 1 0.91 0.95 0.93 21\n",
160 | " 2 0.89 1.00 0.94 8\n",
161 | "\n",
162 | " accuracy 0.93 45\n",
163 | " macro avg 0.93 0.94 0.93 45\n",
164 | "weighted avg 0.94 0.93 0.93 45\n",
165 | "\n"
166 | ]
167 | }
168 | ],
169 | "source": [
170 | "# Check the classification report\n",
171 | "from sklearn.metrics import classification_report\n",
172 | "class_report = classification_report(y_te, pred_tree)\n",
173 | "print(class_report)"
174 | ]
175 | },
176 | {
177 | "cell_type": "markdown",
178 | "metadata": {},
179 | "source": [
180 | "# Combined Code"
181 | ]
182 | },
183 | {
184 | "cell_type": "code",
185 | "execution_count": 10,
186 | "metadata": {},
187 | "outputs": [
188 | {
189 | "name": "stdout",
190 | "output_type": "stream",
191 | "text": [
192 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 1 0 1 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
193 | " 1 1 2 1 0 1 1 1]\n",
194 | "0.9349141206870346\n",
195 | "[[14 2 0]\n",
196 | " [ 0 20 1]\n",
197 | " [ 0 0 8]]\n",
198 | " precision recall f1-score support\n",
199 | "\n",
200 | " 0 1.00 0.88 0.93 16\n",
201 | " 1 0.91 0.95 0.93 21\n",
202 | " 2 0.89 1.00 0.94 8\n",
203 | "\n",
204 | " accuracy 0.93 45\n",
205 | " macro avg 0.93 0.94 0.93 45\n",
206 | "weighted avg 0.94 0.93 0.93 45\n",
207 | "\n"
208 | ]
209 | }
210 | ],
211 | "source": [
212 | "from sklearn import datasets\n",
213 | "from sklearn.preprocessing import StandardScaler\n",
214 | "from sklearn.model_selection import train_test_split\n",
215 | "\n",
216 | "from sklearn import tree \n",
217 | "\n",
218 | "from sklearn.metrics import f1_score\n",
219 | "from sklearn.metrics import confusion_matrix\n",
220 | "from sklearn.metrics import classification_report\n",
221 | "\n",
222 | "\n",
223 | "# Load the data\n",
224 | "raw_wine = datasets.load_wine()\n",
225 | "\n",
226 | "# Assign feature and target data\n",
227 | "X = raw_wine.data\n",
228 | "y = raw_wine.target\n",
229 | "\n",
230 | "# Split into training/test data\n",
231 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
232 | "\n",
233 | "# Standardize the data\n",
234 | "std_scale = StandardScaler()\n",
235 | "std_scale.fit(X_tn)\n",
236 | "X_tn_std = std_scale.transform(X_tn)\n",
237 | "X_te_std = std_scale.transform(X_te)\n",
238 | "\n",
239 | "# Train the decision tree\n",
240 | "clf_tree = tree.DecisionTreeClassifier(random_state=0)\n",
241 | "clf_tree.fit(X_tn_std, y_tn)\n",
242 | "\n",
243 | "# Predict\n",
244 | "pred_tree = clf_tree.predict(X_te_std)\n",
245 | "print(pred_tree)\n",
246 | "\n",
247 | "# f1 score\n",
248 | "from sklearn.metrics import f1_score\n",
249 | "f1 = f1_score(y_te, pred_tree, average='macro')\n",
250 | "print(f1)\n",
251 | "\n",
252 | "# Check the confusion matrix\n",
253 | "conf_matrix = confusion_matrix(y_te, pred_tree)\n",
254 | "print(conf_matrix)\n",
255 | "\n",
256 | "# Check the classification report\n",
257 | "class_report = classification_report(y_te, pred_tree)\n",
258 | "print(class_report)"
259 | ]
260 | },
261 | {
262 | "cell_type": "code",
263 | "execution_count": null,
264 | "metadata": {},
265 | "outputs": [],
266 | "source": []
267 | },
268 | {
269 | "cell_type": "code",
270 | "execution_count": null,
271 | "metadata": {},
272 | "outputs": [],
273 | "source": []
274 | },
275 | {
276 | "cell_type": "code",
277 | "execution_count": null,
278 | "metadata": {},
279 | "outputs": [],
280 | "source": []
281 | }
282 | ],
283 | "metadata": {
284 | "kernelspec": {
285 | "display_name": "Python 3",
286 | "language": "python",
287 | "name": "python3"
288 | },
289 | "language_info": {
290 | "codemirror_mode": {
291 | "name": "ipython",
292 | "version": 3
293 | },
294 | "file_extension": ".py",
295 | "mimetype": "text/x-python",
296 | "name": "python",
297 | "nbconvert_exporter": "python",
298 | "pygments_lexer": "ipython3",
299 | "version": "3.7.6"
300 | }
301 | },
302 | "nbformat": 4,
303 | "nbformat_minor": 4
304 | }
305 |
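The macro-averaged F1 score printed by the notebook above (about 0.9349) can be recomputed by hand from the confusion matrix in the same output. The following is a minimal pure-Python sketch, assuming rows are true classes and columns are predicted classes (the convention `confusion_matrix` uses). The helper name `per_class_scores` is hypothetical and not part of the notebook.

```python
# Confusion matrix copied from the decision tree notebook output above.
conf_matrix = [
    [14, 2, 0],
    [0, 20, 1],
    [0, 0, 8],
]


def per_class_scores(cm):
    """Return one (precision, recall, f1) tuple per class."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]                                   # correct predictions for class k
        predicted_k = sum(cm[i][k] for i in range(n))   # column sum: predicted as k
        actual_k = sum(cm[k])                           # row sum: truly k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


scores = per_class_scores(conf_matrix)

# Macro averaging: unweighted mean of the per-class F1 values.
macro_f1 = sum(f1 for _, _, f1 in scores) / len(scores)

for label, (p, r, f1) in enumerate(scores):
    print(label, round(p, 2), round(r, 2), round(f1, 2))
print(macro_f1)
```

Macro averaging weights each class equally, which is why the small class 2 (8 test samples) influences the score as much as class 1 (21 test samples).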
--------------------------------------------------------------------------------
/08장_지도학습_8절_서포트벡터머신.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Individual code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_wine = datasets.load_wine()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Assign feature and target data\n",
28 | "X = raw_wine.data\n",
29 | "y = raw_wine.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training/test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 5,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,\n",
66 | " decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',\n",
67 | " max_iter=-1, probability=False, random_state=0, shrinking=True, tol=0.001,\n",
68 | " verbose=False)"
69 | ]
70 | },
71 | "execution_count": 5,
72 | "metadata": {},
73 | "output_type": "execute_result"
74 | }
75 | ],
76 | "source": [
77 | "# Train the support vector machine\n",
78 | "from sklearn import svm \n",
79 | "clf_svm_lr = svm.SVC(kernel='linear', random_state=0)\n",
80 | "clf_svm_lr.fit(X_tn_std, y_tn)"
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": 6,
86 | "metadata": {},
87 | "outputs": [
88 | {
89 | "name": "stdout",
90 | "output_type": "stream",
91 | "text": [
92 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 1 0 1 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
93 | " 1 1 2 0 0 1 1 1]\n"
94 | ]
95 | }
96 | ],
97 | "source": [
98 | "# Predict\n",
99 | "pred_svm = clf_svm_lr.predict(X_te_std)\n",
100 | "print(pred_svm)"
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": 7,
106 | "metadata": {},
107 | "outputs": [
108 | {
109 | "name": "stdout",
110 | "output_type": "stream",
111 | "text": [
112 | "1.0\n"
113 | ]
114 | }
115 | ],
116 | "source": [
117 | "# Accuracy\n",
118 | "from sklearn.metrics import accuracy_score\n",
119 | "accuracy = accuracy_score(y_te, pred_svm)\n",
120 | "print(accuracy)"
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": 8,
126 | "metadata": {},
127 | "outputs": [
128 | {
129 | "name": "stdout",
130 | "output_type": "stream",
131 | "text": [
132 | "[[16 0 0]\n",
133 | " [ 0 21 0]\n",
134 | " [ 0 0 8]]\n"
135 | ]
136 | }
137 | ],
138 | "source": [
139 | "# Check the confusion matrix\n",
140 | "from sklearn.metrics import confusion_matrix\n",
141 | "conf_matrix = confusion_matrix(y_te, pred_svm)\n",
142 | "print(conf_matrix)"
143 | ]
144 | },
145 | {
146 | "cell_type": "code",
147 | "execution_count": 9,
148 | "metadata": {},
149 | "outputs": [
150 | {
151 | "name": "stdout",
152 | "output_type": "stream",
153 | "text": [
154 | " precision recall f1-score support\n",
155 | "\n",
156 | " 0 1.00 1.00 1.00 16\n",
157 | " 1 1.00 1.00 1.00 21\n",
158 | " 2 1.00 1.00 1.00 8\n",
159 | "\n",
160 | " accuracy 1.00 45\n",
161 | " macro avg 1.00 1.00 1.00 45\n",
162 | "weighted avg 1.00 1.00 1.00 45\n",
163 | "\n"
164 | ]
165 | }
166 | ],
167 | "source": [
168 | "# Check the classification report\n",
169 | "from sklearn.metrics import classification_report\n",
170 | "class_report = classification_report(y_te, pred_svm)\n",
171 | "print(class_report)"
172 | ]
173 | },
174 | {
175 | "cell_type": "markdown",
176 | "metadata": {},
177 | "source": [
178 | "# Integrated code"
179 | ]
180 | },
181 | {
182 | "cell_type": "code",
183 | "execution_count": 1,
184 | "metadata": {},
185 | "outputs": [
186 | {
187 | "name": "stdout",
188 | "output_type": "stream",
189 | "text": [
190 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 1 0 1 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
191 | " 1 1 2 0 0 1 1 1]\n",
192 | "1.0\n",
193 | "[[16 0 0]\n",
194 | " [ 0 21 0]\n",
195 | " [ 0 0 8]]\n",
196 | " precision recall f1-score support\n",
197 | "\n",
198 | " 0 1.00 1.00 1.00 16\n",
199 | " 1 1.00 1.00 1.00 21\n",
200 | " 2 1.00 1.00 1.00 8\n",
201 | "\n",
202 | " accuracy 1.00 45\n",
203 | " macro avg 1.00 1.00 1.00 45\n",
204 | "weighted avg 1.00 1.00 1.00 45\n",
205 | "\n"
206 | ]
207 | }
208 | ],
209 | "source": [
210 | "from sklearn import datasets\n",
211 | "from sklearn.preprocessing import StandardScaler\n",
212 | "from sklearn.model_selection import train_test_split\n",
213 | "\n",
214 | "from sklearn import svm \n",
215 | "\n",
216 | "from sklearn.metrics import accuracy_score\n",
217 | "from sklearn.metrics import confusion_matrix\n",
218 | "from sklearn.metrics import classification_report\n",
219 | "\n",
220 | "# Load the data\n",
221 | "raw_wine = datasets.load_wine()\n",
222 | "\n",
223 | "# Assign feature and target data\n",
224 | "X = raw_wine.data\n",
225 | "y = raw_wine.target\n",
226 | "\n",
227 | "# Split into training/test data\n",
228 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
229 | "\n",
230 | "# Standardize the data\n",
231 | "std_scale = StandardScaler()\n",
232 | "std_scale.fit(X_tn)\n",
233 | "X_tn_std = std_scale.transform(X_tn)\n",
234 | "X_te_std = std_scale.transform(X_te)\n",
235 | "\n",
236 | "# Train the support vector machine\n",
237 | "clf_svm_lr = svm.SVC(kernel='linear', random_state=0)\n",
238 | "clf_svm_lr.fit(X_tn_std, y_tn)\n",
239 | "\n",
240 | "# Predict\n",
241 | "pred_svm = clf_svm_lr.predict(X_te_std)\n",
242 | "print(pred_svm)\n",
243 | "\n",
244 | "# Accuracy\n",
245 | "accuracy = accuracy_score(y_te, pred_svm)\n",
246 | "print(accuracy)\n",
247 | "\n",
248 | "# Check the confusion matrix\n",
249 | "conf_matrix = confusion_matrix(y_te, pred_svm)\n",
250 | "print(conf_matrix)\n",
251 | "\n",
252 | "# Check the classification report\n",
253 | "class_report = classification_report(y_te, pred_svm)\n",
254 | "print(class_report)"
255 | ]
256 | },
257 | {
258 | "cell_type": "code",
259 | "execution_count": null,
260 | "metadata": {},
261 | "outputs": [],
262 | "source": []
263 | }
264 | ],
265 | "metadata": {
266 | "kernelspec": {
267 | "display_name": "Python 3",
268 | "language": "python",
269 | "name": "python3"
270 | },
271 | "language_info": {
272 | "codemirror_mode": {
273 | "name": "ipython",
274 | "version": 3
275 | },
276 | "file_extension": ".py",
277 | "mimetype": "text/x-python",
278 | "name": "python",
279 | "nbconvert_exporter": "python",
280 | "pygments_lexer": "ipython3",
281 | "version": "3.7.6"
282 | }
283 | },
284 | "nbformat": 4,
285 | "nbformat_minor": 4
286 | }
287 |
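The standardization step in the notebook above fits the scaler on the training data only and reuses those statistics for the test data. The sketch below illustrates that idea in pure Python for a single feature column. It assumes the population standard deviation (divide by n), which is what `StandardScaler` uses, and falls back to a scale of 1.0 for a constant column. The function names and the toy values are hypothetical and not taken from the wine data.

```python
import math


def fit_standardizer(column):
    """Compute the mean and population standard deviation of one feature."""
    mean = sum(column) / len(column)
    variance = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(variance)
    if std == 0.0:
        std = 1.0  # avoid division by zero for constant features
    return mean, std


def transform(column, mean, std):
    """Apply previously fitted statistics to any column."""
    return [(x - mean) / std for x in column]


train = [12.0, 13.5, 14.2, 11.8]  # toy training values (assumed)
test = [13.0, 15.0]               # toy test values (assumed)

# Fit on the training split only, then apply to both splits.
mean, std = fit_standardizer(train)
train_std = transform(train, mean, std)
test_std = transform(test, mean, std)

print(train_std)
print(test_std)
```

Reusing the training mean and standard deviation for the test split keeps information about the test data out of preprocessing, which is why the notebook calls `fit` on `X_tn` only.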
--------------------------------------------------------------------------------
/09장_앙상블_2절_보팅.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Individual code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_iris = datasets.load_iris()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Assign feature and target data\n",
28 | "X = raw_iris.data\n",
29 | "y = raw_iris.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training/test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 5,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "VotingClassifier(estimators=[('lr',\n",
66 | " LogisticRegression(C=1.0, class_weight=None,\n",
67 | " dual=False, fit_intercept=True,\n",
68 | " intercept_scaling=1,\n",
69 | " l1_ratio=None, max_iter=100,\n",
70 | " multi_class='multinomial',\n",
71 | " n_jobs=None, penalty='l2',\n",
72 | " random_state=1, solver='lbfgs',\n",
73 | " tol=0.0001, verbose=0,\n",
74 | " warm_start=False)),\n",
75 | " ('svm',\n",
76 | " SVC(C=1.0, break_ties=False, cache_size=200,\n",
77 | " class_weight=None, coef0=0.0,\n",
78 | " decision_function_shape='ovr', degree=3,\n",
79 | " gamma='scale', kernel='linear', max_iter=-1,\n",
80 | " probability=False, random_state=1,\n",
81 | " shrinking=True, tol=0.001, verbose=False)),\n",
82 | " ('gnb',\n",
83 | " GaussianNB(priors=None, var_smoothing=1e-09))],\n",
84 | " flatten_transform=True, n_jobs=None, voting='hard',\n",
85 | " weights=[1, 1, 1])"
86 | ]
87 | },
88 | "execution_count": 5,
89 | "metadata": {},
90 | "output_type": "execute_result"
91 | }
92 | ],
93 | "source": [
94 | "# Train the voting classifier\n",
95 | "from sklearn.linear_model import LogisticRegression\n",
96 | "from sklearn import svm\n",
97 | "from sklearn.naive_bayes import GaussianNB\n",
98 | "from sklearn.ensemble import VotingClassifier\n",
99 | "\n",
100 | "clf1 = LogisticRegression(multi_class='multinomial', \n",
101 | " random_state=1)\n",
102 | "clf2 = svm.SVC(kernel='linear', \n",
103 | " random_state=1) \n",
104 | "clf3 = GaussianNB()\n",
105 | "\n",
106 | "clf_voting = VotingClassifier(\n",
107 | " estimators=[\n",
108 | " ('lr', clf1), \n",
109 | " ('svm', clf2), \n",
110 | " ('gnb', clf3)\n",
111 | " ],\n",
112 | " voting='hard',\n",
113 | " weights=[1,1,1])\n",
114 | "clf_voting.fit(X_tn_std, y_tn)"
115 | ]
116 | },
117 | {
118 | "cell_type": "code",
119 | "execution_count": 6,
120 | "metadata": {},
121 | "outputs": [
122 | {
123 | "name": "stdout",
124 | "output_type": "stream",
125 | "text": [
126 | "[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0\n",
127 | " 2]\n"
128 | ]
129 | }
130 | ],
131 | "source": [
132 | "# Predict\n",
133 | "pred_voting = clf_voting.predict(X_te_std)\n",
134 | "print(pred_voting)"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 7,
140 | "metadata": {},
141 | "outputs": [
142 | {
143 | "name": "stdout",
144 | "output_type": "stream",
145 | "text": [
146 | "0.9736842105263158\n"
147 | ]
148 | }
149 | ],
150 | "source": [
151 | "# Accuracy\n",
152 | "from sklearn.metrics import accuracy_score\n",
153 | "accuracy = accuracy_score(y_te, pred_voting)\n",
154 | "print(accuracy)"
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": 8,
160 | "metadata": {},
161 | "outputs": [
162 | {
163 | "name": "stdout",
164 | "output_type": "stream",
165 | "text": [
166 | "[[13 0 0]\n",
167 | " [ 0 15 1]\n",
168 | " [ 0 0 9]]\n"
169 | ]
170 | }
171 | ],
172 | "source": [
173 | "# Check the confusion matrix\n",
174 | "from sklearn.metrics import confusion_matrix\n",
175 | "conf_matrix = confusion_matrix(y_te, pred_voting)\n",
176 | "print(conf_matrix)"
177 | ]
178 | },
179 | {
180 | "cell_type": "code",
181 | "execution_count": 9,
182 | "metadata": {},
183 | "outputs": [
184 | {
185 | "name": "stdout",
186 | "output_type": "stream",
187 | "text": [
188 | " precision recall f1-score support\n",
189 | "\n",
190 | " 0 1.00 1.00 1.00 13\n",
191 | " 1 1.00 0.94 0.97 16\n",
192 | " 2 0.90 1.00 0.95 9\n",
193 | "\n",
194 | " accuracy 0.97 38\n",
195 | " macro avg 0.97 0.98 0.97 38\n",
196 | "weighted avg 0.98 0.97 0.97 38\n",
197 | "\n"
198 | ]
199 | }
200 | ],
201 | "source": [
202 | "# Check the classification report\n",
203 | "from sklearn.metrics import classification_report\n",
204 | "class_report = classification_report(y_te, pred_voting)\n",
205 | "print(class_report)"
206 | ]
207 | },
208 | {
209 | "cell_type": "markdown",
210 | "metadata": {},
211 | "source": [
212 | "# Integrated code"
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": 11,
218 | "metadata": {},
219 | "outputs": [
220 | {
221 | "name": "stdout",
222 | "output_type": "stream",
223 | "text": [
224 | "[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0\n",
225 | " 2]\n",
226 | "0.9736842105263158\n",
227 | "[[13 0 0]\n",
228 | " [ 0 15 1]\n",
229 | " [ 0 0 9]]\n",
230 | " precision recall f1-score support\n",
231 | "\n",
232 | " 0 1.00 1.00 1.00 13\n",
233 | " 1 1.00 0.94 0.97 16\n",
234 | " 2 0.90 1.00 0.95 9\n",
235 | "\n",
236 | " accuracy 0.97 38\n",
237 | " macro avg 0.97 0.98 0.97 38\n",
238 | "weighted avg 0.98 0.97 0.97 38\n",
239 | "\n"
240 | ]
241 | }
242 | ],
243 | "source": [
244 | "from sklearn import datasets\n",
245 | "from sklearn.model_selection import train_test_split\n",
246 | "from sklearn.preprocessing import StandardScaler\n",
247 | "\n",
248 | "from sklearn.linear_model import LogisticRegression\n",
249 | "from sklearn import svm\n",
250 | "from sklearn.naive_bayes import GaussianNB\n",
251 | "from sklearn.ensemble import VotingClassifier\n",
252 | "\n",
253 | "from sklearn.metrics import accuracy_score\n",
254 | "from sklearn.metrics import confusion_matrix\n",
255 | "from sklearn.metrics import classification_report\n",
256 | "\n",
257 | "# Load the data\n",
258 | "raw_iris = datasets.load_iris()\n",
259 | "\n",
260 | "# Assign feature and target data\n",
261 | "X = raw_iris.data\n",
262 | "y = raw_iris.target\n",
263 | "\n",
264 | "# Split into training/test data\n",
265 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
266 | "\n",
267 | "# Standardize the data\n",
268 | "std_scale = StandardScaler()\n",
269 | "std_scale.fit(X_tn)\n",
270 | "X_tn_std = std_scale.transform(X_tn)\n",
271 | "X_te_std = std_scale.transform(X_te)\n",
272 | "\n",
273 | "# Train the voting classifier\n",
274 | "clf1 = LogisticRegression(multi_class='multinomial', \n",
275 | " random_state=1)\n",
276 | "clf2 = svm.SVC(kernel='linear', \n",
277 | " random_state=1) \n",
278 | "clf3 = GaussianNB()\n",
279 | "\n",
280 | "clf_voting = VotingClassifier(\n",
281 | " estimators=[\n",
282 | " ('lr', clf1), \n",
283 | " ('svm', clf2), \n",
284 | " ('gnb', clf3)\n",
285 | " ],\n",
286 | " voting='hard',\n",
287 | " weights=[1,1,1])\n",
288 | "clf_voting.fit(X_tn_std, y_tn)\n",
289 | "\n",
290 | "# Predict\n",
291 | "pred_voting = clf_voting.predict(X_te_std)\n",
292 | "print(pred_voting)\n",
293 | "\n",
294 | "# Accuracy\n",
295 | "accuracy = accuracy_score(y_te, pred_voting)\n",
296 | "print(accuracy)\n",
297 | "\n",
298 | "# Check the confusion matrix\n",
299 | "conf_matrix = confusion_matrix(y_te, pred_voting)\n",
300 | "print(conf_matrix)\n",
301 | "\n",
302 | "# Check the classification report\n",
303 | "class_report = classification_report(y_te, pred_voting)\n",
304 | "print(class_report)"
305 | ]
306 | },
307 | {
308 | "cell_type": "code",
309 | "execution_count": null,
310 | "metadata": {},
311 | "outputs": [],
312 | "source": []
313 | },
314 | {
315 | "cell_type": "code",
316 | "execution_count": null,
317 | "metadata": {},
318 | "outputs": [],
319 | "source": []
320 | }
321 | ],
322 | "metadata": {
323 | "kernelspec": {
324 | "display_name": "Python 3",
325 | "language": "python",
326 | "name": "python3"
327 | },
328 | "language_info": {
329 | "codemirror_mode": {
330 | "name": "ipython",
331 | "version": 3
332 | },
333 | "file_extension": ".py",
334 | "mimetype": "text/x-python",
335 | "name": "python",
336 | "nbconvert_exporter": "python",
337 | "pygments_lexer": "ipython3",
338 | "version": "3.7.6"
339 | }
340 | },
341 | "nbformat": 4,
342 | "nbformat_minor": 4
343 | }
344 |
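The notebook above combines three classifiers with `voting='hard'` and `weights=[1,1,1]`. As a rough illustration of what hard voting means, the pure-Python sketch below tallies weighted votes per sample and picks the label with the largest total. The function `hard_vote`, the toy predictions, and the tie-break rule (smallest label wins) are assumptions made for this sketch, not code from the notebook.

```python
def hard_vote(predictions, weights=None):
    """Combine label predictions by weighted majority vote.

    predictions: one list of predicted labels per classifier,
                 all of the same length (one entry per sample).
    weights:     one weight per classifier; defaults to equal weights.
    """
    if weights is None:
        weights = [1] * len(predictions)
    voted = []
    # zip(*predictions) yields, for each sample, the labels from every classifier.
    for sample_preds in zip(*predictions):
        tally = {}
        for label, weight in zip(sample_preds, weights):
            tally[label] = tally.get(label, 0) + weight
        best = max(tally.values())
        # Assumed tie-break for this sketch: the smallest label wins.
        voted.append(min(label for label, total in tally.items() if total == best))
    return voted


# Toy predictions for four samples (assumed values, not notebook output).
lr_pred = [0, 1, 2, 1]
svm_pred = [0, 2, 2, 0]
gnb_pred = [0, 1, 1, 2]

print(hard_vote([lr_pred, svm_pred, gnb_pred], weights=[1, 1, 1]))
```

With equal weights, the fourth sample is a three-way tie, so the result for it depends entirely on the tie-break rule. Unequal weights are one way to make such ties less likely.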
--------------------------------------------------------------------------------
/09장_앙상블_3절_1_랜덤포레스트.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Individual code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 2,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_wine = datasets.load_wine()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 3,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Assign feature and target data\n",
28 | "X = raw_wine.data\n",
29 | "y = raw_wine.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 4,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training/test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 5,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 6,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,\n",
66 | " criterion='gini', max_depth=2, max_features='auto',\n",
67 | " max_leaf_nodes=None, max_samples=None,\n",
68 | " min_impurity_decrease=0.0, min_impurity_split=None,\n",
69 | " min_samples_leaf=1, min_samples_split=2,\n",
70 | " min_weight_fraction_leaf=0.0, n_estimators=100,\n",
71 | " n_jobs=None, oob_score=False, random_state=0, verbose=0,\n",
72 | " warm_start=False)"
73 | ]
74 | },
75 | "execution_count": 6,
76 | "metadata": {},
77 | "output_type": "execute_result"
78 | }
79 | ],
80 | "source": [
81 | "from sklearn.ensemble import RandomForestClassifier\n",
82 | "clf_rf = RandomForestClassifier(max_depth=2, \n",
83 | " random_state=0)\n",
84 | "clf_rf.fit(X_tn_std, y_tn)"
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": 7,
90 | "metadata": {},
91 | "outputs": [
92 | {
93 | "name": "stdout",
94 | "output_type": "stream",
95 | "text": [
96 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
97 | " 1 1 2 0 0 1 1 1]\n"
98 | ]
99 | }
100 | ],
101 | "source": [
102 | "# Predict\n",
103 | "pred_rf = clf_rf.predict(X_te_std)\n",
104 | "print(pred_rf)"
105 | ]
106 | },
107 | {
108 | "cell_type": "code",
109 | "execution_count": 8,
110 | "metadata": {},
111 | "outputs": [
112 | {
113 | "name": "stdout",
114 | "output_type": "stream",
115 | "text": [
116 | "0.9555555555555556\n"
117 | ]
118 | }
119 | ],
120 | "source": [
121 | "# Accuracy\n",
122 | "from sklearn.metrics import accuracy_score\n",
123 | "accuracy = accuracy_score(y_te, pred_rf)\n",
124 | "print(accuracy)"
125 | ]
126 | },
127 | {
128 | "cell_type": "code",
129 | "execution_count": 9,
130 | "metadata": {},
131 | "outputs": [
132 | {
133 | "name": "stdout",
134 | "output_type": "stream",
135 | "text": [
136 | "[[16 0 0]\n",
137 | " [ 1 19 1]\n",
138 | " [ 0 0 8]]\n"
139 | ]
140 | }
141 | ],
142 | "source": [
143 | "# Check the confusion matrix\n",
144 | "from sklearn.metrics import confusion_matrix\n",
145 | "conf_matrix = confusion_matrix(y_te, pred_rf)\n",
146 | "print(conf_matrix)"
147 | ]
148 | },
149 | {
150 | "cell_type": "code",
151 | "execution_count": 10,
152 | "metadata": {},
153 | "outputs": [
154 | {
155 | "name": "stdout",
156 | "output_type": "stream",
157 | "text": [
158 | " precision recall f1-score support\n",
159 | "\n",
160 | " 0 0.94 1.00 0.97 16\n",
161 | " 1 1.00 0.90 0.95 21\n",
162 | " 2 0.89 1.00 0.94 8\n",
163 | "\n",
164 | " accuracy 0.96 45\n",
165 | " macro avg 0.94 0.97 0.95 45\n",
166 | "weighted avg 0.96 0.96 0.96 45\n",
167 | "\n"
168 | ]
169 | }
170 | ],
171 | "source": [
172 | "# Check the classification report\n",
173 | "from sklearn.metrics import classification_report\n",
174 | "class_report = classification_report(y_te, pred_rf)\n",
175 | "print(class_report)"
176 | ]
177 | },
178 | {
179 | "cell_type": "markdown",
180 | "metadata": {},
181 | "source": [
182 | "# Integrated code"
183 | ]
184 | },
185 | {
186 | "cell_type": "code",
187 | "execution_count": 11,
188 | "metadata": {},
189 | "outputs": [
190 | {
191 | "name": "stdout",
192 | "output_type": "stream",
193 | "text": [
194 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
195 | " 1 1 2 0 0 1 1 1]\n",
196 | "0.9555555555555556\n",
197 | "[[16 0 0]\n",
198 | " [ 1 19 1]\n",
199 | " [ 0 0 8]]\n",
200 | " precision recall f1-score support\n",
201 | "\n",
202 | " 0 0.94 1.00 0.97 16\n",
203 | " 1 1.00 0.90 0.95 21\n",
204 | " 2 0.89 1.00 0.94 8\n",
205 | "\n",
206 | " accuracy 0.96 45\n",
207 | " macro avg 0.94 0.97 0.95 45\n",
208 | "weighted avg 0.96 0.96 0.96 45\n",
209 | "\n"
210 | ]
211 | }
212 | ],
213 | "source": [
214 | "from sklearn import datasets\n",
215 | "from sklearn.model_selection import train_test_split\n",
216 | "from sklearn.preprocessing import StandardScaler\n",
217 | "\n",
218 | "from sklearn.ensemble import RandomForestClassifier\n",
219 | "\n",
220 | "from sklearn.metrics import accuracy_score\n",
221 | "from sklearn.metrics import confusion_matrix\n",
222 | "from sklearn.metrics import classification_report\n",
223 | "\n",
224 | "# Load the data\n",
225 | "raw_wine = datasets.load_wine()\n",
226 | "\n",
227 | "# Assign feature and target data\n",
228 | "X = raw_wine.data\n",
229 | "y = raw_wine.target\n",
230 | "\n",
231 | "# Split into training/test data\n",
232 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
233 | "\n",
234 | "# Standardize the data\n",
235 | "std_scale = StandardScaler()\n",
236 | "std_scale.fit(X_tn)\n",
237 | "X_tn_std = std_scale.transform(X_tn)\n",
238 | "X_te_std = std_scale.transform(X_te)\n",
239 | "\n",
240 | "# Train the random forest\n",
241 | "clf_rf = RandomForestClassifier(max_depth=2, \n",
242 | " random_state=0)\n",
243 | "clf_rf.fit(X_tn_std, y_tn)\n",
244 | "\n",
245 | "# Predict\n",
246 | "pred_rf = clf_rf.predict(X_te_std)\n",
247 | "print(pred_rf)\n",
248 | "\n",
249 | "# Accuracy\n",
250 | "accuracy = accuracy_score(y_te, pred_rf)\n",
251 | "print(accuracy)\n",
252 | "\n",
253 | "# Check the confusion matrix\n",
254 | "conf_matrix = confusion_matrix(y_te, pred_rf)\n",
255 | "print(conf_matrix)\n",
256 | "\n",
257 | "# Check the classification report\n",
258 | "class_report = classification_report(y_te, pred_rf)\n",
259 | "print(class_report)"
260 | ]
261 | },
262 | {
263 | "cell_type": "code",
264 | "execution_count": null,
265 | "metadata": {},
266 | "outputs": [],
267 | "source": []
268 | }
269 | ],
270 | "metadata": {
271 | "kernelspec": {
272 | "display_name": "Python 3",
273 | "language": "python",
274 | "name": "python3"
275 | },
276 | "language_info": {
277 | "codemirror_mode": {
278 | "name": "ipython",
279 | "version": 3
280 | },
281 | "file_extension": ".py",
282 | "mimetype": "text/x-python",
283 | "name": "python",
284 | "nbconvert_exporter": "python",
285 | "pygments_lexer": "ipython3",
286 | "version": "3.7.6"
287 | }
288 | },
289 | "nbformat": 4,
290 | "nbformat_minor": 4
291 | }
292 |
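The fitted model repr in the notebook above shows `max_features='auto'`, which for a classifier in that scikit-learn version means each split considers roughly the square root of the number of features. The pure-Python sketch below only illustrates that feature-subsampling idea; it is not how scikit-learn draws its candidates internally, and the function name `candidate_features` is hypothetical. The feature count of 13 matches the wine dataset.

```python
import math
import random


def candidate_features(n_features, rng):
    """Pick a random subset of about sqrt(n_features) feature indices."""
    k = max(1, int(math.sqrt(n_features)))
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_features = 13          # the wine dataset has 13 features

# Each split in each tree would look at a fresh random subset like these.
for split in range(3):
    print(split, candidate_features(n_features, rng))
```

Restricting every split to a small random subset makes the trees less correlated with each other, which is the main difference between a random forest and plain bagging of decision trees.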
--------------------------------------------------------------------------------
/09장_앙상블_3절_2_배깅.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Individual code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_wine = datasets.load_wine()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Assign feature and target data\n",
28 | "X = raw_wine.data\n",
29 | "y = raw_wine.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training/test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 16,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "BaggingClassifier(base_estimator=GaussianNB(priors=None, var_smoothing=1e-09),\n",
66 | " bootstrap=True, bootstrap_features=False, max_features=1.0,\n",
67 | " max_samples=1.0, n_estimators=10, n_jobs=None,\n",
68 | " oob_score=False, random_state=0, verbose=0, warm_start=False)"
69 | ]
70 | },
71 | "execution_count": 16,
72 | "metadata": {},
73 | "output_type": "execute_result"
74 | }
75 | ],
76 | "source": [
77 | "# Train the bagging classifier\n",
78 | "from sklearn.naive_bayes import GaussianNB\n",
79 | "from sklearn.ensemble import BaggingClassifier\n",
80 | "clf_bagging = BaggingClassifier(base_estimator=GaussianNB(),\n",
81 | " n_estimators=10, \n",
82 | " random_state=0)\n",
83 | "clf_bagging.fit(X_tn_std, y_tn)"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 17,
89 | "metadata": {},
90 | "outputs": [
91 | {
92 | "name": "stdout",
93 | "output_type": "stream",
94 | "text": [
95 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
96 | " 1 1 2 0 0 1 1 1]\n"
97 | ]
98 | }
99 | ],
100 | "source": [
101 | "# Predict\n",
102 | "pred_bagging = clf_bagging.predict(X_te_std)\n",
103 | "print(pred_bagging)"
104 | ]
105 | },
106 | {
107 | "cell_type": "code",
108 | "execution_count": 18,
109 | "metadata": {},
110 | "outputs": [
111 | {
112 | "name": "stdout",
113 | "output_type": "stream",
114 | "text": [
115 | "0.9555555555555556\n"
116 | ]
117 | }
118 | ],
119 | "source": [
120 | "# Accuracy\n",
121 | "from sklearn.metrics import accuracy_score\n",
122 | "accuracy = accuracy_score(y_te, pred_bagging)\n",
123 | "print(accuracy)"
124 | ]
125 | },
126 | {
127 | "cell_type": "code",
128 | "execution_count": 19,
129 | "metadata": {},
130 | "outputs": [
131 | {
132 | "name": "stdout",
133 | "output_type": "stream",
134 | "text": [
135 | "[[16 0 0]\n",
136 | " [ 1 19 1]\n",
137 | " [ 0 0 8]]\n"
138 | ]
139 | }
140 | ],
141 | "source": [
142 | "# Check the confusion matrix\n",
143 | "from sklearn.metrics import confusion_matrix\n",
144 | "conf_matrix = confusion_matrix(y_te, pred_bagging)\n",
145 | "print(conf_matrix)"
146 | ]
147 | },
148 | {
149 | "cell_type": "code",
150 | "execution_count": 20,
151 | "metadata": {},
152 | "outputs": [
153 | {
154 | "name": "stdout",
155 | "output_type": "stream",
156 | "text": [
157 | " precision recall f1-score support\n",
158 | "\n",
159 | " 0 0.94 1.00 0.97 16\n",
160 | " 1 1.00 0.90 0.95 21\n",
161 | " 2 0.89 1.00 0.94 8\n",
162 | "\n",
163 | " accuracy 0.96 45\n",
164 | " macro avg 0.94 0.97 0.95 45\n",
165 | "weighted avg 0.96 0.96 0.96 45\n",
166 | "\n"
167 | ]
168 | }
169 | ],
170 | "source": [
171 | "# Check the classification report\n",
172 | "from sklearn.metrics import classification_report\n",
173 | "class_report = classification_report(y_te, pred_bagging)\n",
174 | "print(class_report)"
175 | ]
176 | },
177 | {
178 | "cell_type": "markdown",
179 | "metadata": {},
180 | "source": [
181 | "# Integrated code"
182 | ]
183 | },
184 | {
185 | "cell_type": "code",
186 | "execution_count": 21,
187 | "metadata": {},
188 | "outputs": [
189 | {
190 | "name": "stdout",
191 | "output_type": "stream",
192 | "text": [
193 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n",
194 | " 1 1 2 0 0 1 1 1]\n",
195 | "0.9555555555555556\n",
196 | "[[16 0 0]\n",
197 | " [ 1 19 1]\n",
198 | " [ 0 0 8]]\n",
199 | " precision recall f1-score support\n",
200 | "\n",
201 | " 0 0.94 1.00 0.97 16\n",
202 | " 1 1.00 0.90 0.95 21\n",
203 | " 2 0.89 1.00 0.94 8\n",
204 | "\n",
205 | " accuracy 0.96 45\n",
206 | " macro avg 0.94 0.97 0.95 45\n",
207 | "weighted avg 0.96 0.96 0.96 45\n",
208 | "\n"
209 | ]
210 | }
211 | ],
212 | "source": [
213 | "from sklearn import datasets\n",
214 | "from sklearn.model_selection import train_test_split\n",
215 | "from sklearn.preprocessing import StandardScaler\n",
216 | "\n",
217 | "from sklearn.naive_bayes import GaussianNB\n",
218 | "from sklearn.ensemble import BaggingClassifier\n",
219 | "\n",
220 | "from sklearn.metrics import accuracy_score\n",
221 | "from sklearn.metrics import confusion_matrix\n",
222 | "from sklearn.metrics import classification_report\n",
223 | "\n",
224 | "# Load the data\n",
225 | "raw_wine = datasets.load_wine()\n",
226 | "\n",
227 | "# Assign feature and target data\n",
228 | "X = raw_wine.data\n",
229 | "y = raw_wine.target\n",
230 | "\n",
231 | "# Split into training/test data\n",
232 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n",
233 | "\n",
234 | "# Standardize the data\n",
235 | "std_scale = StandardScaler()\n",
236 | "std_scale.fit(X_tn)\n",
237 | "X_tn_std = std_scale.transform(X_tn)\n",
238 | "X_te_std = std_scale.transform(X_te)\n",
239 | "\n",
240 | "# Train the bagging classifier\n",
241 | "clf_bagging = BaggingClassifier(base_estimator=GaussianNB(),\n",
242 | " n_estimators=10, \n",
243 | " random_state=0)\n",
244 | "clf_bagging.fit(X_tn_std, y_tn)\n",
245 | "\n",
246 | "# Predict\n",
247 | "pred_bagging = clf_bagging.predict(X_te_std)\n",
248 | "print(pred_bagging)\n",
249 | "\n",
250 | "# Accuracy\n",
251 | "accuracy = accuracy_score(y_te, pred_bagging)\n",
252 | "print(accuracy)\n",
253 | "\n",
254 | "# Check the confusion matrix\n",
255 | "conf_matrix = confusion_matrix(y_te, pred_bagging)\n",
256 | "print(conf_matrix)\n",
257 | "\n",
258 | "# Check the classification report\n",
259 | "class_report = classification_report(y_te, pred_bagging)\n",
260 | "print(class_report)"
261 | ]
262 | },
263 | {
264 | "cell_type": "code",
265 | "execution_count": null,
266 | "metadata": {},
267 | "outputs": [],
268 | "source": []
269 | }
270 | ],
271 | "metadata": {
272 | "kernelspec": {
273 | "display_name": "Python 3",
274 | "language": "python",
275 | "name": "python3"
276 | },
277 | "language_info": {
278 | "codemirror_mode": {
279 | "name": "ipython",
280 | "version": 3
281 | },
282 | "file_extension": ".py",
283 | "mimetype": "text/x-python",
284 | "name": "python",
285 | "nbconvert_exporter": "python",
286 | "pygments_lexer": "ipython3",
287 | "version": "3.7.6"
288 | }
289 | },
290 | "nbformat": 4,
291 | "nbformat_minor": 4
292 | }
293 |
--------------------------------------------------------------------------------
/09장_앙상블_4절_1_adaboost.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Step-by-step code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 11,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_breast_cancer = datasets.load_breast_cancer()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 12,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Set the feature and target data\n",
28 | "X = raw_breast_cancer.data\n",
29 | "y = raw_breast_cancer.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 13,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training and test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 14,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 15,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=1.0,\n",
66 | " n_estimators=50, random_state=0)"
67 | ]
68 | },
69 | "execution_count": 15,
70 | "metadata": {},
71 | "output_type": "execute_result"
72 | }
73 | ],
74 | "source": [
75 | "# Train AdaBoost\n",
76 | "from sklearn.ensemble import AdaBoostClassifier\n",
77 | "clf_ada = AdaBoostClassifier(random_state=0)\n",
78 | "clf_ada.fit(X_tn_std, y_tn)"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": 16,
84 | "metadata": {},
85 | "outputs": [
86 | {
87 | "name": "stdout",
88 | "output_type": "stream",
89 | "text": [
90 | "[0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n",
91 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n",
92 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n",
93 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 0]\n"
94 | ]
95 | }
96 | ],
97 | "source": [
98 | "# Predict\n",
99 | "pred_ada = clf_ada.predict(X_te_std)\n",
100 | "print(pred_ada)"
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": 17,
106 | "metadata": {},
107 | "outputs": [
108 | {
109 | "name": "stdout",
110 | "output_type": "stream",
111 | "text": [
112 | "0.9790209790209791\n"
113 | ]
114 | }
115 | ],
116 | "source": [
117 | "# Accuracy\n",
118 | "from sklearn.metrics import accuracy_score\n",
119 | "accuracy = accuracy_score(y_te, pred_ada)\n",
120 | "print(accuracy)"
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": 18,
126 | "metadata": {},
127 | "outputs": [
128 | {
129 | "name": "stdout",
130 | "output_type": "stream",
131 | "text": [
132 | "[[52 1]\n",
133 | " [ 2 88]]\n"
134 | ]
135 | }
136 | ],
137 | "source": [
138 | "# Check the confusion matrix\n",
139 | "from sklearn.metrics import confusion_matrix\n",
140 | "conf_matrix = confusion_matrix(y_te, pred_ada)\n",
141 | "print(conf_matrix)"
142 | ]
143 | },
144 | {
145 | "cell_type": "code",
146 | "execution_count": 19,
147 | "metadata": {},
148 | "outputs": [
149 | {
150 | "name": "stdout",
151 | "output_type": "stream",
152 | "text": [
153 | " precision recall f1-score support\n",
154 | "\n",
155 | " 0 0.96 0.98 0.97 53\n",
156 | " 1 0.99 0.98 0.98 90\n",
157 | "\n",
158 | " accuracy 0.98 143\n",
159 | " macro avg 0.98 0.98 0.98 143\n",
160 | "weighted avg 0.98 0.98 0.98 143\n",
161 | "\n"
162 | ]
163 | }
164 | ],
165 | "source": [
166 | "# Check the classification report\n",
167 | "from sklearn.metrics import classification_report\n",
168 | "class_report = classification_report(y_te, pred_ada)\n",
169 | "print(class_report)"
170 | ]
171 | },
172 | {
173 | "cell_type": "markdown",
174 | "metadata": {},
175 | "source": [
176 | "# Combined code"
177 | ]
178 | },
179 | {
180 | "cell_type": "code",
181 | "execution_count": 20,
182 | "metadata": {},
183 | "outputs": [
184 | {
185 | "name": "stdout",
186 | "output_type": "stream",
187 | "text": [
188 | "[0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n",
189 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n",
190 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n",
191 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 0]\n",
192 | "0.9790209790209791\n",
193 | "[[52 1]\n",
194 | " [ 2 88]]\n",
195 | " precision recall f1-score support\n",
196 | "\n",
197 | " 0 0.96 0.98 0.97 53\n",
198 | " 1 0.99 0.98 0.98 90\n",
199 | "\n",
200 | " accuracy 0.98 143\n",
201 | " macro avg 0.98 0.98 0.98 143\n",
202 | "weighted avg 0.98 0.98 0.98 143\n",
203 | "\n"
204 | ]
205 | }
206 | ],
207 | "source": [
208 | "from sklearn import datasets\n",
209 | "from sklearn.model_selection import train_test_split\n",
210 | "from sklearn.preprocessing import StandardScaler\n",
211 | "\n",
212 | "from sklearn.ensemble import AdaBoostClassifier\n",
213 | "\n",
214 | "from sklearn.metrics import accuracy_score\n",
215 | "from sklearn.metrics import confusion_matrix\n",
216 | "from sklearn.metrics import classification_report\n",
217 | "\n",
218 | "# Load the data\n",
219 | "raw_breast_cancer = datasets.load_breast_cancer()\n",
220 | "\n",
221 | "# Set the feature and target data\n",
222 | "X = raw_breast_cancer.data\n",
223 | "y = raw_breast_cancer.target\n",
224 | "\n",
225 | "# Split into training and test data\n",
226 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)\n",
227 | "\n",
228 | "# Standardize the data\n",
229 | "std_scale = StandardScaler()\n",
230 | "std_scale.fit(X_tn)\n",
231 | "X_tn_std = std_scale.transform(X_tn)\n",
232 | "X_te_std = std_scale.transform(X_te)\n",
233 | "\n",
234 | "# Train AdaBoost\n",
235 | "clf_ada = AdaBoostClassifier(random_state=0)\n",
236 | "clf_ada.fit(X_tn_std, y_tn)\n",
237 | "\n",
238 | "# Predict\n",
239 | "pred_ada = clf_ada.predict(X_te_std)\n",
240 | "print(pred_ada)\n",
241 | "\n",
242 | "# Accuracy\n",
243 | "accuracy = accuracy_score(y_te, pred_ada)\n",
244 | "print(accuracy)\n",
245 | "\n",
246 | "# Check the confusion matrix\n",
247 | "conf_matrix = confusion_matrix(y_te, pred_ada)\n",
248 | "print(conf_matrix)\n",
249 | "\n",
250 | "# Check the classification report\n",
251 | "class_report = classification_report(y_te, pred_ada)\n",
252 | "print(class_report)"
253 | ]
254 | },
255 | {
256 | "cell_type": "code",
257 | "execution_count": null,
258 | "metadata": {},
259 | "outputs": [],
260 | "source": []
261 | }
262 | ],
263 | "metadata": {
264 | "kernelspec": {
265 | "display_name": "Python 3",
266 | "language": "python",
267 | "name": "python3"
268 | },
269 | "language_info": {
270 | "codemirror_mode": {
271 | "name": "ipython",
272 | "version": 3
273 | },
274 | "file_extension": ".py",
275 | "mimetype": "text/x-python",
276 | "name": "python",
277 | "nbconvert_exporter": "python",
278 | "pygments_lexer": "ipython3",
279 | "version": "3.7.6"
280 | }
281 | },
282 | "nbformat": 4,
283 | "nbformat_minor": 4
284 | }
285 |
--------------------------------------------------------------------------------
/09장_앙상블_4절_2_gradient_boost.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Step-by-step code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_breast_cancer = datasets.load_breast_cancer()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Set the feature and target data\n",
28 | "X = raw_breast_cancer.data\n",
29 | "y = raw_breast_cancer.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training and test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 5,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,\n",
66 | " learning_rate=0.01, loss='deviance', max_depth=2,\n",
67 | " max_features=None, max_leaf_nodes=None,\n",
68 | " min_impurity_decrease=0.0, min_impurity_split=None,\n",
69 | " min_samples_leaf=1, min_samples_split=2,\n",
70 | " min_weight_fraction_leaf=0.0, n_estimators=100,\n",
71 | " n_iter_no_change=None, presort='deprecated',\n",
72 | " random_state=0, subsample=1.0, tol=0.0001,\n",
73 | " validation_fraction=0.1, verbose=0,\n",
74 | " warm_start=False)"
75 | ]
76 | },
77 | "execution_count": 5,
78 | "metadata": {},
79 | "output_type": "execute_result"
80 | }
81 | ],
82 | "source": [
83 | "# Train Gradient Boosting\n",
84 | "from sklearn.ensemble import GradientBoostingClassifier\n",
85 | "clf_gbt = GradientBoostingClassifier(max_depth=2, \n",
86 | " learning_rate=0.01,\n",
87 | " random_state=0)\n",
88 | "clf_gbt.fit(X_tn_std, y_tn)"
89 | ]
90 | },
91 | {
92 | "cell_type": "code",
93 | "execution_count": 6,
94 | "metadata": {},
95 | "outputs": [
96 | {
97 | "name": "stdout",
98 | "output_type": "stream",
99 | "text": [
100 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n",
101 | " 0 1 0 1 1 1 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 1\n",
102 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n",
103 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n"
104 | ]
105 | }
106 | ],
107 | "source": [
108 | "# Predict\n",
109 | "pred_gboost = clf_gbt.predict(X_te_std)\n",
110 | "print(pred_gboost)"
111 | ]
112 | },
113 | {
114 | "cell_type": "code",
115 | "execution_count": 7,
116 | "metadata": {},
117 | "outputs": [
118 | {
119 | "name": "stdout",
120 | "output_type": "stream",
121 | "text": [
122 | "0.965034965034965\n"
123 | ]
124 | }
125 | ],
126 | "source": [
127 | "# Accuracy\n",
128 | "from sklearn.metrics import accuracy_score\n",
129 | "accuracy = accuracy_score(y_te, pred_gboost)\n",
130 | "print(accuracy)"
131 | ]
132 | },
133 | {
134 | "cell_type": "code",
135 | "execution_count": 8,
136 | "metadata": {},
137 | "outputs": [
138 | {
139 | "name": "stdout",
140 | "output_type": "stream",
141 | "text": [
142 | "[[49 4]\n",
143 | " [ 1 89]]\n"
144 | ]
145 | }
146 | ],
147 | "source": [
148 | "# Check the confusion matrix\n",
149 | "from sklearn.metrics import confusion_matrix\n",
150 | "conf_matrix = confusion_matrix(y_te, pred_gboost)\n",
151 | "print(conf_matrix)"
152 | ]
153 | },
154 | {
155 | "cell_type": "code",
156 | "execution_count": 9,
157 | "metadata": {},
158 | "outputs": [
159 | {
160 | "name": "stdout",
161 | "output_type": "stream",
162 | "text": [
163 | " precision recall f1-score support\n",
164 | "\n",
165 | " 0 0.98 0.92 0.95 53\n",
166 | " 1 0.96 0.99 0.97 90\n",
167 | "\n",
168 | " accuracy 0.97 143\n",
169 | " macro avg 0.97 0.96 0.96 143\n",
170 | "weighted avg 0.97 0.97 0.96 143\n",
171 | "\n"
172 | ]
173 | }
174 | ],
175 | "source": [
176 | "# Check the classification report\n",
177 | "from sklearn.metrics import classification_report\n",
178 | "class_report = classification_report(y_te, pred_gboost)\n",
179 | "print(class_report)"
180 | ]
181 | },
182 | {
183 | "cell_type": "markdown",
184 | "metadata": {},
185 | "source": [
186 | "# Combined code"
187 | ]
188 | },
189 | {
190 | "cell_type": "code",
191 | "execution_count": 1,
192 | "metadata": {},
193 | "outputs": [
194 | {
195 | "name": "stdout",
196 | "output_type": "stream",
197 | "text": [
198 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n",
199 | " 0 1 0 1 1 1 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 1\n",
200 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n",
201 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n",
202 | "0.965034965034965\n",
203 | "[[49 4]\n",
204 | " [ 1 89]]\n",
205 | " precision recall f1-score support\n",
206 | "\n",
207 | " 0 0.98 0.92 0.95 53\n",
208 | " 1 0.96 0.99 0.97 90\n",
209 | "\n",
210 | " accuracy 0.97 143\n",
211 | " macro avg 0.97 0.96 0.96 143\n",
212 | "weighted avg 0.97 0.97 0.96 143\n",
213 | "\n"
214 | ]
215 | }
216 | ],
217 | "source": [
218 | "from sklearn import datasets\n",
219 | "from sklearn.model_selection import train_test_split\n",
220 | "from sklearn.preprocessing import StandardScaler\n",
221 | "\n",
222 | "from sklearn.ensemble import GradientBoostingClassifier\n",
223 | "\n",
224 | "from sklearn.metrics import accuracy_score\n",
225 | "from sklearn.metrics import confusion_matrix\n",
226 | "from sklearn.metrics import classification_report\n",
227 | "\n",
228 | "# Load the data\n",
229 | "raw_breast_cancer = datasets.load_breast_cancer()\n",
230 | "\n",
231 | "# Set the feature and target data\n",
232 | "X = raw_breast_cancer.data\n",
233 | "y = raw_breast_cancer.target\n",
234 | "\n",
235 | "# Split into training and test data\n",
236 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)\n",
237 | "\n",
238 | "# Standardize the data\n",
239 | "std_scale = StandardScaler()\n",
240 | "std_scale.fit(X_tn)\n",
241 | "X_tn_std = std_scale.transform(X_tn)\n",
242 | "X_te_std = std_scale.transform(X_te)\n",
243 | "\n",
244 | "# Train Gradient Boosting\n",
245 | "clf_gbt = GradientBoostingClassifier(max_depth=2, \n",
246 | " learning_rate=0.01,\n",
247 | " random_state=0)\n",
248 | "clf_gbt.fit(X_tn_std, y_tn)\n",
249 | "\n",
250 | "# Predict\n",
251 | "pred_gboost = clf_gbt.predict(X_te_std)\n",
252 | "print(pred_gboost)\n",
253 | "\n",
254 | "# Accuracy\n",
255 | "accuracy = accuracy_score(y_te, pred_gboost)\n",
256 | "print(accuracy)\n",
257 | "\n",
258 | "# Check the confusion matrix\n",
259 | "conf_matrix = confusion_matrix(y_te, pred_gboost)\n",
260 | "print(conf_matrix)\n",
261 | "\n",
262 | "# Check the classification report\n",
263 | "class_report = classification_report(y_te, pred_gboost)\n",
264 | "print(class_report)"
265 | ]
266 | },
267 | {
268 | "cell_type": "code",
269 | "execution_count": null,
270 | "metadata": {},
271 | "outputs": [],
272 | "source": []
273 | }
274 | ],
275 | "metadata": {
276 | "kernelspec": {
277 | "display_name": "Python 3",
278 | "language": "python",
279 | "name": "python3"
280 | },
281 | "language_info": {
282 | "codemirror_mode": {
283 | "name": "ipython",
284 | "version": 3
285 | },
286 | "file_extension": ".py",
287 | "mimetype": "text/x-python",
288 | "name": "python",
289 | "nbconvert_exporter": "python",
290 | "pygments_lexer": "ipython3",
291 | "version": "3.7.6"
292 | }
293 | },
294 | "nbformat": 4,
295 | "nbformat_minor": 4
296 | }
297 |
--------------------------------------------------------------------------------
/09장_앙상블_5절_스태킹.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Step-by-step code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "# Load the data\n",
17 | "from sklearn import datasets\n",
18 | "raw_breast_cancer = datasets.load_breast_cancer()"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 2,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "# Set the feature and target data\n",
28 | "X = raw_breast_cancer.data\n",
29 | "y = raw_breast_cancer.target"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 3,
35 | "metadata": {},
36 | "outputs": [],
37 | "source": [
38 | "# Split into training and test data\n",
39 | "from sklearn.model_selection import train_test_split\n",
40 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)"
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": 4,
46 | "metadata": {},
47 | "outputs": [],
48 | "source": [
49 | "# Standardize the data\n",
50 | "from sklearn.preprocessing import StandardScaler\n",
51 | "std_scale = StandardScaler()\n",
52 | "std_scale.fit(X_tn)\n",
53 | "X_tn_std = std_scale.transform(X_tn)\n",
54 | "X_te_std = std_scale.transform(X_te)"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": 5,
60 | "metadata": {},
61 | "outputs": [
62 | {
63 | "data": {
64 | "text/plain": [
65 | "StackingClassifier(cv=None,\n",
66 | " estimators=[('svm',\n",
67 | " SVC(C=1.0, break_ties=False, cache_size=200,\n",
68 | " class_weight=None, coef0=0.0,\n",
69 | " decision_function_shape='ovr', degree=3,\n",
70 | " gamma='scale', kernel='linear', max_iter=-1,\n",
71 | " probability=False, random_state=1,\n",
72 | " shrinking=True, tol=0.001, verbose=False)),\n",
73 | " ('gnb',\n",
74 | " GaussianNB(priors=None, var_smoothing=1e-09))],\n",
75 | " final_estimator=LogisticRegression(C=1.0, class_weight=None,\n",
76 | " dual=False,\n",
77 | " fit_intercept=True,\n",
78 | " intercept_scaling=1,\n",
79 | " l1_ratio=None,\n",
80 | " max_iter=100,\n",
81 | " multi_class='auto',\n",
82 | " n_jobs=None, penalty='l2',\n",
83 | " random_state=None,\n",
84 | " solver='lbfgs',\n",
85 | " tol=0.0001, verbose=0,\n",
86 | " warm_start=False),\n",
87 | " n_jobs=None, passthrough=False, stack_method='auto',\n",
88 | " verbose=0)"
89 | ]
90 | },
91 | "execution_count": 5,
92 | "metadata": {},
93 | "output_type": "execute_result"
94 | }
95 | ],
96 | "source": [
97 | "# Train the stacking model\n",
98 | "from sklearn import svm\n",
99 | "from sklearn.naive_bayes import GaussianNB\n",
100 | "from sklearn.linear_model import LogisticRegression\n",
101 | "from sklearn.ensemble import StackingClassifier\n",
102 | "\n",
103 | "clf1 = svm.SVC(kernel='linear', random_state=1) \n",
104 | "clf2 = GaussianNB()\n",
105 | "\n",
106 | "clf_stkg = StackingClassifier(\n",
107 | " estimators=[\n",
108 | " ('svm', clf1), \n",
109 | " ('gnb', clf2)\n",
110 | " ],\n",
111 | " final_estimator=LogisticRegression())\n",
112 | "clf_stkg.fit(X_tn_std, y_tn)"
113 | ]
114 | },
115 | {
116 | "cell_type": "code",
117 | "execution_count": 7,
118 | "metadata": {},
119 | "outputs": [
120 | {
121 | "name": "stdout",
122 | "output_type": "stream",
123 | "text": [
124 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n",
125 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n",
126 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 1 0 1\n",
127 | " 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n"
128 | ]
129 | }
130 | ],
131 | "source": [
132 | "# Predict\n",
133 | "pred_stkg = clf_stkg.predict(X_te_std)\n",
134 | "print(pred_stkg)"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 8,
140 | "metadata": {},
141 | "outputs": [
142 | {
143 | "name": "stdout",
144 | "output_type": "stream",
145 | "text": [
146 | "0.965034965034965\n"
147 | ]
148 | }
149 | ],
150 | "source": [
151 | "# Accuracy\n",
152 | "from sklearn.metrics import accuracy_score\n",
153 | "accuracy = accuracy_score(y_te, pred_stkg)\n",
154 | "print(accuracy)"
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": 9,
160 | "metadata": {},
161 | "outputs": [
162 | {
163 | "name": "stdout",
164 | "output_type": "stream",
165 | "text": [
166 | "[[50 3]\n",
167 | " [ 2 88]]\n"
168 | ]
169 | }
170 | ],
171 | "source": [
172 | "# Check the confusion matrix\n",
173 | "from sklearn.metrics import confusion_matrix\n",
174 | "conf_matrix = confusion_matrix(y_te, pred_stkg)\n",
175 | "print(conf_matrix)"
176 | ]
177 | },
178 | {
179 | "cell_type": "code",
180 | "execution_count": 10,
181 | "metadata": {},
182 | "outputs": [
183 | {
184 | "name": "stdout",
185 | "output_type": "stream",
186 | "text": [
187 | " precision recall f1-score support\n",
188 | "\n",
189 | " 0 0.96 0.94 0.95 53\n",
190 | " 1 0.97 0.98 0.97 90\n",
191 | "\n",
192 | " accuracy 0.97 143\n",
193 | " macro avg 0.96 0.96 0.96 143\n",
194 | "weighted avg 0.96 0.97 0.96 143\n",
195 | "\n"
196 | ]
197 | }
198 | ],
199 | "source": [
200 | "# Check the classification report\n",
201 | "from sklearn.metrics import classification_report\n",
202 | "class_report = classification_report(y_te, pred_stkg)\n",
203 | "print(class_report)"
204 | ]
205 | },
206 | {
207 | "cell_type": "markdown",
208 | "metadata": {},
209 | "source": [
210 | "# Combined code"
211 | ]
212 | },
213 | {
214 | "cell_type": "code",
215 | "execution_count": 11,
216 | "metadata": {},
217 | "outputs": [
218 | {
219 | "name": "stdout",
220 | "output_type": "stream",
221 | "text": [
222 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n",
223 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n",
224 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 1 0 1\n",
225 | " 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n",
226 | "0.965034965034965\n",
227 | "[[50 3]\n",
228 | " [ 2 88]]\n",
229 | " precision recall f1-score support\n",
230 | "\n",
231 | " 0 0.96 0.94 0.95 53\n",
232 | " 1 0.97 0.98 0.97 90\n",
233 | "\n",
234 | " accuracy 0.97 143\n",
235 | " macro avg 0.96 0.96 0.96 143\n",
236 | "weighted avg 0.96 0.97 0.96 143\n",
237 | "\n"
238 | ]
239 | }
240 | ],
241 | "source": [
242 | "from sklearn import datasets\n",
243 | "from sklearn.model_selection import train_test_split\n",
244 | "from sklearn.preprocessing import StandardScaler\n",
245 | "\n",
246 | "from sklearn import svm\n",
247 | "from sklearn.naive_bayes import GaussianNB\n",
248 | "from sklearn.linear_model import LogisticRegression\n",
249 | "from sklearn.ensemble import StackingClassifier\n",
250 | "\n",
251 | "from sklearn.metrics import accuracy_score\n",
252 | "from sklearn.metrics import confusion_matrix\n",
253 | "from sklearn.metrics import classification_report\n",
254 | "\n",
255 | "\n",
256 | "# Load the data\n",
257 | "raw_breast_cancer = datasets.load_breast_cancer()\n",
258 | "\n",
259 | "# Set the feature and target data\n",
260 | "X = raw_breast_cancer.data\n",
261 | "y = raw_breast_cancer.target\n",
262 | "\n",
263 | "# Split into training and test data\n",
264 | "X_tn, X_te, y_tn, y_te = train_test_split(X, y, random_state=0)\n",
265 | "\n",
266 | "# Standardize the data\n",
267 | "std_scale = StandardScaler()\n",
268 | "std_scale.fit(X_tn)\n",
269 | "X_tn_std = std_scale.transform(X_tn)\n",
270 | "X_te_std = std_scale.transform(X_te)\n",
271 | "\n",
272 | "# Train the stacking model\n",
273 | "clf1 = svm.SVC(kernel='linear', random_state=1) \n",
274 | "clf2 = GaussianNB()\n",
275 | "\n",
276 | "clf_stkg = StackingClassifier(\n",
277 | " estimators=[\n",
278 | " ('svm', clf1), \n",
279 | " ('gnb', clf2)\n",
280 | " ],\n",
281 | " final_estimator=LogisticRegression())\n",
282 | "clf_stkg.fit(X_tn_std, y_tn)\n",
283 | "\n",
284 | "# Predict\n",
285 | "pred_stkg = clf_stkg.predict(X_te_std)\n",
286 | "print(pred_stkg)\n",
287 | "\n",
288 | "# Accuracy\n",
289 | "accuracy = accuracy_score(y_te, pred_stkg)\n",
290 | "print(accuracy)\n",
291 | "\n",
292 | "# Check the confusion matrix\n",
293 | "conf_matrix = confusion_matrix(y_te, pred_stkg)\n",
294 | "print(conf_matrix)\n",
295 | "\n",
296 | "# Check the classification report\n",
297 | "class_report = classification_report(y_te, pred_stkg)\n",
298 | "print(class_report)"
299 | ]
300 | },
301 | {
302 | "cell_type": "code",
303 | "execution_count": null,
304 | "metadata": {},
305 | "outputs": [],
306 | "source": []
307 | }
308 | ],
309 | "metadata": {
310 | "kernelspec": {
311 | "display_name": "Python 3",
312 | "language": "python",
313 | "name": "python3"
314 | },
315 | "language_info": {
316 | "codemirror_mode": {
317 | "name": "ipython",
318 | "version": 3
319 | },
320 | "file_extension": ".py",
321 | "mimetype": "text/x-python",
322 | "name": "python",
323 | "nbconvert_exporter": "python",
324 | "pygments_lexer": "ipython3",
325 | "version": "3.7.6"
326 | }
327 | },
328 | "nbformat": 4,
329 | "nbformat_minor": 4
330 | }
331 |
--------------------------------------------------------------------------------
/12장_딥러닝_2절_1_퍼셉트론.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Step-by-step code"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [
15 | {
16 | "name": "stdout",
17 | "output_type": "stream",
18 | "text": [
19 | "[[2 3]\n",
20 | " [5 1]]\n",
21 | "[2 3 5 1]\n"
22 | ]
23 | }
24 | ],
25 | "source": [
26 | "import numpy as np\n",
27 | "\n",
28 | "# Input layer\n",
29 | "input_data = np.array([[2,3], [5,1]])\n",
30 | "print(input_data)\n",
31 | "x = input_data.reshape(-1)\n",
32 | "print(x)"
33 | ]
34 | },
35 | {
36 | "cell_type": "code",
37 | "execution_count": 2,
38 | "metadata": {},
39 | "outputs": [],
40 | "source": [
41 | "# Weights and biases\n",
42 | "w1 = np.array([2,1,-3,3])\n",
43 | "w2 = np.array([1,-3,1,3])\n",
44 | "b1 = 3\n",
45 | "b2 = 3"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "metadata": {},
52 | "outputs": [
53 | {
54 | "name": "stdout",
55 | "output_type": "stream",
56 | "text": [
57 | "[[ 2 1 -3 3]\n",
58 | " [ 1 -3 1 3]]\n",
59 | "[3 3]\n",
60 | "[-2 4]\n"
61 | ]
62 | }
63 | ],
64 | "source": [
65 | "# Weighted sum\n",
66 | "W = np.array([w1, w2])\n",
67 | "print(W)\n",
68 | "b = np.array([b1, b2])\n",
69 | "print(b)\n",
70 | "weight_sum = np.dot(W, x) + b\n",
71 | "print(weight_sum)"
72 | ]
73 | },
74 | {
75 | "cell_type": "code",
76 | "execution_count": 5,
77 | "metadata": {},
78 | "outputs": [
79 | {
80 | "name": "stdout",
81 | "output_type": "stream",
82 | "text": [
83 | "[0.11920292 0.98201379]\n"
84 | ]
85 | }
86 | ],
87 | "source": [
88 | "# Output layer (sigmoid activation)\n",
89 | "res = 1/(1+np.exp(-weight_sum))\n",
90 | "print(res)"
91 | ]
92 | },
93 | {
94 | "cell_type": "markdown",
95 | "metadata": {},
96 | "source": [
97 | "# Combined code"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": 6,
103 | "metadata": {},
104 | "outputs": [],
105 | "source": [
106 | "import numpy as np\n",
107 | "\n",
108 | "# Input layer\n",
109 | "input_data = np.array([[2,3], [5,1]])\n",
110 | "x = input_data.reshape(-1)\n",
111 | "\n",
112 | "# Weights and biases\n",
113 | "w1 = np.array([2,1,-3,3])\n",
114 | "w2 = np.array([1,-3,1,3])\n",
115 | "b1 = 3\n",
116 | "b2 = 3\n",
117 | "\n",
118 | "# Weighted sum\n",
119 | "W = np.array([w1, w2])\n",
120 | "b = np.array([b1, b2])\n",
121 | "weight_sum = np.dot(W, x) + b\n",
122 | "\n",
123 | "# Output layer (sigmoid activation)\n",
124 | "res = 1/(1+np.exp(-weight_sum))"
125 | ]
126 | },
127 | {
128 | "cell_type": "code",
129 | "execution_count": null,
130 | "metadata": {},
131 | "outputs": [],
132 | "source": []
133 | }
134 | ],
135 | "metadata": {
136 | "kernelspec": {
137 | "display_name": "Python 3",
138 | "language": "python",
139 | "name": "python3"
140 | },
141 | "language_info": {
142 | "codemirror_mode": {
143 | "name": "ipython",
144 | "version": 3
145 | },
146 | "file_extension": ".py",
147 | "mimetype": "text/x-python",
148 | "name": "python",
149 | "nbconvert_exporter": "python",
150 | "pygments_lexer": "ipython3",
151 | "version": "3.7.6"
152 | }
153 | },
154 | "nbformat": 4,
155 | "nbformat_minor": 4
156 | }
157 |
--------------------------------------------------------------------------------
/12장_딥러닝_3절_7_텐서플로_소개.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Building a deep learning model with the Sequential API"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "from tensorflow.keras.models import Sequential\n",
17 | "from tensorflow.keras.layers import Dense\n",
18 | "\n",
19 | "model = Sequential()\n",
20 | "model.add(Dense(100, activation='relu', \n",
21 | " input_shape=(32,32,1)))\n",
22 | "model.add(Dense(50, activation='relu'))\n",
23 | "model.add(Dense(5, activation='softmax'))"
24 | ]
25 | },
26 | {
27 | "cell_type": "code",
28 | "execution_count": 2,
29 | "metadata": {},
30 | "outputs": [
31 | {
32 | "name": "stdout",
33 | "output_type": "stream",
34 | "text": [
35 | "Model: \"sequential\"\n",
36 | "_________________________________________________________________\n",
37 | "Layer (type) Output Shape Param # \n",
38 | "=================================================================\n",
39 | "dense (Dense) (None, 32, 32, 100) 200 \n",
40 | "_________________________________________________________________\n",
41 | "dense_1 (Dense) (None, 32, 32, 50) 5050 \n",
42 | "_________________________________________________________________\n",
43 | "dense_2 (Dense) (None, 32, 32, 5) 255 \n",
44 | "=================================================================\n",
45 | "Total params: 5,505\n",
46 | "Trainable params: 5,505\n",
47 | "Non-trainable params: 0\n",
48 | "_________________________________________________________________\n"
49 | ]
50 | }
51 | ],
52 | "source": [
53 | "model.summary()"
54 | ]
55 | },
56 | {
57 | "cell_type": "markdown",
58 | "metadata": {},
59 | "source": [
60 | "# Building a deep learning model with the functional API"
61 | ]
62 | },
63 | {
64 | "cell_type": "code",
65 | "execution_count": 3,
66 | "metadata": {},
67 | "outputs": [],
68 | "source": [
69 | "from tensorflow.keras.layers import Input, Dense, Activation\n",
70 | "from tensorflow.keras.models import Model\n",
71 | "\n",
72 | "input_layer = Input(shape=(32,32,1))\n",
73 | "\n",
74 | "x = Dense(units=100, activation = 'relu')(input_layer)\n",
75 | "x = Dense(units=50, activation = 'relu')(x)\n",
76 | "\n",
77 | "output_layer = Dense(units=5, activation='softmax')(x)\n",
78 | "\n",
79 | "model2 = Model(input_layer, output_layer)\n"
80 | ]
81 | },
82 | {
83 | "cell_type": "code",
84 | "execution_count": 4,
85 | "metadata": {},
86 | "outputs": [
87 | {
88 | "name": "stdout",
89 | "output_type": "stream",
90 | "text": [
91 | "Model: \"model\"\n",
92 | "_________________________________________________________________\n",
93 | "Layer (type) Output Shape Param # \n",
94 | "=================================================================\n",
95 | "input_1 (InputLayer) [(None, 32, 32, 1)] 0 \n",
96 | "_________________________________________________________________\n",
97 | "dense_3 (Dense) (None, 32, 32, 100) 200 \n",
98 | "_________________________________________________________________\n",
99 | "dense_4 (Dense) (None, 32, 32, 50) 5050 \n",
100 | "_________________________________________________________________\n",
101 | "dense_5 (Dense) (None, 32, 32, 5) 255 \n",
102 | "=================================================================\n",
103 | "Total params: 5,505\n",
104 | "Trainable params: 5,505\n",
105 | "Non-trainable params: 0\n",
106 | "_________________________________________________________________\n"
107 | ]
108 | }
109 | ],
110 | "source": [
111 | "model2.summary()"
112 | ]
113 | },
114 | {
115 | "cell_type": "markdown",
116 | "metadata": {},
117 | "source": [
118 | "# Using activation functions"
119 | ]
120 | },
121 | {
122 | "cell_type": "code",
123 | "execution_count": null,
124 | "metadata": {},
125 | "outputs": [],
126 | "source": [
127 | "from tensorflow.keras.layers import Activation, Dense\n\nx = Dense(units=100)(x)\n",
128 | "x = Activation('relu')(x)"
129 | ]
130 | },
131 | {
132 | "cell_type": "code",
133 | "execution_count": null,
134 | "metadata": {},
135 | "outputs": [],
136 | "source": [
137 | "x = Dense(units=100, activation='relu')(x)"
138 | ]
139 | }
140 | ],
141 | "metadata": {
142 | "kernelspec": {
143 | "display_name": "Python 3",
144 | "language": "python",
145 | "name": "python3"
146 | },
147 | "language_info": {
148 | "codemirror_mode": {
149 | "name": "ipython",
150 | "version": 3
151 | },
152 | "file_extension": ".py",
153 | "mimetype": "text/x-python",
154 | "name": "python",
155 | "nbconvert_exporter": "python",
156 | "pygments_lexer": "ipython3",
157 | "version": "3.7.6"
158 | }
159 | },
160 | "nbformat": 4,
161 | "nbformat_minor": 4
162 | }
163 |
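The parameter counts in the `model.summary()` outputs of the notebook above (200, 5,050, 255, total 5,505) can be checked by hand. A `Dense` layer acts only on the last axis of its input, so on a `(None, 32, 32, C)` tensor it has `C * units` kernel weights plus `units` biases. The following is a minimal sketch in plain Python under that assumption; `dense_param_count` is a hypothetical helper written for illustration, not a Keras function.

```python
def dense_param_count(input_features, units, use_bias=True):
    """Trainable parameters of one Dense layer: kernel weights plus optional biases."""
    return input_features * units + (units if use_bias else 0)


# Layer widths taken from the notebook: Dense(100) -> Dense(50) -> Dense(5)
layer_units = [100, 50, 5]

# Last axis of Input(shape=(32, 32, 1)); the 32 x 32 axes do not add parameters
in_features = 1

counts = []
for units in layer_units:
    counts.append(dense_param_count(in_features, units))
    in_features = units  # the next layer sees this layer's output width

total = sum(counts)
print(counts, total)
```

The spatial dimensions never enter the count, which is why the sequential and functional versions of the model report identical totals.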
--------------------------------------------------------------------------------
/12장_딥러닝_7절_1_자연어처리.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Word tokenization"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 55,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "from tensorflow.keras.preprocessing.text import Tokenizer\n",
17 | "\n",
18 | "paper = ['많은 것을 바꾸고 싶다면 많은 것을 받아들여라']\n",
19 | "\n",
20 | "tknz = Tokenizer()\n",
21 | "tknz.fit_on_texts(paper)"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 57,
27 | "metadata": {},
28 | "outputs": [
29 | {
30 | "name": "stdout",
31 | "output_type": "stream",
32 | "text": [
33 | "{'많은': 1, '것을': 2, '바꾸고': 3, '싶다면': 4, '받아들여라': 5}\n",
34 | "OrderedDict([('많은', 2), ('것을', 2), ('바꾸고', 1), ('싶다면', 1), ('받아들여라', 1)])\n"
35 | ]
36 | }
37 | ],
38 | "source": [
39 | "print(tknz.word_index)\n",
40 | "print(tknz.word_counts)"
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
47 | "# One-hot encoding"
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": 70,
53 | "metadata": {},
54 | "outputs": [],
55 | "source": [
56 | "from tensorflow.keras.utils import to_categorical\n",
57 | "from tensorflow.keras.preprocessing.text import Tokenizer\n",
58 | "\n",
59 | "\n",
60 | "paper = ['많은 것을 바꾸고 싶다면 많은 것을 받아들여라']\n",
61 | "tknz = Tokenizer()\n",
62 | "tknz.fit_on_texts(paper)\n",
63 | "\n",
64 | "idx_paper = tknz.texts_to_sequences(paper)\n",
65 | "n = len(tknz.word_index)+1\n",
66 | "idx_onehot = to_categorical(idx_paper, num_classes=n)"
67 | ]
68 | },
69 | {
70 | "cell_type": "code",
71 | "execution_count": 71,
72 | "metadata": {},
73 | "outputs": [
74 | {
75 | "name": "stdout",
76 | "output_type": "stream",
77 | "text": [
78 | "[[1, 2, 3, 4, 1, 2, 5]]\n",
79 | "6\n",
80 | "[[[0. 1. 0. 0. 0. 0.]\n",
81 | " [0. 0. 1. 0. 0. 0.]\n",
82 | " [0. 0. 0. 1. 0. 0.]\n",
83 | " [0. 0. 0. 0. 1. 0.]\n",
84 | " [0. 1. 0. 0. 0. 0.]\n",
85 | " [0. 0. 1. 0. 0. 0.]\n",
86 | " [0. 0. 0. 0. 0. 1.]]]\n"
87 | ]
88 | }
89 | ],
90 | "source": [
91 | "print(idx_paper)\n",
92 | "print(n)\n",
93 | "print(idx_onehot)"
94 | ]
95 | },
96 | {
97 | "cell_type": "markdown",
98 | "metadata": {},
99 | "source": [
100 | "# Word embedding"
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": 76,
106 | "metadata": {},
107 | "outputs": [],
108 | "source": [
109 | "from tensorflow.keras.models import Sequential\n",
110 | "from tensorflow.keras.layers import Embedding\n",
111 | "\n",
112 | "model = Sequential()\n",
113 | "model.add(Embedding(input_dim=n, output_dim=3))\n",
114 | "model.compile(optimizer='rmsprop', loss='mse')\n",
115 | "embedding = model.predict(idx_paper)"
116 | ]
117 | },
118 | {
119 | "cell_type": "code",
120 | "execution_count": 77,
121 | "metadata": {},
122 | "outputs": [
123 | {
124 | "name": "stdout",
125 | "output_type": "stream",
126 | "text": [
127 | "[[[-0.02796837 -0.03958071 -0.03936887]\n",
128 | " [-0.02087821 -0.02005102 0.0131931 ]\n",
129 | " [-0.00142742 -0.03759698 0.02437944]\n",
130 | " [ 0.01546348 -0.00769221 -0.01694027]\n",
131 | " [-0.02796837 -0.03958071 -0.03936887]\n",
132 | " [-0.02087821 -0.02005102 0.0131931 ]\n",
133 | " [ 0.024049 -0.03488786 0.02603838]]]\n"
134 | ]
135 | }
136 | ],
137 | "source": [
138 | "print(embedding)"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": null,
144 | "metadata": {},
145 | "outputs": [],
146 | "source": []
147 | }
148 | ],
149 | "metadata": {
150 | "kernelspec": {
151 | "display_name": "Python 3",
152 | "language": "python",
153 | "name": "python3"
154 | },
155 | "language_info": {
156 | "codemirror_mode": {
157 | "name": "ipython",
158 | "version": 3
159 | },
160 | "file_extension": ".py",
161 | "mimetype": "text/x-python",
162 | "name": "python",
163 | "nbconvert_exporter": "python",
164 | "pygments_lexer": "ipython3",
165 | "version": "3.7.6"
166 | }
167 | },
168 | "nbformat": 4,
169 | "nbformat_minor": 4
170 | }
171 |
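The tokenization and one-hot steps in the notebook above can be reproduced without TensorFlow to see what they compute. The sketch below is a simplified stand-in, not the Keras `Tokenizer` itself: it assumes whitespace splitting only (no lowercasing or punctuation filtering), ranks words by descending frequency with ties kept in first-seen order, and reserves index 0, which is why the number of classes is `len(word_index) + 1`. The helper names `build_word_index`, `texts_to_sequences`, and `one_hot` are hypothetical.

```python
from collections import Counter


def build_word_index(texts):
    """Map each word to a 1-based rank by descending frequency (ties keep first-seen order)."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    # sorted() is stable, so equally frequent words stay in first-seen order
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}


def texts_to_sequences(texts, word_index):
    """Replace every known word with its integer index."""
    return [[word_index[w] for w in text.split() if w in word_index] for text in texts]


def one_hot(sequence, num_classes):
    """Turn a list of indices into a list of one-hot rows of length num_classes."""
    return [[1.0 if col == idx else 0.0 for col in range(num_classes)] for idx in sequence]


paper = ['많은 것을 바꾸고 싶다면 많은 것을 받아들여라']

word_index = build_word_index(paper)
sequences = texts_to_sequences(paper, word_index)
n = len(word_index) + 1  # index 0 is reserved, so one extra class
onehot = [one_hot(seq, n) for seq in sequences]

print(word_index)
print(sequences)
for row in onehot[0]:
    print(row)
```

Repeated words share one index, so they also share one row of any embedding table looked up by that index. This is consistent with the notebook's embedding output, where the first and fifth rows are identical.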
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # 선형대수와 통계학으로 배우는 머신러닝 with 파이썬 (Machine Learning with Python, Learned through Linear Algebra and Statistics)
2 | 
3 |
4 | - Subtitle: From optimization concepts to deep learning with TensorFlow
5 | - Author: 장철원
6 | - Publication date: January 26, 2021
7 | - Pages: 624
8 |
9 |
10 | ## Errata
11 | https://cafe.naver.com/aifromstat/28
12 |
13 |
14 | ## Online bookstore purchase links
15 | - [yes24](http://www.yes24.com/Product/Goods/97032765)
16 | - [교보문고](http://www.kyobobook.co.kr/product/detailViewKor.laf?ejkGb=KOR&mallGb=KOR&barcode=9791165920395&orderClick=LAG&Kc=)
17 | - [알라딘](https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=262038358)
18 |
19 |
20 | ## About the book
21 |