├── .DS_Store ├── .ipynb_checkpoints ├── 06장_1절_데이터셋_설명-checkpoint.ipynb ├── 06장_2절_결측치처리_클래스레이블_원핫-checkpoint.ipynb ├── 07장_3절_파이프라인-checkpoint.ipynb ├── 07장_4절_그리드서치-checkpoint.ipynb ├── 07장_모형평가_6절_분류_회귀_군집-checkpoint.ipynb ├── 08장_지도학습_3절_k최근접이웃-checkpoint.ipynb ├── 08장_지도학습_4절_선형회귀분석-checkpoint.ipynb ├── 08장_지도학습_5절_로지스틱회귀분석-checkpoint.ipynb ├── 08장_지도학습_6절_나이브베이즈-checkpoint.ipynb ├── 08장_지도학습_7절_의사결정나무-checkpoint.ipynb ├── 08장_지도학습_8절_서포트벡터머신-checkpoint.ipynb ├── 08장_지도학습_9절_크로스밸리데이션-checkpoint.ipynb ├── 09장_앙상블_2절_보팅-checkpoint.ipynb ├── 09장_앙상블_3절_1_랜덤포레스트-checkpoint.ipynb ├── 09장_앙상블_3절_2_배깅-checkpoint.ipynb ├── 09장_앙상블_4절_1_adaboost-checkpoint.ipynb ├── 09장_앙상블_4절_2_gradient_boost-checkpoint.ipynb ├── 09장_앙상블_5절_스태킹-checkpoint.ipynb ├── 10장_차원축소_2절_주성분분석(PCA)-checkpoint.ipynb ├── 10장_차원축소_3절_커널_PCA-checkpoint.ipynb ├── 10장_차원축소_4절_LDA-checkpoint.ipynb ├── 10장_차원축소_5절_LLE(locally_linear_embedding)-checkpoint.ipynb ├── 10장_차원축소_6절_비음수행렬분해(NMF)-checkpoint.ipynb ├── 11장_비지도학습_2절_k 평균 클러스터링 -checkpoint.ipynb ├── 11장_비지도학습_3절_계층군집-checkpoint.ipynb ├── 11장_비지도학습_4절_DBSCAN-checkpoint.ipynb ├── 11장_비지도학습_5절_가우시안_혼합_모델-checkpoint.ipynb ├── 12장_딥러닝_2절_1_퍼셉트론-checkpoint.ipynb ├── 12장_딥러닝_3절_7_텐서플로_소개-checkpoint.ipynb ├── 12장_딥러닝_3절_8_신경망_(1)분류문제-checkpoint.ipynb ├── 12장_딥러닝_3절_9_신경망_(2)회귀문제-checkpoint.ipynb ├── 12장_딥러닝_4절_CNN-checkpoint.ipynb ├── 12장_딥러닝_5절_RNN-checkpoint.ipynb ├── 12장_딥러닝_6절_오토인코더1(시퀀스형)-checkpoint.ipynb ├── 12장_딥러닝_6절_오토인코더2(함수형)-checkpoint.ipynb ├── 12장_딥러닝_7절_1_자연어처리-checkpoint.ipynb ├── 12장_딥러닝_7절_2_seq2seq-checkpoint.ipynb └── 12장_딥러닝_8절_GAN-checkpoint.ipynb ├── 06장_1절_데이터셋_설명.ipynb ├── 06장_2절_결측치처리_클래스레이블_원핫.ipynb ├── 07장_3절_파이프라인.ipynb ├── 07장_4절_그리드서치.ipynb ├── 07장_모형평가_6절_분류_회귀_군집.ipynb ├── 08장_지도학습_3절_k최근접이웃.ipynb ├── 08장_지도학습_4절_선형회귀분석.ipynb ├── 08장_지도학습_5절_로지스틱회귀분석.ipynb ├── 08장_지도학습_6절_나이브베이즈.ipynb ├── 08장_지도학습_7절_의사결정나무.ipynb ├── 08장_지도학습_8절_서포트벡터머신.ipynb ├── 08장_지도학습_9절_크로스밸리데이션.ipynb ├── 09장_앙상블_2절_보팅.ipynb ├── 
09장_앙상블_3절_1_랜덤포레스트.ipynb ├── 09장_앙상블_3절_2_배깅.ipynb ├── 09장_앙상블_4절_1_adaboost.ipynb ├── 09장_앙상블_4절_2_gradient_boost.ipynb ├── 09장_앙상블_5절_스태킹.ipynb ├── 10장_차원축소_2절_주성분분석(PCA).ipynb ├── 10장_차원축소_3절_커널_PCA.ipynb ├── 10장_차원축소_4절_LDA.ipynb ├── 10장_차원축소_5절_LLE(locally_linear_embedding).ipynb ├── 10장_차원축소_6절_비음수행렬분해(NMF).ipynb ├── 11장_비지도학습_2절_k 평균 클러스터링 .ipynb ├── 11장_비지도학습_3절_계층군집.ipynb ├── 11장_비지도학습_4절_DBSCAN.ipynb ├── 11장_비지도학습_5절_가우시안_혼합_모델.ipynb ├── 12장_딥러닝_2절_1_퍼셉트론.ipynb ├── 12장_딥러닝_3절_7_텐서플로_소개.ipynb ├── 12장_딥러닝_3절_8_신경망_(1)분류문제.ipynb ├── 12장_딥러닝_3절_9_신경망_(2)회귀문제.ipynb ├── 12장_딥러닝_4절_CNN.ipynb ├── 12장_딥러닝_5절_RNN.ipynb ├── 12장_딥러닝_6절_오토인코더1(시퀀스형).ipynb ├── 12장_딥러닝_6절_오토인코더2(함수형).ipynb ├── 12장_딥러닝_7절_1_자연어처리.ipynb ├── 12장_딥러닝_7절_2_seq2seq.ipynb ├── 12장_딥러닝_8절_GAN.ipynb ├── README.md ├── data ├── .DS_Store └── eng-fra │ ├── _about.txt │ └── fra.txt └── 소스코드전체.zip /.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bjpublic/MachineLearning/f6e04995802b49c5841a917a178dbf43f58a9f44/.DS_Store -------------------------------------------------------------------------------- /.ipynb_checkpoints/07장_3절_파이프라인-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 파이프 라인" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 8, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from sklearn import datasets\n", 17 | "from sklearn.pipeline import Pipeline\n", 18 | "from sklearn.preprocessing import StandardScaler\n", 19 | "from sklearn.linear_model import LinearRegression\n", 20 | "from sklearn.model_selection import train_test_split\n", 21 | "from sklearn.metrics import mean_squared_error" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 9, 27 | "metadata": {}, 28 | "outputs": [ 29 | { 30 | "data": { 31 | 
"text/plain": [ 32 | "29.515137790197567" 33 | ] 34 | }, 35 | "execution_count": 9, 36 | "metadata": {}, 37 | "output_type": "execute_result" 38 | } 39 | ], 40 | "source": [ 41 | "raw_boston = datasets.load_boston()\n", 42 | "\n", 43 | "X = raw_boston.data\n", 44 | "y = raw_boston.target\n", 45 | "\n", 46 | "# 트레이닝 / 테스트 데이터 분할\n", 47 | "X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=7)\n", 48 | "\n", 49 | "# 표준화 스케일링\n", 50 | "std_scale = StandardScaler()\n", 51 | "X_tn_std = std_scale.fit_transform(X_tn)\n", 52 | "X_te_std = std_scale.transform(X_te)\n", 53 | "\n", 54 | "# 학습\n", 55 | "clf_linear = LinearRegression()\n", 56 | "clf_linear.fit(X_tn_std, y_tn)\n", 57 | "\n", 58 | "# 예측\n", 59 | "pred_linear = clf_linear.predict(X_te_std)\n", 60 | "\n", 61 | "# 평가\n", 62 | "mean_squared_error(y_te, pred_linear)" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 10, 68 | "metadata": {}, 69 | "outputs": [ 70 | { 71 | "data": { 72 | "text/plain": [ 73 | "29.515137790197567" 74 | ] 75 | }, 76 | "execution_count": 10, 77 | "metadata": {}, 78 | "output_type": "execute_result" 79 | } 80 | ], 81 | "source": [ 82 | "# 트레이닝 / 테스트 데이터 분할\n", 83 | "X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=7)\n", 84 | "\n", 85 | "# 파이프라인\n", 86 | "linear_pipline = Pipeline([\n", 87 | " ('scaler',StandardScaler()), \n", 88 | " ('linear_regression', LinearRegression()) \n", 89 | "])\n", 90 | "\n", 91 | "# 학습\n", 92 | "linear_pipline.fit(X_tn, y_tn)\n", 93 | "\n", 94 | "# 예측\n", 95 | "pred_linear = linear_pipline.predict(X_te)\n", 96 | "\n", 97 | "# 평가\n", 98 | "mean_squared_error(y_te, pred_linear)" 99 | ] 100 | } 101 | ], 102 | "metadata": { 103 | "kernelspec": { 104 | "display_name": "Python 3", 105 | "language": "python", 106 | "name": "python3" 107 | }, 108 | "language_info": { 109 | "codemirror_mode": { 110 | "name": "ipython", 111 | "version": 3 112 | }, 113 | "file_extension": ".py", 114 | "mimetype": "text/x-python", 115 | "name": 
"python", 116 | "nbconvert_exporter": "python", 117 | "pygments_lexer": "ipython3", 118 | "version": "3.7.6" 119 | } 120 | }, 121 | "nbformat": 4, 122 | "nbformat_minor": 4 123 | } 124 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/07장_4절_그리드서치-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 그리드 서치" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 7, 13 | "metadata": { 14 | "scrolled": true 15 | }, 16 | "outputs": [ 17 | { 18 | "name": "stdout", 19 | "output_type": "stream", 20 | "text": [ 21 | "{'k': 3}\n", 22 | "0.9736842105263158\n" 23 | ] 24 | } 25 | ], 26 | "source": [ 27 | "from sklearn import datasets\n", 28 | "from sklearn.preprocessing import StandardScaler\n", 29 | "from sklearn.neighbors import KNeighborsClassifier\n", 30 | "from sklearn.model_selection import train_test_split\n", 31 | "\n", 32 | "from sklearn.metrics import accuracy_score\n", 33 | "from sklearn.metrics import confusion_matrix\n", 34 | "from sklearn.metrics import classification_report\n", 35 | "\n", 36 | "# 꽃 데이터 불러오기\n", 37 | "raw_iris = datasets.load_iris()\n", 38 | "\n", 39 | "# 피쳐 / 타겟\n", 40 | "X = raw_iris.data\n", 41 | "y = raw_iris.target\n", 42 | "\n", 43 | "# 트레이닝 / 테스트 데이터 분할\n", 44 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 45 | "\n", 46 | "# 표준화 스케일\n", 47 | "std_scale = StandardScaler()\n", 48 | "std_scale.fit(X_tn)\n", 49 | "X_tn_std = std_scale.transform(X_tn)\n", 50 | "X_te_std = std_scale.transform(X_te)\n", 51 | "\n", 52 | "best_accuracy = 0\n", 53 | "\n", 54 | "for k in [1,2,3,4,5,6,7,8,9,10]:\n", 55 | " clf_knn = KNeighborsClassifier(n_neighbors=k)\n", 56 | " clf_knn.fit(X_tn_std, y_tn)\n", 57 | " knn_pred = clf_knn.predict(X_te_std)\n", 58 | " accuracy = accuracy_score(y_te, knn_pred)\n", 59 | " if accuracy > 
best_accuracy:\n", 60 | "        best_accuracy = accuracy\n", 61 | "        final_k = {'k': k}\n", 62 | " \n", 63 | "print(final_k)\n", 64 | "print(best_accuracy)" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [] 73 | } 74 | ], 75 | "metadata": { 76 | "kernelspec": { 77 | "display_name": "Python 3", 78 | "language": "python", 79 | "name": "python3" 80 | }, 81 | "language_info": { 82 | "codemirror_mode": { 83 | "name": "ipython", 84 | "version": 3 85 | }, 86 | "file_extension": ".py", 87 | "mimetype": "text/x-python", 88 | "name": "python", 89 | "nbconvert_exporter": "python", 90 | "pygments_lexer": "ipython3", 91 | "version": "3.7.6" 92 | } 93 | }, 94 | "nbformat": 4, 95 | "nbformat_minor": 4 96 | } 97 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/07장_모형평가_6절_분류_회귀_군집-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Classification" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "## Accuracy" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 3, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | "name": "stdout", 26 | "output_type": "stream", 27 | "text": [ 28 | "0.5\n", 29 | "2\n" 30 | ] 31 | } 32 | ], 33 | "source": [ 34 | "#import numpy as np\n", 35 | "from sklearn.metrics import accuracy_score\n", 36 | "y_pred = [0, 2, 1, 3]\n", 37 | "y_true = [0, 1, 2, 3]\n", 38 | "print(accuracy_score(y_true, y_pred))\n", 39 | "print(accuracy_score(y_true, y_pred, normalize=False))" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 3, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "## confusion matrix" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 4, 
54 | "metadata": {}, 55 | "outputs": [ 56 | { 57 | "data": { 58 | "text/plain": [ 59 | "array([[2, 0, 0],\n", 60 | "       [0, 0, 1],\n", 61 | "       [1, 0, 2]])" 62 | ] 63 | }, 64 | "execution_count": 4, 65 | "metadata": {}, 66 | "output_type": "execute_result" 67 | } 68 | ], 69 | "source": [ 70 | "from sklearn.metrics import confusion_matrix\n", 71 | "y_true = [2, 0, 2, 2, 0, 1]\n", 72 | "y_pred = [0, 0, 2, 2, 0, 2]\n", 73 | "confusion_matrix(y_true, y_pred)" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 5, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "## classification report " 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 6, 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "name": "stdout", 92 | "output_type": "stream", 93 | "text": [ 94 | " precision recall f1-score support\n", 95 | "\n", 96 | " class 0 0.67 1.00 0.80 2\n", 97 | " class 1 0.00 0.00 0.00 1\n", 98 | " class 2 1.00 0.50 0.67 2\n", 99 | "\n", 100 | " accuracy 0.60 5\n", 101 | " macro avg 0.56 0.50 0.49 5\n", 102 | "weighted avg 0.67 0.60 0.59 5\n", 103 | "\n" 104 | ] 105 | } 106 | ], 107 | "source": [ 108 | "from sklearn.metrics import classification_report\n", 109 | "y_true = [0, 1, 2, 2, 0]\n", 110 | "y_pred = [0, 0, 2, 1, 0]\n", 111 | "target_names = ['class 0', 'class 1', 'class 2']\n", 112 | "print(classification_report(y_true, y_pred, target_names=target_names))" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "# Regression" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [ 128 | "# mean absolute error" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 5, 134 | "metadata": {}, 135 | "outputs": [ 136 | { 137 | "name": "stdout", 138 | "output_type": "stream", 139 | "text": [ 140 | "0.5\n" 141 | ] 142 | } 143 | ], 144 | "source": [ 145 | "from sklearn.metrics import 
mean_absolute_error\n", 146 | "y_true = [3, -0.5, 2, 7]\n", 147 | "y_pred = [2.5, 0.0, 2, 8]\n", 148 | "\n", 149 | "print(mean_absolute_error(y_true, y_pred))" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": 6, 155 | "metadata": {}, 156 | "outputs": [], 157 | "source": [ 158 | "# mean squared error" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 7, 164 | "metadata": {}, 165 | "outputs": [ 166 | { 167 | "data": { 168 | "text/plain": [ 169 | "0.375" 170 | ] 171 | }, 172 | "execution_count": 7, 173 | "metadata": {}, 174 | "output_type": "execute_result" 175 | } 176 | ], 177 | "source": [ 178 | "from sklearn.metrics import mean_squared_error\n", 179 | "y_true = [3, -0.5, 2, 7]\n", 180 | "y_pred = [2.5, 0.0, 2, 8]\n", 181 | "print(mean_squared_error(y_true, y_pred))" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 8, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "# R2" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": 10, 196 | "metadata": {}, 197 | "outputs": [ 198 | { 199 | "name": "stdout", 200 | "output_type": "stream", 201 | "text": [ 202 | "0.9486081370449679\n" 203 | ] 204 | } 205 | ], 206 | "source": [ 207 | "from sklearn.metrics import r2_score\n", 208 | "y_true = [3, -0.5, 2, 7]\n", 209 | "y_pred = [2.5, 0.0, 2, 8]\n", 210 | "print(r2_score(y_true, y_pred))" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "# Clustering" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": null, 223 | "metadata": {}, 224 | "outputs": [], 225 | "source": [ 226 | "# adjusted rand index" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": 2, 232 | "metadata": {}, 233 | "outputs": [ 234 | { 235 | "name": "stdout", 236 | "output_type": "stream", 237 | "text": [ 238 | "0.24242424242424246\n" 239 | ] 240 | } 241 | ], 242 | "source": [ 243 | "from 
sklearn.metrics import adjusted_rand_score\n", 244 | "labels_true = [0, 0, 0, 1, 1, 1]\n", 245 | "labels_pred = [0, 0, 1, 1, 2, 2]\n", 246 | "\n", 247 | "print(adjusted_rand_score(labels_true, labels_pred))" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 3, 253 | "metadata": {}, 254 | "outputs": [], 255 | "source": [ 256 | "# silhouette score" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": 2, 262 | "metadata": {}, 263 | "outputs": [ 264 | { 265 | "name": "stdout", 266 | "output_type": "stream", 267 | "text": [ 268 | "0.5789497702625118\n" 269 | ] 270 | } 271 | ], 272 | "source": [ 273 | "from sklearn.metrics import silhouette_score\n", 274 | "X = [[1, 2], [4, 5], [2, 1], [6, 7], [2, 3]]\n", 275 | "labels = [0, 1, 0, 1, 0] \n", 276 | "sil_score = silhouette_score(X, labels)\n", 277 | "print(sil_score)" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": null, 283 | "metadata": {}, 284 | "outputs": [], 285 | "source": [] 286 | } 287 | ], 288 | "metadata": { 289 | "kernelspec": { 290 | "display_name": "Python 3", 291 | "language": "python", 292 | "name": "python3" 293 | }, 294 | "language_info": { 295 | "codemirror_mode": { 296 | "name": "ipython", 297 | "version": 3 298 | }, 299 | "file_extension": ".py", 300 | "mimetype": "text/x-python", 301 | "name": "python", 302 | "nbconvert_exporter": "python", 303 | "pygments_lexer": "ipython3", 304 | "version": "3.7.6" 305 | } 306 | }, 307 | "nbformat": 4, 308 | "nbformat_minor": 4 309 | } 310 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/08장_지도학습_3절_k최근접이웃-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Step-by-step code" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 
16 | "# Load the data\n", 17 | "from sklearn import datasets\n", 18 | "raw_iris = datasets.load_iris()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# Features / target\n", 28 | "X = raw_iris.data\n", 29 | "y = raw_iris.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# Split into training / test data\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# Standardize the data\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n", 66 | " metric_params=None, n_jobs=None, n_neighbors=2, p=2,\n", 67 | " weights='uniform')" 68 | ] 69 | }, 70 | "execution_count": 5, 71 | "metadata": {}, 72 | "output_type": "execute_result" 73 | } 74 | ], 75 | "source": [ 76 | "# Train\n", 77 | "from sklearn.neighbors import KNeighborsClassifier\n", 78 | "clf_knn = KNeighborsClassifier(n_neighbors=2)\n", 79 | "clf_knn.fit(X_tn_std, y_tn)" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 8, 85 | "metadata": {}, 86 | "outputs": [ 87 | { 88 | "name": "stdout", 89 | "output_type": "stream", 90 | "text": [ 91 | "[2 1 0 2 0 2 0 1 1 1 1 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0\n", 92 | " 2]\n" 93 | ] 94 | } 95 | ], 96 | "source": [ 97 | "# Predict\n", 98 | "knn_pred = clf_knn.predict(X_te_std)\n", 99 | 
"print(knn_pred)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 9, 105 | "metadata": {}, 106 | "outputs": [ 107 | { 108 | "name": "stdout", 109 | "output_type": "stream", 110 | "text": [ 111 | "0.9473684210526315\n" 112 | ] 113 | } 114 | ], 115 | "source": [ 116 | "# 정확도\n", 117 | "from sklearn.metrics import accuracy_score\n", 118 | "accuracy = accuracy_score(y_te, knn_pred)\n", 119 | "print(accuracy)" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 10, 125 | "metadata": {}, 126 | "outputs": [ 127 | { 128 | "name": "stdout", 129 | "output_type": "stream", 130 | "text": [ 131 | "[[13 0 0]\n", 132 | " [ 0 15 1]\n", 133 | " [ 0 1 8]]\n" 134 | ] 135 | } 136 | ], 137 | "source": [ 138 | "# confusion matrix 확인 \n", 139 | "from sklearn.metrics import confusion_matrix\n", 140 | "conf_matrix = confusion_matrix(y_te, knn_pred)\n", 141 | "print(conf_matrix)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 22, 147 | "metadata": {}, 148 | "outputs": [ 149 | { 150 | "name": "stdout", 151 | "output_type": "stream", 152 | "text": [ 153 | " precision recall f1-score support\n", 154 | "\n", 155 | " 0 1.00 1.00 1.00 13\n", 156 | " 1 0.94 0.94 0.94 16\n", 157 | " 2 0.89 0.89 0.89 9\n", 158 | "\n", 159 | " accuracy 0.95 38\n", 160 | " macro avg 0.94 0.94 0.94 38\n", 161 | "weighted avg 0.95 0.95 0.95 38\n", 162 | "\n" 163 | ] 164 | } 165 | ], 166 | "source": [ 167 | "# 분류 레포트 확인\n", 168 | "from sklearn.metrics import classification_report\n", 169 | "class_report = classification_report(y_te, knn_pred)\n", 170 | "print(class_report)" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "# 통합 코드" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 11, 183 | "metadata": {}, 184 | "outputs": [ 185 | { 186 | "name": "stdout", 187 | "output_type": "stream", 188 | "text": [ 189 | "[2 1 0 2 0 2 0 1 1 1 1 1 1 1 1 0 1 1 0 0 2 1 0 0 2 
0 0 1 1 0 2 1 0 2 2 1 0\n", 190 | " 2]\n", 191 | "0.9473684210526315\n", 192 | "[[13 0 0]\n", 193 | " [ 0 15 1]\n", 194 | " [ 0 1 8]]\n", 195 | " precision recall f1-score support\n", 196 | "\n", 197 | " 0 1.00 1.00 1.00 13\n", 198 | " 1 0.94 0.94 0.94 16\n", 199 | " 2 0.89 0.89 0.89 9\n", 200 | "\n", 201 | " accuracy 0.95 38\n", 202 | " macro avg 0.94 0.94 0.94 38\n", 203 | "weighted avg 0.95 0.95 0.95 38\n", 204 | "\n" 205 | ] 206 | } 207 | ], 208 | "source": [ 209 | "from sklearn import datasets\n", 210 | "from sklearn.preprocessing import StandardScaler\n", 211 | "from sklearn.neighbors import KNeighborsClassifier\n", 212 | "from sklearn.model_selection import train_test_split\n", 213 | "\n", 214 | "from sklearn.metrics import accuracy_score\n", 215 | "from sklearn.metrics import confusion_matrix\n", 216 | "from sklearn.metrics import classification_report\n", 217 | "\n", 218 | "# Load the iris flower data\n", 219 | "raw_iris = datasets.load_iris()\n", 220 | "\n", 221 | "# Features / target\n", 222 | "X = raw_iris.data\n", 223 | "y = raw_iris.target\n", 224 | "\n", 225 | "# Split into training / test data\n", 226 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 227 | "\n", 228 | "\n", 229 | "# Standardization scaling\n", 230 | "std_scale = StandardScaler()\n", 231 | "std_scale.fit(X_tn)\n", 232 | "X_tn_std = std_scale.transform(X_tn)\n", 233 | "X_te_std = std_scale.transform(X_te)\n", 234 | "\n", 235 | "# Train\n", 236 | "clf_knn = KNeighborsClassifier(n_neighbors=2)\n", 237 | "clf_knn.fit(X_tn_std, y_tn)\n", 238 | "\n", 239 | "# Predict\n", 240 | "knn_pred = clf_knn.predict(X_te_std)\n", 241 | "print(knn_pred)\n", 242 | "\n", 243 | "# Accuracy\n", 244 | "accuracy = accuracy_score(y_te, knn_pred)\n", 245 | "print(accuracy)\n", 246 | "\n", 247 | "# Check the confusion matrix \n", 248 | "conf_matrix = confusion_matrix(y_te, knn_pred)\n", 249 | "print(conf_matrix)\n", 250 | "\n", 251 | "# Check the classification report\n", 252 | "class_report = classification_report(y_te, knn_pred)\n", 253 | "print(class_report)" 254 | ] 255 | }, 256 | { 257 
| "cell_type": "code", 258 | "execution_count": null, 259 | "metadata": {}, 260 | "outputs": [], 261 | "source": [] 262 | } 263 | ], 264 | "metadata": { 265 | "kernelspec": { 266 | "display_name": "Python 3", 267 | "language": "python", 268 | "name": "python3" 269 | }, 270 | "language_info": { 271 | "codemirror_mode": { 272 | "name": "ipython", 273 | "version": 3 274 | }, 275 | "file_extension": ".py", 276 | "mimetype": "text/x-python", 277 | "name": "python", 278 | "nbconvert_exporter": "python", 279 | "pygments_lexer": "ipython3", 280 | "version": "3.7.6" 281 | } 282 | }, 283 | "nbformat": 4, 284 | "nbformat_minor": 4 285 | } 286 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/08장_지도학습_6절_나이브베이즈-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Step-by-step code " 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# Load the data\n", 17 | "from sklearn import datasets\n", 18 | "raw_wine = datasets.load_wine()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# Assign feature and target data\n", 28 | "X = raw_wine.data\n", 29 | "y = raw_wine.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# Split into training / test data\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# Standardize the data\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = 
std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "GaussianNB(priors=None, var_smoothing=1e-09)" 66 | ] 67 | }, 68 | "execution_count": 5, 69 | "metadata": {}, 70 | "output_type": "execute_result" 71 | } 72 | ], 73 | "source": [ 74 | "# Train naive Bayes\n", 75 | "from sklearn.naive_bayes import GaussianNB\n", 76 | "clf_gnb = GaussianNB()\n", 77 | "clf_gnb.fit(X_tn_std, y_tn)" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 12, 83 | "metadata": {}, 84 | "outputs": [ 85 | { 86 | "name": "stdout", 87 | "output_type": "stream", 88 | "text": [ 89 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 0 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 90 | " 1 1 2 0 0 1 1 1]\n" 91 | ] 92 | } 93 | ], 94 | "source": [ 95 | "# Predict\n", 96 | "pred_gnb = clf_gnb.predict(X_te_std)\n", 97 | "print(pred_gnb)" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 15, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "name": "stdout", 107 | "output_type": "stream", 108 | "text": [ 109 | "0.9523809523809524\n" 110 | ] 111 | } 112 | ], 113 | "source": [ 114 | "# Recall\n", 115 | "from sklearn.metrics import recall_score\n", 116 | "recall = recall_score(y_te, pred_gnb, average='macro')\n", 117 | "print(recall)" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": 26, 123 | "metadata": {}, 124 | "outputs": [ 125 | { 126 | "name": "stdout", 127 | "output_type": "stream", 128 | "text": [ 129 | "[[16 0 0]\n", 130 | " [ 2 18 1]\n", 131 | " [ 0 0 8]]\n" 132 | ] 133 | } 134 | ], 135 | "source": [ 136 | "# Check the confusion matrix \n", 137 | "from sklearn.metrics import confusion_matrix\n", 138 | "conf_matrix = confusion_matrix(y_te, pred_gnb)\n", 139 | "print(conf_matrix)" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 27, 145 | "metadata": {}, 
146 | "outputs": [ 147 | { 148 | "name": "stdout", 149 | "output_type": "stream", 150 | "text": [ 151 | " precision recall f1-score support\n", 152 | "\n", 153 | " 0 0.89 1.00 0.94 16\n", 154 | " 1 1.00 0.86 0.92 21\n", 155 | " 2 0.89 1.00 0.94 8\n", 156 | "\n", 157 | " accuracy 0.93 45\n", 158 | " macro avg 0.93 0.95 0.94 45\n", 159 | "weighted avg 0.94 0.93 0.93 45\n", 160 | "\n" 161 | ] 162 | } 163 | ], 164 | "source": [ 165 | "# Check the classification report\n", 166 | "from sklearn.metrics import classification_report\n", 167 | "class_report = classification_report(y_te, pred_gnb)\n", 168 | "print(class_report)" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "# Full code" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 1, 181 | "metadata": {}, 182 | "outputs": [ 183 | { 184 | "name": "stdout", 185 | "output_type": "stream", 186 | "text": [ 187 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 0 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 188 | " 1 1 2 0 0 1 1 1]\n", 189 | "0.9523809523809524\n", 190 | "[[16 0 0]\n", 191 | " [ 2 18 1]\n", 192 | " [ 0 0 8]]\n", 193 | " precision recall f1-score support\n", 194 | "\n", 195 | " 0 0.89 1.00 0.94 16\n", 196 | " 1 1.00 0.86 0.92 21\n", 197 | " 2 0.89 1.00 0.94 8\n", 198 | "\n", 199 | " accuracy 0.93 45\n", 200 | " macro avg 0.93 0.95 0.94 45\n", 201 | "weighted avg 0.94 0.93 0.93 45\n", 202 | "\n" 203 | ] 204 | } 205 | ], 206 | "source": [ 207 | "from sklearn import datasets\n", 208 | "from sklearn.preprocessing import StandardScaler\n", 209 | "from sklearn.model_selection import train_test_split\n", 210 | "\n", 211 | "from sklearn.naive_bayes import GaussianNB\n", 212 | "\n", 213 | "from sklearn.metrics import recall_score\n", 214 | "from sklearn.metrics import confusion_matrix\n", 215 | "from sklearn.metrics import classification_report\n", 216 | "\n", 217 | "\n", 218 | "# Load the data\n", 219 | "raw_wine = datasets.load_wine()\n", 220 | "\n", 221 | "# Assign feature and target data\n", 222 | 
"X = raw_wine.data\n", 223 | "y = raw_wine.target\n", 224 | "\n", 225 | "# 트레이닝/테스트 데이터 분할\n", 226 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 227 | "\n", 228 | "# 데이터 표준화\n", 229 | "std_scale = StandardScaler()\n", 230 | "std_scale.fit(X_tn)\n", 231 | "X_tn_std = std_scale.transform(X_tn)\n", 232 | "X_te_std = std_scale.transform(X_te)\n", 233 | "\n", 234 | "# 나이브 베이즈 학습\n", 235 | "clf_gnb = GaussianNB()\n", 236 | "clf_gnb.fit(X_tn_std, y_tn)\n", 237 | "\n", 238 | "# 예측\n", 239 | "pred_gnb = clf_gnb.predict(X_te_std)\n", 240 | "print(pred_gnb)\n", 241 | "\n", 242 | "# 리콜\n", 243 | "recall = recall_score(y_te, pred_gnb, average='macro')\n", 244 | "print(recall)\n", 245 | "\n", 246 | "# confusion matrix 확인 \n", 247 | "conf_matrix = confusion_matrix(y_te, pred_gnb)\n", 248 | "print(conf_matrix)\n", 249 | "\n", 250 | "# 분류 레포트 확인\n", 251 | "class_report = classification_report(y_te, pred_gnb)\n", 252 | "print(class_report)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": null, 258 | "metadata": {}, 259 | "outputs": [], 260 | "source": [] 261 | } 262 | ], 263 | "metadata": { 264 | "kernelspec": { 265 | "display_name": "Python 3", 266 | "language": "python", 267 | "name": "python3" 268 | }, 269 | "language_info": { 270 | "codemirror_mode": { 271 | "name": "ipython", 272 | "version": 3 273 | }, 274 | "file_extension": ".py", 275 | "mimetype": "text/x-python", 276 | "name": "python", 277 | "nbconvert_exporter": "python", 278 | "pygments_lexer": "ipython3", 279 | "version": "3.7.6" 280 | } 281 | }, 282 | "nbformat": 4, 283 | "nbformat_minor": 4 284 | } 285 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/08장_지도학습_7절_의사결정나무-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별 코드" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 
12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# Load the data\n", 17 | "from sklearn import datasets\n", 18 | "raw_wine = datasets.load_wine()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# Assign feature and target data\n", 28 | "X = raw_wine.data\n", 29 | "y = raw_wine.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# Split into training / test data\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# Standardize the data\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n", 66 | " max_depth=None, max_features=None, max_leaf_nodes=None,\n", 67 | " min_impurity_decrease=0.0, min_impurity_split=None,\n", 68 | " min_samples_leaf=1, min_samples_split=2,\n", 69 | " min_weight_fraction_leaf=0.0, presort='deprecated',\n", 70 | " random_state=0, splitter='best')" 71 | ] 72 | }, 73 | "execution_count": 5, 74 | "metadata": {}, 75 | "output_type": "execute_result" 76 | } 77 | ], 78 | "source": [ 79 | "# Train the decision tree\n", 80 | "from sklearn import tree \n", 81 | "clf_tree = tree.DecisionTreeClassifier(random_state=0)\n", 82 | "clf_tree.fit(X_tn_std, y_tn)" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 6, 88 | "metadata": {}, 89 | "outputs": [ 90 | 
{ 91 | "name": "stdout", 92 | "output_type": "stream", 93 | "text": [ 94 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 1 0 1 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 95 | " 1 1 2 1 0 1 1 1]\n" 96 | ] 97 | } 98 | ], 99 | "source": [ 100 | "# Predict\n", 101 | "pred_tree = clf_tree.predict(X_te_std)\n", 102 | "print(pred_tree)" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 7, 108 | "metadata": {}, 109 | "outputs": [ 110 | { 111 | "name": "stdout", 112 | "output_type": "stream", 113 | "text": [ 114 | "0.9349141206870346\n" 115 | ] 116 | } 117 | ], 118 | "source": [ 119 | "# f1 score\n", 120 | "from sklearn.metrics import f1_score\n", 121 | "f1 = f1_score(y_te, pred_tree, average='macro')\n", 122 | "print(f1)" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 8, 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "name": "stdout", 132 | "output_type": "stream", 133 | "text": [ 134 | "[[14 2 0]\n", 135 | " [ 0 20 1]\n", 136 | " [ 0 0 8]]\n" 137 | ] 138 | } 139 | ], 140 | "source": [ 141 | "# Check the confusion matrix \n", 142 | "from sklearn.metrics import confusion_matrix\n", 143 | "conf_matrix = confusion_matrix(y_te, pred_tree)\n", 144 | "print(conf_matrix)" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 9, 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "name": "stdout", 154 | "output_type": "stream", 155 | "text": [ 156 | " precision recall f1-score support\n", 157 | "\n", 158 | " 0 1.00 0.88 0.93 16\n", 159 | " 1 0.91 0.95 0.93 21\n", 160 | " 2 0.89 1.00 0.94 8\n", 161 | "\n", 162 | " accuracy 0.93 45\n", 163 | " macro avg 0.93 0.94 0.93 45\n", 164 | "weighted avg 0.94 0.93 0.93 45\n", 165 | "\n" 166 | ] 167 | } 168 | ], 169 | "source": [ 170 | "# Check the classification report\n", 171 | "from sklearn.metrics import classification_report\n", 172 | "class_report = classification_report(y_te, pred_tree)\n", 173 | "print(class_report)" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 
179 | "source": [ 180 | "# Combined code" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 10, 186 | "metadata": {}, 187 | "outputs": [ 188 | { 189 | "name": "stdout", 190 | "output_type": "stream", 191 | "text": [ 192 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 1 0 1 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 193 | " 1 1 2 1 0 1 1 1]\n", 194 | "0.9349141206870346\n", 195 | "[[14 2 0]\n", 196 | " [ 0 20 1]\n", 197 | " [ 0 0 8]]\n", 198 | " precision recall f1-score support\n", 199 | "\n", 200 | " 0 1.00 0.88 0.93 16\n", 201 | " 1 0.91 0.95 0.93 21\n", 202 | " 2 0.89 1.00 0.94 8\n", 203 | "\n", 204 | " accuracy 0.93 45\n", 205 | " macro avg 0.93 0.94 0.93 45\n", 206 | "weighted avg 0.94 0.93 0.93 45\n", 207 | "\n" 208 | ] 209 | } 210 | ], 211 | "source": [ 212 | "from sklearn import datasets\n", 213 | "from sklearn.preprocessing import StandardScaler\n", 214 | "from sklearn.model_selection import train_test_split\n", 215 | "\n", 216 | "from sklearn import tree \n", 217 | "\n", 218 | "from sklearn.metrics import f1_score\n", 219 | "from sklearn.metrics import confusion_matrix\n", 220 | "from sklearn.metrics import classification_report\n", 221 | "\n", 222 | "\n", 223 | "# Load the data\n", 224 | "raw_wine = datasets.load_wine()\n", 225 | "\n", 226 | "# Set the feature and target data\n", 227 | "X = raw_wine.data\n", 228 | "y = raw_wine.target\n", 229 | "\n", 230 | "# Split into training/test data\n", 231 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 232 | "\n", 233 | "# Standardize the data\n", 234 | "std_scale = StandardScaler()\n", 235 | "std_scale.fit(X_tn)\n", 236 | "X_tn_std = std_scale.transform(X_tn)\n", 237 | "X_te_std = std_scale.transform(X_te)\n", 238 | "\n", 239 | "# Train the decision tree\n", 240 | "clf_tree = tree.DecisionTreeClassifier(random_state=0)\n", 241 | "clf_tree.fit(X_tn_std, y_tn)\n", 242 | "\n", 243 | "# Predict\n", 244 | "pred_tree = clf_tree.predict(X_te_std)\n", 245 | "print(pred_tree)\n", 246 | "\n", 247 | "# f1 score\n", 248 | "from sklearn.metrics import f1_score\n", 249 | 
"f1 = f1_score(y_te, pred_tree, average='macro')\n", 250 | "print(f1)\n", 251 | "\n", 252 | "# confusion matrix 확인 \n", 253 | "conf_matrix = confusion_matrix(y_te, pred_tree)\n", 254 | "print(conf_matrix)\n", 255 | "\n", 256 | "# 분류 레포트 확인\n", 257 | "class_report = classification_report(y_te, pred_tree)\n", 258 | "print(class_report)" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": null, 264 | "metadata": {}, 265 | "outputs": [], 266 | "source": [] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | "metadata": {}, 272 | "outputs": [], 273 | "source": [] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": null, 278 | "metadata": {}, 279 | "outputs": [], 280 | "source": [] 281 | } 282 | ], 283 | "metadata": { 284 | "kernelspec": { 285 | "display_name": "Python 3", 286 | "language": "python", 287 | "name": "python3" 288 | }, 289 | "language_info": { 290 | "codemirror_mode": { 291 | "name": "ipython", 292 | "version": 3 293 | }, 294 | "file_extension": ".py", 295 | "mimetype": "text/x-python", 296 | "name": "python", 297 | "nbconvert_exporter": "python", 298 | "pygments_lexer": "ipython3", 299 | "version": "3.7.6" 300 | } 301 | }, 302 | "nbformat": 4, 303 | "nbformat_minor": 4 304 | } 305 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/08장_지도학습_8절_서포트벡터머신-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별코드 " 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# 데이터 불러오기\n", 17 | "from sklearn import datasets\n", 18 | "raw_wine = datasets.load_wine()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# 피쳐, 타겟 데이터 지정\n", 28 
| "X = raw_wine.data\n", 29 | "y = raw_wine.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# Split into training/test data\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# Standardize the data\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,\n", 66 | " decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',\n", 67 | " max_iter=-1, probability=False, random_state=0, shrinking=True, tol=0.001,\n", 68 | " verbose=False)" 69 | ] 70 | }, 71 | "execution_count": 5, 72 | "metadata": {}, 73 | "output_type": "execute_result" 74 | } 75 | ], 76 | "source": [ 77 | "# Train the support vector machine\n", 78 | "from sklearn import svm \n", 79 | "clf_svm_lr = svm.SVC(kernel='linear', random_state=0)\n", 80 | "clf_svm_lr.fit(X_tn_std, y_tn)" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 6, 86 | "metadata": {}, 87 | "outputs": [ 88 | { 89 | "name": "stdout", 90 | "output_type": "stream", 91 | "text": [ 92 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 1 0 1 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 93 | " 1 1 2 0 0 1 1 1]\n" 94 | ] 95 | } 96 | ], 97 | "source": [ 98 | "# Predict\n", 99 | "pred_svm = clf_svm_lr.predict(X_te_std)\n", 100 | "print(pred_svm)" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 7, 106 | "metadata": {}, 107 | "outputs": [ 108 | { 
109 | "name": "stdout", 110 | "output_type": "stream", 111 | "text": [ 112 | "1.0\n" 113 | ] 114 | } 115 | ], 116 | "source": [ 117 | "# Accuracy\n", 118 | "from sklearn.metrics import accuracy_score\n", 119 | "accuracy = accuracy_score(y_te, pred_svm)\n", 120 | "print(accuracy)" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 8, 126 | "metadata": {}, 127 | "outputs": [ 128 | { 129 | "name": "stdout", 130 | "output_type": "stream", 131 | "text": [ 132 | "[[16 0 0]\n", 133 | " [ 0 21 0]\n", 134 | " [ 0 0 8]]\n" 135 | ] 136 | } 137 | ], 138 | "source": [ 139 | "# Check the confusion matrix\n", 140 | "from sklearn.metrics import confusion_matrix\n", 141 | "conf_matrix = confusion_matrix(y_te, pred_svm)\n", 142 | "print(conf_matrix)" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 9, 148 | "metadata": {}, 149 | "outputs": [ 150 | { 151 | "name": "stdout", 152 | "output_type": "stream", 153 | "text": [ 154 | " precision recall f1-score support\n", 155 | "\n", 156 | " 0 1.00 1.00 1.00 16\n", 157 | " 1 1.00 1.00 1.00 21\n", 158 | " 2 1.00 1.00 1.00 8\n", 159 | "\n", 160 | " accuracy 1.00 45\n", 161 | " macro avg 1.00 1.00 1.00 45\n", 162 | "weighted avg 1.00 1.00 1.00 45\n", 163 | "\n" 164 | ] 165 | } 166 | ], 167 | "source": [ 168 | "# Check the classification report\n", 169 | "from sklearn.metrics import classification_report\n", 170 | "class_report = classification_report(y_te, pred_svm)\n", 171 | "print(class_report)" 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "# Combined code" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 1, 184 | "metadata": {}, 185 | "outputs": [ 186 | { 187 | "name": "stdout", 188 | "output_type": "stream", 189 | "text": [ 190 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 1 0 1 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 191 | " 1 1 2 0 0 1 1 1]\n", 192 | "1.0\n", 193 | "[[16 0 0]\n", 194 | " [ 0 21 0]\n", 195 | " [ 0 0 8]]\n", 196 | " precision recall 
f1-score support\n", 197 | "\n", 198 | " 0 1.00 1.00 1.00 16\n", 199 | " 1 1.00 1.00 1.00 21\n", 200 | " 2 1.00 1.00 1.00 8\n", 201 | "\n", 202 | " accuracy 1.00 45\n", 203 | " macro avg 1.00 1.00 1.00 45\n", 204 | "weighted avg 1.00 1.00 1.00 45\n", 205 | "\n" 206 | ] 207 | } 208 | ], 209 | "source": [ 210 | "from sklearn import datasets\n", 211 | "from sklearn.preprocessing import StandardScaler\n", 212 | "from sklearn.model_selection import train_test_split\n", 213 | "\n", 214 | "from sklearn import svm \n", 215 | "\n", 216 | "from sklearn.metrics import accuracy_score\n", 217 | "from sklearn.metrics import confusion_matrix\n", 218 | "from sklearn.metrics import classification_report\n", 219 | "\n", 220 | "# Load the data\n", 221 | "raw_wine = datasets.load_wine()\n", 222 | "\n", 223 | "# Set the feature and target data\n", 224 | "X = raw_wine.data\n", 225 | "y = raw_wine.target\n", 226 | "\n", 227 | "# Split into training/test data\n", 228 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 229 | "\n", 230 | "# Standardize the data\n", 231 | "std_scale = StandardScaler()\n", 232 | "std_scale.fit(X_tn)\n", 233 | "X_tn_std = std_scale.transform(X_tn)\n", 234 | "X_te_std = std_scale.transform(X_te)\n", 235 | "\n", 236 | "# Train the support vector machine\n", 237 | "clf_svm_lr = svm.SVC(kernel='linear', random_state=0)\n", 238 | "clf_svm_lr.fit(X_tn_std, y_tn)\n", 239 | "\n", 240 | "# Predict\n", 241 | "pred_svm = clf_svm_lr.predict(X_te_std)\n", 242 | "print(pred_svm)\n", 243 | "\n", 244 | "# Accuracy\n", 245 | "accuracy = accuracy_score(y_te, pred_svm)\n", 246 | "print(accuracy)\n", 247 | "\n", 248 | "# Check the confusion matrix\n", 249 | "conf_matrix = confusion_matrix(y_te, pred_svm)\n", 250 | "print(conf_matrix)\n", 251 | "\n", 252 | "# Check the classification report\n", 253 | "class_report = classification_report(y_te, pred_svm)\n", 254 | "print(class_report)" 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": null, 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [] 263 | } 264 | ], 265 | "metadata": { 266 | 
"kernelspec": { 267 | "display_name": "Python 3", 268 | "language": "python", 269 | "name": "python3" 270 | }, 271 | "language_info": { 272 | "codemirror_mode": { 273 | "name": "ipython", 274 | "version": 3 275 | }, 276 | "file_extension": ".py", 277 | "mimetype": "text/x-python", 278 | "name": "python", 279 | "nbconvert_exporter": "python", 280 | "pygments_lexer": "ipython3", 281 | "version": "3.7.6" 282 | } 283 | }, 284 | "nbformat": 4, 285 | "nbformat_minor": 4 286 | } 287 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/09장_앙상블_3절_1_랜덤포레스트-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별 코드" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 2, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# 데이터 불러오기\n", 17 | "from sklearn import datasets\n", 18 | "raw_wine = datasets.load_wine()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 3, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# 피쳐, 타겟 데이터 지정\n", 28 | "X = raw_wine.data\n", 29 | "y = raw_wine.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 4, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# 트레이닝/테스트 데이터 분할\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 5, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# 데이터 표준화\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 6, 60 | 
"metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,\n", 66 | " criterion='gini', max_depth=2, max_features='auto',\n", 67 | " max_leaf_nodes=None, max_samples=None,\n", 68 | " min_impurity_decrease=0.0, min_impurity_split=None,\n", 69 | " min_samples_leaf=1, min_samples_split=2,\n", 70 | " min_weight_fraction_leaf=0.0, n_estimators=100,\n", 71 | " n_jobs=None, oob_score=False, random_state=0, verbose=0,\n", 72 | " warm_start=False)" 73 | ] 74 | }, 75 | "execution_count": 6, 76 | "metadata": {}, 77 | "output_type": "execute_result" 78 | } 79 | ], 80 | "source": [ 81 | "from sklearn.ensemble import RandomForestClassifier\n", 82 | "clf_rf = RandomForestClassifier(max_depth=2, \n", 83 | " random_state=0)\n", 84 | "clf_rf.fit(X_tn_std, y_tn)" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 7, 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "name": "stdout", 94 | "output_type": "stream", 95 | "text": [ 96 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 97 | " 1 1 2 0 0 1 1 1]\n" 98 | ] 99 | } 100 | ], 101 | "source": [ 102 | "# 예측\n", 103 | "pred_rf = clf_rf.predict(X_te_std)\n", 104 | "print(pred_rf)" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 8, 110 | "metadata": {}, 111 | "outputs": [ 112 | { 113 | "name": "stdout", 114 | "output_type": "stream", 115 | "text": [ 116 | "0.9555555555555556\n" 117 | ] 118 | } 119 | ], 120 | "source": [ 121 | "# 정확도\n", 122 | "from sklearn.metrics import accuracy_score\n", 123 | "accuracy = accuracy_score(y_te, pred_rf)\n", 124 | "print(accuracy)" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 9, 130 | "metadata": {}, 131 | "outputs": [ 132 | { 133 | "name": "stdout", 134 | "output_type": "stream", 135 | "text": [ 136 | "[[16 0 0]\n", 137 | " [ 1 19 1]\n", 138 | " [ 0 0 8]]\n" 139 | ] 140 | } 141 | ], 142 | 
"source": [ 143 | "# confusion matrix 확인 \n", 144 | "from sklearn.metrics import confusion_matrix\n", 145 | "conf_matrix = confusion_matrix(y_te, pred_rf)\n", 146 | "print(conf_matrix)" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 10, 152 | "metadata": {}, 153 | "outputs": [ 154 | { 155 | "name": "stdout", 156 | "output_type": "stream", 157 | "text": [ 158 | " precision recall f1-score support\n", 159 | "\n", 160 | " 0 0.94 1.00 0.97 16\n", 161 | " 1 1.00 0.90 0.95 21\n", 162 | " 2 0.89 1.00 0.94 8\n", 163 | "\n", 164 | " accuracy 0.96 45\n", 165 | " macro avg 0.94 0.97 0.95 45\n", 166 | "weighted avg 0.96 0.96 0.96 45\n", 167 | "\n" 168 | ] 169 | } 170 | ], 171 | "source": [ 172 | "# 분류 레포트 확인\n", 173 | "from sklearn.metrics import classification_report\n", 174 | "class_report = classification_report(y_te, pred_rf)\n", 175 | "print(class_report)" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "# 통합 코드" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 11, 188 | "metadata": {}, 189 | "outputs": [ 190 | { 191 | "name": "stdout", 192 | "output_type": "stream", 193 | "text": [ 194 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 195 | " 1 1 2 0 0 1 1 1]\n", 196 | "0.9555555555555556\n", 197 | "[[16 0 0]\n", 198 | " [ 1 19 1]\n", 199 | " [ 0 0 8]]\n", 200 | " precision recall f1-score support\n", 201 | "\n", 202 | " 0 0.94 1.00 0.97 16\n", 203 | " 1 1.00 0.90 0.95 21\n", 204 | " 2 0.89 1.00 0.94 8\n", 205 | "\n", 206 | " accuracy 0.96 45\n", 207 | " macro avg 0.94 0.97 0.95 45\n", 208 | "weighted avg 0.96 0.96 0.96 45\n", 209 | "\n" 210 | ] 211 | } 212 | ], 213 | "source": [ 214 | "from sklearn import datasets\n", 215 | "from sklearn.model_selection import train_test_split\n", 216 | "from sklearn.preprocessing import StandardScaler\n", 217 | "\n", 218 | "from sklearn.ensemble import RandomForestClassifier\n", 219 | "\n", 220 
| "from sklearn.metrics import accuracy_score\n", 221 | "from sklearn.metrics import confusion_matrix\n", 222 | "from sklearn.metrics import classification_report\n", 223 | "\n", 224 | "# Load the data\n", 225 | "raw_wine = datasets.load_wine()\n", 226 | "\n", 227 | "# Set the feature and target data\n", 228 | "X = raw_wine.data\n", 229 | "y = raw_wine.target\n", 230 | "\n", 231 | "# Split into training/test data\n", 232 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 233 | "\n", 234 | "# Standardize the data\n", 235 | "std_scale = StandardScaler()\n", 236 | "std_scale.fit(X_tn)\n", 237 | "X_tn_std = std_scale.transform(X_tn)\n", 238 | "X_te_std = std_scale.transform(X_te)\n", 239 | "\n", 240 | "# Train the random forest\n", 241 | "clf_rf = RandomForestClassifier(max_depth=2, \n", 242 | " random_state=0)\n", 243 | "clf_rf.fit(X_tn_std, y_tn)\n", 244 | "\n", 245 | "# Predict\n", 246 | "pred_rf = clf_rf.predict(X_te_std)\n", 247 | "print(pred_rf)\n", 248 | "\n", 249 | "# Accuracy\n", 250 | "accuracy = accuracy_score(y_te, pred_rf)\n", 251 | "print(accuracy)\n", 252 | "\n", 253 | "# Check the confusion matrix\n", 254 | "conf_matrix = confusion_matrix(y_te, pred_rf)\n", 255 | "print(conf_matrix)\n", 256 | "\n", 257 | "# Check the classification report\n", 258 | "class_report = classification_report(y_te, pred_rf)\n", 259 | "print(class_report)" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": null, 265 | "metadata": {}, 266 | "outputs": [], 267 | "source": [] 268 | } 269 | ], 270 | "metadata": { 271 | "kernelspec": { 272 | "display_name": "Python 3", 273 | "language": "python", 274 | "name": "python3" 275 | }, 276 | "language_info": { 277 | "codemirror_mode": { 278 | "name": "ipython", 279 | "version": 3 280 | }, 281 | "file_extension": ".py", 282 | "mimetype": "text/x-python", 283 | "name": "python", 284 | "nbconvert_exporter": "python", 285 | "pygments_lexer": "ipython3", 286 | "version": "3.7.6" 287 | } 288 | }, 289 | "nbformat": 4, 290 | "nbformat_minor": 4 291 | } 292 | 
-------------------------------------------------------------------------------- /.ipynb_checkpoints/09장_앙상블_3절_2_배깅-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Individual code" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# Load the data\n", 17 | "from sklearn import datasets\n", 18 | "raw_wine = datasets.load_wine()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# Set the feature and target data\n", 28 | "X = raw_wine.data\n", 29 | "y = raw_wine.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# Split into training/test data\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# Standardize the data\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 16, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "BaggingClassifier(base_estimator=GaussianNB(priors=None, var_smoothing=1e-09),\n", 66 | " bootstrap=True, bootstrap_features=False, max_features=1.0,\n", 67 | " max_samples=1.0, n_estimators=10, n_jobs=None,\n", 68 | " oob_score=False, random_state=0, verbose=0, warm_start=False)" 69 | ] 70 | }, 71 | "execution_count": 16, 72 | "metadata": {}, 73 | "output_type": "execute_result" 74 | } 75 | ], 76 | 
"source": [ 77 | "# 배깅 학습\n", 78 | "from sklearn.naive_bayes import GaussianNB\n", 79 | "from sklearn.ensemble import BaggingClassifier\n", 80 | "clf_bagging = BaggingClassifier(base_estimator=GaussianNB(),\n", 81 | " n_estimators=10, \n", 82 | " random_state=0)\n", 83 | "clf_bagging.fit(X_tn_std, y_tn)" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 17, 89 | "metadata": {}, 90 | "outputs": [ 91 | { 92 | "name": "stdout", 93 | "output_type": "stream", 94 | "text": [ 95 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 96 | " 1 1 2 0 0 1 1 1]\n" 97 | ] 98 | } 99 | ], 100 | "source": [ 101 | "# 예측\n", 102 | "pred_bagging = clf_bagging.predict(X_te_std)\n", 103 | "print(pred_bagging)" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 18, 109 | "metadata": {}, 110 | "outputs": [ 111 | { 112 | "name": "stdout", 113 | "output_type": "stream", 114 | "text": [ 115 | "0.9555555555555556\n" 116 | ] 117 | } 118 | ], 119 | "source": [ 120 | "# 정확도\n", 121 | "from sklearn.metrics import accuracy_score\n", 122 | "accuracy = accuracy_score(y_te, pred_bagging)\n", 123 | "print(accuracy)" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 19, 129 | "metadata": {}, 130 | "outputs": [ 131 | { 132 | "name": "stdout", 133 | "output_type": "stream", 134 | "text": [ 135 | "[[16 0 0]\n", 136 | " [ 1 19 1]\n", 137 | " [ 0 0 8]]\n" 138 | ] 139 | } 140 | ], 141 | "source": [ 142 | "# confusion matrix 확인 \n", 143 | "from sklearn.metrics import confusion_matrix\n", 144 | "conf_matrix = confusion_matrix(y_te, pred_bagging)\n", 145 | "print(conf_matrix)" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 20, 151 | "metadata": {}, 152 | "outputs": [ 153 | { 154 | "name": "stdout", 155 | "output_type": "stream", 156 | "text": [ 157 | " precision recall f1-score support\n", 158 | "\n", 159 | " 0 0.94 1.00 0.97 16\n", 160 | " 1 1.00 0.90 0.95 21\n", 161 | " 2 0.89 
1.00 0.94 8\n", 162 | "\n", 163 | " accuracy 0.96 45\n", 164 | " macro avg 0.94 0.97 0.95 45\n", 165 | "weighted avg 0.96 0.96 0.96 45\n", 166 | "\n" 167 | ] 168 | } 169 | ], 170 | "source": [ 171 | "# Check the classification report\n", 172 | "from sklearn.metrics import classification_report\n", 173 | "class_report = classification_report(y_te, pred_bagging)\n", 174 | "print(class_report)" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "# Combined code" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 21, 187 | "metadata": {}, 188 | "outputs": [ 189 | { 190 | "name": "stdout", 191 | "output_type": "stream", 192 | "text": [ 193 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 194 | " 1 1 2 0 0 1 1 1]\n", 195 | "0.9555555555555556\n", 196 | "[[16 0 0]\n", 197 | " [ 1 19 1]\n", 198 | " [ 0 0 8]]\n", 199 | " precision recall f1-score support\n", 200 | "\n", 201 | " 0 0.94 1.00 0.97 16\n", 202 | " 1 1.00 0.90 0.95 21\n", 203 | " 2 0.89 1.00 0.94 8\n", 204 | "\n", 205 | " accuracy 0.96 45\n", 206 | " macro avg 0.94 0.97 0.95 45\n", 207 | "weighted avg 0.96 0.96 0.96 45\n", 208 | "\n" 209 | ] 210 | } 211 | ], 212 | "source": [ 213 | "from sklearn import datasets\n", 214 | "from sklearn.model_selection import train_test_split\n", 215 | "from sklearn.preprocessing import StandardScaler\n", 216 | "\n", 217 | "from sklearn.naive_bayes import GaussianNB\n", 218 | "from sklearn.ensemble import BaggingClassifier\n", 219 | "\n", 220 | "from sklearn.metrics import accuracy_score\n", 221 | "from sklearn.metrics import confusion_matrix\n", 222 | "from sklearn.metrics import classification_report\n", 223 | "\n", 224 | "# Load the data\n", 225 | "raw_wine = datasets.load_wine()\n", 226 | "\n", 227 | "# Set the feature and target data\n", 228 | "X = raw_wine.data\n", 229 | "y = raw_wine.target\n", 230 | "\n", 231 | "# Split into training/test data\n", 232 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 233 | "\n", 
234 | "# Standardize the data\n", 235 | "std_scale = StandardScaler()\n", 236 | "std_scale.fit(X_tn)\n", 237 | "X_tn_std = std_scale.transform(X_tn)\n", 238 | "X_te_std = std_scale.transform(X_te)\n", 239 | "\n", 240 | "# Train the bagging classifier\n", 241 | "clf_bagging = BaggingClassifier(base_estimator=GaussianNB(),\n", 242 | " n_estimators=10, \n", 243 | " random_state=0)\n", 244 | "clf_bagging.fit(X_tn_std, y_tn)\n", 245 | "\n", 246 | "# Predict\n", 247 | "pred_bagging = clf_bagging.predict(X_te_std)\n", 248 | "print(pred_bagging)\n", 249 | "\n", 250 | "# Accuracy\n", 251 | "accuracy = accuracy_score(y_te, pred_bagging)\n", 252 | "print(accuracy)\n", 253 | "\n", 254 | "# Check the confusion matrix\n", 255 | "conf_matrix = confusion_matrix(y_te, pred_bagging)\n", 256 | "print(conf_matrix)\n", 257 | "\n", 258 | "# Check the classification report\n", 259 | "class_report = classification_report(y_te, pred_bagging)\n", 260 | "print(class_report)" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "metadata": {}, 267 | "outputs": [], 268 | "source": [] 269 | } 270 | ], 271 | "metadata": { 272 | "kernelspec": { 273 | "display_name": "Python 3", 274 | "language": "python", 275 | "name": "python3" 276 | }, 277 | "language_info": { 278 | "codemirror_mode": { 279 | "name": "ipython", 280 | "version": 3 281 | }, 282 | "file_extension": ".py", 283 | "mimetype": "text/x-python", 284 | "name": "python", 285 | "nbconvert_exporter": "python", 286 | "pygments_lexer": "ipython3", 287 | "version": "3.7.6" 288 | } 289 | }, 290 | "nbformat": 4, 291 | "nbformat_minor": 4 292 | } 293 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/09장_앙상블_4절_1_adaboost-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Individual code" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 11, 13 | "metadata": {}, 14 | "outputs": 
[], 15 | "source": [ 16 | "# Load the data\n", 17 | "from sklearn import datasets\n", 18 | "raw_breast_cancer = datasets.load_breast_cancer()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 12, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# Set the feature and target data\n", 28 | "X = raw_breast_cancer.data\n", 29 | "y = raw_breast_cancer.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 13, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# Split into training/test data\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 14, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# Standardize the data\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 15, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=1.0,\n", 66 | " n_estimators=50, random_state=0)" 67 | ] 68 | }, 69 | "execution_count": 15, 70 | "metadata": {}, 71 | "output_type": "execute_result" 72 | } 73 | ], 74 | "source": [ 75 | "# Train AdaBoost\n", 76 | "from sklearn.ensemble import AdaBoostClassifier\n", 77 | "clf_ada = AdaBoostClassifier(random_state=0)\n", 78 | "clf_ada.fit(X_tn_std, y_tn)" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 16, 84 | "metadata": {}, 85 | "outputs": [ 86 | { 87 | "name": "stdout", 88 | "output_type": "stream", 89 | "text": [ 90 | "[0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n", 91 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n", 92 | " 0 1 
1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n", 93 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 0]\n" 94 | ] 95 | } 96 | ], 97 | "source": [ 98 | "# Predict\n", 99 | "pred_ada = clf_ada.predict(X_te_std)\n", 100 | "print(pred_ada)" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 17, 106 | "metadata": {}, 107 | "outputs": [ 108 | { 109 | "name": "stdout", 110 | "output_type": "stream", 111 | "text": [ 112 | "0.9790209790209791\n" 113 | ] 114 | } 115 | ], 116 | "source": [ 117 | "# Accuracy\n", 118 | "from sklearn.metrics import accuracy_score\n", 119 | "accuracy = accuracy_score(y_te, pred_ada)\n", 120 | "print(accuracy)" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 18, 126 | "metadata": {}, 127 | "outputs": [ 128 | { 129 | "name": "stdout", 130 | "output_type": "stream", 131 | "text": [ 132 | "[[52 1]\n", 133 | " [ 2 88]]\n" 134 | ] 135 | } 136 | ], 137 | "source": [ 138 | "# Check the confusion matrix\n", 139 | "from sklearn.metrics import confusion_matrix\n", 140 | "conf_matrix = confusion_matrix(y_te, pred_ada)\n", 141 | "print(conf_matrix)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 19, 147 | "metadata": {}, 148 | "outputs": [ 149 | { 150 | "name": "stdout", 151 | "output_type": "stream", 152 | "text": [ 153 | " precision recall f1-score support\n", 154 | "\n", 155 | " 0 0.96 0.98 0.97 53\n", 156 | " 1 0.99 0.98 0.98 90\n", 157 | "\n", 158 | " accuracy 0.98 143\n", 159 | " macro avg 0.98 0.98 0.98 143\n", 160 | "weighted avg 0.98 0.98 0.98 143\n", 161 | "\n" 162 | ] 163 | } 164 | ], 165 | "source": [ 166 | "# Check the classification report\n", 167 | "from sklearn.metrics import classification_report\n", 168 | "class_report = classification_report(y_te, pred_ada)\n", 169 | "print(class_report)" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "# Combined code" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 
181 | "execution_count": 20, 182 | "metadata": {}, 183 | "outputs": [ 184 | { 185 | "name": "stdout", 186 | "output_type": "stream", 187 | "text": [ 188 | "[0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n", 189 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n", 190 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n", 191 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 0]\n", 192 | "0.9790209790209791\n", 193 | "[[52 1]\n", 194 | " [ 2 88]]\n", 195 | " precision recall f1-score support\n", 196 | "\n", 197 | " 0 0.96 0.98 0.97 53\n", 198 | " 1 0.99 0.98 0.98 90\n", 199 | "\n", 200 | " accuracy 0.98 143\n", 201 | " macro avg 0.98 0.98 0.98 143\n", 202 | "weighted avg 0.98 0.98 0.98 143\n", 203 | "\n" 204 | ] 205 | } 206 | ], 207 | "source": [ 208 | "from sklearn import datasets\n", 209 | "from sklearn.model_selection import train_test_split\n", 210 | "from sklearn.preprocessing import StandardScaler\n", 211 | "\n", 212 | "from sklearn.ensemble import AdaBoostClassifier\n", 213 | "\n", 214 | "from sklearn.metrics import accuracy_score\n", 215 | "from sklearn.metrics import confusion_matrix\n", 216 | "from sklearn.metrics import classification_report\n", 217 | "\n", 218 | "# Load the data\n", 219 | "raw_breast_cancer = datasets.load_breast_cancer()\n", 220 | "\n", 221 | "# Set the feature and target data\n", 222 | "X = raw_breast_cancer.data\n", 223 | "y = raw_breast_cancer.target\n", 224 | "\n", 225 | "# Split into training/test data\n", 226 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 227 | "\n", 228 | "# Standardize the data\n", 229 | "std_scale = StandardScaler()\n", 230 | "std_scale.fit(X_tn)\n", 231 | "X_tn_std = std_scale.transform(X_tn)\n", 232 | "X_te_std = std_scale.transform(X_te)\n", 233 | "\n", 234 | "# Train AdaBoost\n", 235 | "clf_ada = AdaBoostClassifier(random_state=0)\n", 236 | "clf_ada.fit(X_tn_std, y_tn)\n", 237 | "\n", 238 | "# Predict\n", 239 | "pred_ada = 
clf_ada.predict(X_te_std)\n", 240 | "print(pred_ada)\n", 241 | "\n", 242 | "# Accuracy\n", 243 | "accuracy = accuracy_score(y_te, pred_ada)\n", 244 | "print(accuracy)\n", 245 | "\n", 246 | "# Check the confusion matrix\n", 247 | "conf_matrix = confusion_matrix(y_te, pred_ada)\n", 248 | "print(conf_matrix)\n", 249 | "\n", 250 | "# Check the classification report\n", 251 | "class_report = classification_report(y_te, pred_ada)\n", 252 | "print(class_report)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": null, 258 | "metadata": {}, 259 | "outputs": [], 260 | "source": [] 261 | } 262 | ], 263 | "metadata": { 264 | "kernelspec": { 265 | "display_name": "Python 3", 266 | "language": "python", 267 | "name": "python3" 268 | }, 269 | "language_info": { 270 | "codemirror_mode": { 271 | "name": "ipython", 272 | "version": 3 273 | }, 274 | "file_extension": ".py", 275 | "mimetype": "text/x-python", 276 | "name": "python", 277 | "nbconvert_exporter": "python", 278 | "pygments_lexer": "ipython3", 279 | "version": "3.7.6" 280 | } 281 | }, 282 | "nbformat": 4, 283 | "nbformat_minor": 4 284 | } 285 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/09장_앙상블_4절_2_gradient_boost-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Individual code" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# Load the data\n", 17 | "from sklearn import datasets\n", 18 | "raw_breast_cancer = datasets.load_breast_cancer()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# Set the feature and target data\n", 28 | "X = raw_breast_cancer.data\n", 29 | "y = raw_breast_cancer.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | 
"metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# 트레이닝/테스트 데이터 분할\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# 데이터 표준화\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,\n", 66 | " learning_rate=0.01, loss='deviance', max_depth=2,\n", 67 | " max_features=None, max_leaf_nodes=None,\n", 68 | " min_impurity_decrease=0.0, min_impurity_split=None,\n", 69 | " min_samples_leaf=1, min_samples_split=2,\n", 70 | " min_weight_fraction_leaf=0.0, n_estimators=100,\n", 71 | " n_iter_no_change=None, presort='deprecated',\n", 72 | " random_state=0, subsample=1.0, tol=0.0001,\n", 73 | " validation_fraction=0.1, verbose=0,\n", 74 | " warm_start=False)" 75 | ] 76 | }, 77 | "execution_count": 5, 78 | "metadata": {}, 79 | "output_type": "execute_result" 80 | } 81 | ], 82 | "source": [ 83 | "# Gradient Boosting 학습\n", 84 | "from sklearn.ensemble import GradientBoostingClassifier\n", 85 | "clf_gbt = GradientBoostingClassifier(max_depth=2, \n", 86 | " learning_rate=0.01,\n", 87 | " random_state=0)\n", 88 | "clf_gbt.fit(X_tn_std, y_tn)" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 6, 94 | "metadata": {}, 95 | "outputs": [ 96 | { 97 | "name": "stdout", 98 | "output_type": "stream", 99 | "text": [ 100 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n", 101 | " 0 1 0 1 1 1 1 1 0 1 1 1 0 0 0 0 1 
1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 1\n", 102 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n", 103 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n" 104 | ] 105 | } 106 | ], 107 | "source": [ 108 | "# 예측\n", 109 | "pred_gboost = clf_gbt.predict(X_te_std)\n", 110 | "print(pred_gboost)" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 7, 116 | "metadata": {}, 117 | "outputs": [ 118 | { 119 | "name": "stdout", 120 | "output_type": "stream", 121 | "text": [ 122 | "0.965034965034965\n" 123 | ] 124 | } 125 | ], 126 | "source": [ 127 | "# 정확도\n", 128 | "from sklearn.metrics import accuracy_score\n", 129 | "accuracy = accuracy_score(y_te, pred_gboost)\n", 130 | "print(accuracy)" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 8, 136 | "metadata": {}, 137 | "outputs": [ 138 | { 139 | "name": "stdout", 140 | "output_type": "stream", 141 | "text": [ 142 | "[[49 4]\n", 143 | " [ 1 89]]\n" 144 | ] 145 | } 146 | ], 147 | "source": [ 148 | "# confusion matrix 확인 \n", 149 | "from sklearn.metrics import confusion_matrix\n", 150 | "conf_matrix = confusion_matrix(y_te, pred_gboost)\n", 151 | "print(conf_matrix)" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 9, 157 | "metadata": {}, 158 | "outputs": [ 159 | { 160 | "name": "stdout", 161 | "output_type": "stream", 162 | "text": [ 163 | " precision recall f1-score support\n", 164 | "\n", 165 | " 0 0.98 0.92 0.95 53\n", 166 | " 1 0.96 0.99 0.97 90\n", 167 | "\n", 168 | " accuracy 0.97 143\n", 169 | " macro avg 0.97 0.96 0.96 143\n", 170 | "weighted avg 0.97 0.97 0.96 143\n", 171 | "\n" 172 | ] 173 | } 174 | ], 175 | "source": [ 176 | "# 분류 레포트 확인\n", 177 | "from sklearn.metrics import classification_report\n", 178 | "class_report = classification_report(y_te, pred_gboost)\n", 179 | "print(class_report)" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": 
[ 186 | "# 통합 코드" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 1, 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "name": "stdout", 196 | "output_type": "stream", 197 | "text": [ 198 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n", 199 | " 0 1 0 1 1 1 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 1\n", 200 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n", 201 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n", 202 | "0.965034965034965\n", 203 | "[[49 4]\n", 204 | " [ 1 89]]\n", 205 | " precision recall f1-score support\n", 206 | "\n", 207 | " 0 0.98 0.92 0.95 53\n", 208 | " 1 0.96 0.99 0.97 90\n", 209 | "\n", 210 | " accuracy 0.97 143\n", 211 | " macro avg 0.97 0.96 0.96 143\n", 212 | "weighted avg 0.97 0.97 0.96 143\n", 213 | "\n" 214 | ] 215 | } 216 | ], 217 | "source": [ 218 | "from sklearn import datasets\n", 219 | "from sklearn.model_selection import train_test_split\n", 220 | "from sklearn.preprocessing import StandardScaler\n", 221 | "\n", 222 | "from sklearn.ensemble import GradientBoostingClassifier\n", 223 | "\n", 224 | "from sklearn.metrics import accuracy_score\n", 225 | "from sklearn.metrics import confusion_matrix\n", 226 | "from sklearn.metrics import classification_report\n", 227 | "\n", 228 | "# 데이터 불러오기\n", 229 | "raw_breast_cancer = datasets.load_breast_cancer()\n", 230 | "\n", 231 | "# 피쳐, 타겟 데이터 지정\n", 232 | "X = raw_breast_cancer.data\n", 233 | "y = raw_breast_cancer.target\n", 234 | "\n", 235 | "# 트레이닝/테스트 데이터 분할\n", 236 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 237 | "\n", 238 | "# 데이터 표준화\n", 239 | "std_scale = StandardScaler()\n", 240 | "std_scale.fit(X_tn)\n", 241 | "X_tn_std = std_scale.transform(X_tn)\n", 242 | "X_te_std = std_scale.transform(X_te)\n", 243 | "\n", 244 | "# Gradient Boosting 학습\n", 245 | "clf_gbt = GradientBoostingClassifier(max_depth=2, \n", 246 | " 
learning_rate=0.01,\n", 247 | " random_state=0)\n", 248 | "clf_gbt.fit(X_tn_std, y_tn)\n", 249 | "\n", 250 | "# 예측\n", 251 | "pred_gboost = clf_gbt.predict(X_te_std)\n", 252 | "print(pred_gboost)\n", 253 | "\n", 254 | "# 정확도\n", 255 | "accuracy = accuracy_score(y_te, pred_gboost)\n", 256 | "print(accuracy)\n", 257 | "\n", 258 | "# confusion matrix 확인 \n", 259 | "conf_matrix = confusion_matrix(y_te, pred_gboost)\n", 260 | "print(conf_matrix)\n", 261 | "\n", 262 | "# 분류 레포트 확인\n", 263 | "class_report = classification_report(y_te, pred_gboost)\n", 264 | "print(class_report)" 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": null, 270 | "metadata": {}, 271 | "outputs": [], 272 | "source": [] 273 | } 274 | ], 275 | "metadata": { 276 | "kernelspec": { 277 | "display_name": "Python 3", 278 | "language": "python", 279 | "name": "python3" 280 | }, 281 | "language_info": { 282 | "codemirror_mode": { 283 | "name": "ipython", 284 | "version": 3 285 | }, 286 | "file_extension": ".py", 287 | "mimetype": "text/x-python", 288 | "name": "python", 289 | "nbconvert_exporter": "python", 290 | "pygments_lexer": "ipython3", 291 | "version": "3.7.6" 292 | } 293 | }, 294 | "nbformat": 4, 295 | "nbformat_minor": 4 296 | } 297 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/09장_앙상블_5절_스태킹-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별코드" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# 데이터 불러오기\n", 17 | "from sklearn import datasets\n", 18 | "raw_breast_cancer = datasets.load_breast_cancer()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# 피쳐, 타겟 데이터 지정\n", 28 | "X = 
raw_breast_cancer.data\n", 29 | "y = raw_breast_cancer.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# 트레이닝/테스트 데이터 분할\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# 데이터 표준화\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "StackingClassifier(cv=None,\n", 66 | " estimators=[('svm',\n", 67 | " SVC(C=1.0, break_ties=False, cache_size=200,\n", 68 | " class_weight=None, coef0=0.0,\n", 69 | " decision_function_shape='ovr', degree=3,\n", 70 | " gamma='scale', kernel='linear', max_iter=-1,\n", 71 | " probability=False, random_state=1,\n", 72 | " shrinking=True, tol=0.001, verbose=False)),\n", 73 | " ('gnb',\n", 74 | " GaussianNB(priors=None, var_smoothing=1e-09))],\n", 75 | " final_estimator=LogisticRegression(C=1.0, class_weight=None,\n", 76 | " dual=False,\n", 77 | " fit_intercept=True,\n", 78 | " intercept_scaling=1,\n", 79 | " l1_ratio=None,\n", 80 | " max_iter=100,\n", 81 | " multi_class='auto',\n", 82 | " n_jobs=None, penalty='l2',\n", 83 | " random_state=None,\n", 84 | " solver='lbfgs',\n", 85 | " tol=0.0001, verbose=0,\n", 86 | " warm_start=False),\n", 87 | " n_jobs=None, passthrough=False, stack_method='auto',\n", 88 | " verbose=0)" 89 | ] 90 | }, 91 | "execution_count": 5, 92 | "metadata": {}, 93 | "output_type": "execute_result" 94 | } 95 | ], 96 | "source": [ 97 | "# 스태킹 학습\n", 98 | "from sklearn import 
svm\n", 99 | "from sklearn.naive_bayes import GaussianNB\n", 100 | "from sklearn.linear_model import LogisticRegression\n", 101 | "from sklearn.ensemble import StackingClassifier\n", 102 | "\n", 103 | "clf1 = svm.SVC(kernel='linear', random_state=1) \n", 104 | "clf2 = GaussianNB()\n", 105 | "\n", 106 | "clf_stkg = StackingClassifier(\n", 107 | " estimators=[\n", 108 | " ('svm', clf1), \n", 109 | " ('gnb', clf2)\n", 110 | " ],\n", 111 | " final_estimator=LogisticRegression())\n", 112 | "clf_stkg.fit(X_tn_std, y_tn)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 7, 118 | "metadata": {}, 119 | "outputs": [ 120 | { 121 | "name": "stdout", 122 | "output_type": "stream", 123 | "text": [ 124 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n", 125 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n", 126 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 1 0 1\n", 127 | " 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n" 128 | ] 129 | } 130 | ], 131 | "source": [ 132 | "# 예측\n", 133 | "pred_stkg = clf_stkg.predict(X_te_std)\n", 134 | "print(pred_stkg)" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 8, 140 | "metadata": {}, 141 | "outputs": [ 142 | { 143 | "name": "stdout", 144 | "output_type": "stream", 145 | "text": [ 146 | "0.965034965034965\n" 147 | ] 148 | } 149 | ], 150 | "source": [ 151 | "# 정확도\n", 152 | "from sklearn.metrics import accuracy_score\n", 153 | "accuracy = accuracy_score(y_te, pred_stkg)\n", 154 | "print(accuracy)" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 9, 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "name": "stdout", 164 | "output_type": "stream", 165 | "text": [ 166 | "[[50 3]\n", 167 | " [ 2 88]]\n" 168 | ] 169 | } 170 | ], 171 | "source": [ 172 | "# confusion matrix 확인 \n", 173 | "from sklearn.metrics import confusion_matrix\n", 174 | "conf_matrix 
= confusion_matrix(y_te, pred_stkg)\n", 175 | "print(conf_matrix)" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 10, 181 | "metadata": {}, 182 | "outputs": [ 183 | { 184 | "name": "stdout", 185 | "output_type": "stream", 186 | "text": [ 187 | " precision recall f1-score support\n", 188 | "\n", 189 | " 0 0.96 0.94 0.95 53\n", 190 | " 1 0.97 0.98 0.97 90\n", 191 | "\n", 192 | " accuracy 0.97 143\n", 193 | " macro avg 0.96 0.96 0.96 143\n", 194 | "weighted avg 0.96 0.97 0.96 143\n", 195 | "\n" 196 | ] 197 | } 198 | ], 199 | "source": [ 200 | "# 분류 레포트 확인\n", 201 | "from sklearn.metrics import classification_report\n", 202 | "class_report = classification_report(y_te, pred_stkg)\n", 203 | "print(class_report)" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "# 통합코드" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": 11, 216 | "metadata": {}, 217 | "outputs": [ 218 | { 219 | "name": "stdout", 220 | "output_type": "stream", 221 | "text": [ 222 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n", 223 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n", 224 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 1 0 1\n", 225 | " 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n", 226 | "0.965034965034965\n", 227 | "[[50 3]\n", 228 | " [ 2 88]]\n", 229 | " precision recall f1-score support\n", 230 | "\n", 231 | " 0 0.96 0.94 0.95 53\n", 232 | " 1 0.97 0.98 0.97 90\n", 233 | "\n", 234 | " accuracy 0.97 143\n", 235 | " macro avg 0.96 0.96 0.96 143\n", 236 | "weighted avg 0.96 0.97 0.96 143\n", 237 | "\n" 238 | ] 239 | } 240 | ], 241 | "source": [ 242 | "from sklearn import datasets\n", 243 | "from sklearn.model_selection import train_test_split\n", 244 | "from sklearn.preprocessing import StandardScaler\n", 245 | "\n", 246 | "from sklearn import svm\n", 247 | "from 
sklearn.naive_bayes import GaussianNB\n", 248 | "from sklearn.linear_model import LogisticRegression\n", 249 | "from sklearn.ensemble import StackingClassifier\n", 250 | "\n", 251 | "from sklearn.metrics import accuracy_score\n", 252 | "from sklearn.metrics import confusion_matrix\n", 253 | "from sklearn.metrics import classification_report\n", 254 | "\n", 255 | "\n", 256 | "# 데이터 불러오기\n", 257 | "raw_breast_cancer = datasets.load_breast_cancer()\n", 258 | "\n", 259 | "# 피쳐, 타겟 데이터 지정\n", 260 | "X = raw_breast_cancer.data\n", 261 | "y = raw_breast_cancer.target\n", 262 | "\n", 263 | "# 트레이닝/테스트 데이터 분할\n", 264 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 265 | "\n", 266 | "# 데이터 표준화\n", 267 | "std_scale = StandardScaler()\n", 268 | "std_scale.fit(X_tn)\n", 269 | "X_tn_std = std_scale.transform(X_tn)\n", 270 | "X_te_std = std_scale.transform(X_te)\n", 271 | "\n", 272 | "# 스태킹 학습\n", 273 | "clf1 = svm.SVC(kernel='linear', random_state=1) \n", 274 | "clf2 = GaussianNB()\n", 275 | "\n", 276 | "clf_stkg = StackingClassifier(\n", 277 | " estimators=[\n", 278 | " ('svm', clf1), \n", 279 | " ('gnb', clf2)\n", 280 | " ],\n", 281 | " final_estimator=LogisticRegression())\n", 282 | "clf_stkg.fit(X_tn_std, y_tn)\n", 283 | "\n", 284 | "# 예측\n", 285 | "pred_stkg = clf_stkg.predict(X_te_std)\n", 286 | "print(pred_stkg)\n", 287 | "\n", 288 | "# 정확도\n", 289 | "accuracy = accuracy_score(y_te, pred_stkg)\n", 290 | "print(accuracy)\n", 291 | "\n", 292 | "# confusion matrix 확인 \n", 293 | "conf_matrix = confusion_matrix(y_te, pred_stkg)\n", 294 | "print(conf_matrix)\n", 295 | "\n", 296 | "# 분류 레포트 확인\n", 297 | "class_report = classification_report(y_te, pred_stkg)\n", 298 | "print(class_report)" 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": null, 304 | "metadata": {}, 305 | "outputs": [], 306 | "source": [] 307 | } 308 | ], 309 | "metadata": { 310 | "kernelspec": { 311 | "display_name": "Python 3", 312 | "language": "python", 313 | 
"name": "python3" 314 | }, 315 | "language_info": { 316 | "codemirror_mode": { 317 | "name": "ipython", 318 | "version": 3 319 | }, 320 | "file_extension": ".py", 321 | "mimetype": "text/x-python", 322 | "name": "python", 323 | "nbconvert_exporter": "python", 324 | "pygments_lexer": "ipython3", 325 | "version": "3.7.6" 326 | } 327 | }, 328 | "nbformat": 4, 329 | "nbformat_minor": 4 330 | } 331 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/12장_딥러닝_2절_1_퍼셉트론-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별코드" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "name": "stdout", 17 | "output_type": "stream", 18 | "text": [ 19 | "[[2 3]\n", 20 | " [5 1]]\n", 21 | "[2 3 5 1]\n" 22 | ] 23 | } 24 | ], 25 | "source": [ 26 | "import numpy as np\n", 27 | "\n", 28 | "# 입력층\n", 29 | "input_data = np.array([[2,3], [5,1]])\n", 30 | "print(input_data)\n", 31 | "x = input_data.reshape(-1)\n", 32 | "print(x)" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 2, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "# 가중치 및 편향\n", 42 | "w1 = np.array([2,1,-3,3])\n", 43 | "w2 = np.array([1,-3,1,3])\n", 44 | "b1 = 3\n", 45 | "b2 = 3" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "metadata": {}, 52 | "outputs": [ 53 | { 54 | "name": "stdout", 55 | "output_type": "stream", 56 | "text": [ 57 | "[[ 2 1 -3 3]\n", 58 | " [ 1 -3 1 3]]\n", 59 | "[3 3]\n", 60 | "[-2 4]\n" 61 | ] 62 | } 63 | ], 64 | "source": [ 65 | "# 가중합\n", 66 | "W = np.array([w1, w2])\n", 67 | "print(W)\n", 68 | "b = np.array([b1, b2])\n", 69 | "print(b)\n", 70 | "weight_sum = np.dot(W, x) + b\n", 71 | "print(weight_sum)" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | 
"execution_count": 5, 77 | "metadata": {}, 78 | "outputs": [ 79 | { 80 | "name": "stdout", 81 | "output_type": "stream", 82 | "text": [ 83 | "[0.11920292 0.98201379]\n" 84 | ] 85 | } 86 | ], 87 | "source": [ 88 | "# 출력층\n", 89 | "res = 1/(1+np.exp(-weight_sum))\n", 90 | "print(res)" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "# 통합 코드" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 6, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "import numpy as np\n", 107 | "\n", 108 | "# 입력층\n", 109 | "input_data = np.array([[2,3], [5,1]])\n", 110 | "x = input_data.reshape(-1)\n", 111 | "\n", 112 | "# 가중치 및 편향\n", 113 | "w1 = np.array([2,1,-3,3])\n", 114 | "w2 = np.array([1,-3,1,3])\n", 115 | "b1 = 3\n", 116 | "b2 = 3\n", 117 | "\n", 118 | "# 가중합\n", 119 | "W = np.array([w1, w2])\n", 120 | "b = np.array([b1, b2])\n", 121 | "weight_sum = np.dot(W, x) + b\n", 122 | "\n", 123 | "# 출력층\n", 124 | "res = 1/(1+np.exp(-weight_sum))" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": {}, 131 | "outputs": [], 132 | "source": [] 133 | } 134 | ], 135 | "metadata": { 136 | "kernelspec": { 137 | "display_name": "Python 3", 138 | "language": "python", 139 | "name": "python3" 140 | }, 141 | "language_info": { 142 | "codemirror_mode": { 143 | "name": "ipython", 144 | "version": 3 145 | }, 146 | "file_extension": ".py", 147 | "mimetype": "text/x-python", 148 | "name": "python", 149 | "nbconvert_exporter": "python", 150 | "pygments_lexer": "ipython3", 151 | "version": "3.7.6" 152 | } 153 | }, 154 | "nbformat": 4, 155 | "nbformat_minor": 4 156 | } 157 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/12장_딥러닝_3절_7_텐서플로_소개-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | 
"source": [ 7 | "# Sequential API를 활용한 딥러닝 모형 생성" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from tensorflow.keras.models import Sequential\n", 17 | "from tensorflow.keras.layers import Dense\n", 18 | "\n", 19 | "model = Sequential()\n", 20 | "model.add(Dense(100, activation='relu', \n", 21 | " input_shape=(32,32,1)))\n", 22 | "model.add(Dense(50, activation='relu'))\n", 23 | "model.add(Dense(5, activation='softmax'))" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 2, 29 | "metadata": {}, 30 | "outputs": [ 31 | { 32 | "name": "stdout", 33 | "output_type": "stream", 34 | "text": [ 35 | "Model: \"sequential\"\n", 36 | "_________________________________________________________________\n", 37 | "Layer (type) Output Shape Param # \n", 38 | "=================================================================\n", 39 | "dense (Dense) (None, 32, 32, 100) 200 \n", 40 | "_________________________________________________________________\n", 41 | "dense_1 (Dense) (None, 32, 32, 50) 5050 \n", 42 | "_________________________________________________________________\n", 43 | "dense_2 (Dense) (None, 32, 32, 5) 255 \n", 44 | "=================================================================\n", 45 | "Total params: 5,505\n", 46 | "Trainable params: 5,505\n", 47 | "Non-trainable params: 0\n", 48 | "_________________________________________________________________\n" 49 | ] 50 | } 51 | ], 52 | "source": [ 53 | "model.summary()" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "# 함수형 API를 활용한 딥러닝 모형 생성" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 3, 66 | "metadata": {}, 67 | "outputs": [], 68 | "source": [ 69 | "from tensorflow.keras.layers import Input, Dense\n", 70 | "from tensorflow.keras.models import Model\n", 71 | "\n", 72 | "input_layer = Input(shape=(32,32,1))\n", 73 | "\n", 74 | "x = 
Dense(units=100, activation = 'relu')(input_layer)\n", 75 | "x = Dense(units=50, activation = 'relu')(x)\n", 76 | "\n", 77 | "output_layer = Dense(units=5, activation='softmax')(x)\n", 78 | "\n", 79 | "model2 = Model(input_layer, output_layer)\n" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 4, 85 | "metadata": {}, 86 | "outputs": [ 87 | { 88 | "name": "stdout", 89 | "output_type": "stream", 90 | "text": [ 91 | "Model: \"model\"\n", 92 | "_________________________________________________________________\n", 93 | "Layer (type) Output Shape Param # \n", 94 | "=================================================================\n", 95 | "input_1 (InputLayer) [(None, 32, 32, 1)] 0 \n", 96 | "_________________________________________________________________\n", 97 | "dense_3 (Dense) (None, 32, 32, 100) 200 \n", 98 | "_________________________________________________________________\n", 99 | "dense_4 (Dense) (None, 32, 32, 50) 5050 \n", 100 | "_________________________________________________________________\n", 101 | "dense_5 (Dense) (None, 32, 32, 5) 255 \n", 102 | "=================================================================\n", 103 | "Total params: 5,505\n", 104 | "Trainable params: 5,505\n", 105 | "Non-trainable params: 0\n", 106 | "_________________________________________________________________\n" 107 | ] 108 | } 109 | ], 110 | "source": [ 111 | "model2.summary()" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "# 활성화 함수 사용" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "x = Dense(units=100)(x)\n", 128 | "x = Activation('relu')(x)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": null, 134 | "metadata": {}, 135 | "outputs": [], 136 | "source": [ 137 | "x = Dense(units=100, activation='relu')(x)" 138 | ] 139 | } 140 | ], 141 | "metadata": { 142 | 
"kernelspec": { 143 | "display_name": "Python 3", 144 | "language": "python", 145 | "name": "python3" 146 | }, 147 | "language_info": { 148 | "codemirror_mode": { 149 | "name": "ipython", 150 | "version": 3 151 | }, 152 | "file_extension": ".py", 153 | "mimetype": "text/x-python", 154 | "name": "python", 155 | "nbconvert_exporter": "python", 156 | "pygments_lexer": "ipython3", 157 | "version": "3.7.6" 158 | } 159 | }, 160 | "nbformat": 4, 161 | "nbformat_minor": 4 162 | } 163 | -------------------------------------------------------------------------------- /.ipynb_checkpoints/12장_딥러닝_7절_1_자연어처리-checkpoint.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 단어의 토큰화" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 55, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from tensorflow.keras.preprocessing.text import Tokenizer\n", 17 | "\n", 18 | "paper = ['많은 것을 바꾸고 싶다면 많은 것을 받아들여라']\n", 19 | "\n", 20 | "tknz = Tokenizer()\n", 21 | "tknz.fit_on_texts(paper)" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 57, 27 | "metadata": {}, 28 | "outputs": [ 29 | { 30 | "name": "stdout", 31 | "output_type": "stream", 32 | "text": [ 33 | "{'많은': 1, '것을': 2, '바꾸고': 3, '싶다면': 4, '받아들여라': 5}\n", 34 | "OrderedDict([('많은', 2), ('것을', 2), ('바꾸고', 1), ('싶다면', 1), ('받아들여라', 1)])\n" 35 | ] 36 | } 37 | ], 38 | "source": [ 39 | "print(tknz.word_index)\n", 40 | "print(tknz.word_counts)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "# 원 핫 인코딩" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": 70, 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "from tensorflow.keras.utils import to_categorical\n", 57 | "from tensorflow.keras.preprocessing.text import Tokenizer\n", 58 | "\n", 59 | "\n", 60 | "paper = ['많은 것을 바꾸고 싶다면 
많은 것을 받아들여라']\n", 61 | "tknz = Tokenizer()\n", 62 | "tknz.fit_on_texts(paper)\n", 63 | "\n", 64 | "idx_paper = tknz.texts_to_sequences(paper)\n", 65 | "n = len(tknz.word_index)+1\n", 66 | "idx_onehot = to_categorical(idx_paper, num_classes=n)" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 71, 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "name": "stdout", 76 | "output_type": "stream", 77 | "text": [ 78 | "[[1, 2, 3, 4, 1, 2, 5]]\n", 79 | "6\n", 80 | "[[[0. 1. 0. 0. 0. 0.]\n", 81 | " [0. 0. 1. 0. 0. 0.]\n", 82 | " [0. 0. 0. 1. 0. 0.]\n", 83 | " [0. 0. 0. 0. 1. 0.]\n", 84 | " [0. 1. 0. 0. 0. 0.]\n", 85 | " [0. 0. 1. 0. 0. 0.]\n", 86 | " [0. 0. 0. 0. 0. 1.]]]\n" 87 | ] 88 | } 89 | ], 90 | "source": [ 91 | "print(idx_paper)\n", 92 | "print(n)\n", 93 | "print(idx_onehot)" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "# 단어 임베딩" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 76, 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [ 109 | "from tensorflow.keras.models import Sequential\n", 110 | "from tensorflow.keras.layers import Embedding\n", 111 | "\n", 112 | "model = Sequential()\n", 113 | "model.add(Embedding(input_dim=n, output_dim=3))\n", 114 | "model.compile(optimizer='rmsprop', loss='mse')\n", 115 | "embedding = model.predict(idx_paper)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 77, 121 | "metadata": {}, 122 | "outputs": [ 123 | { 124 | "name": "stdout", 125 | "output_type": "stream", 126 | "text": [ 127 | "[[[-0.02796837 -0.03958071 -0.03936887]\n", 128 | " [-0.02087821 -0.02005102 0.0131931 ]\n", 129 | " [-0.00142742 -0.03759698 0.02437944]\n", 130 | " [ 0.01546348 -0.00769221 -0.01694027]\n", 131 | " [-0.02796837 -0.03958071 -0.03936887]\n", 132 | " [-0.02087821 -0.02005102 0.0131931 ]\n", 133 | " [ 0.024049 -0.03488786 0.02603838]]]\n" 134 | ] 135 | } 136 | ], 137 | "source": [ 138 | "print(embedding)" 139 | 
] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [] 147 | } 148 | ], 149 | "metadata": { 150 | "kernelspec": { 151 | "display_name": "Python 3", 152 | "language": "python", 153 | "name": "python3" 154 | }, 155 | "language_info": { 156 | "codemirror_mode": { 157 | "name": "ipython", 158 | "version": 3 159 | }, 160 | "file_extension": ".py", 161 | "mimetype": "text/x-python", 162 | "name": "python", 163 | "nbconvert_exporter": "python", 164 | "pygments_lexer": "ipython3", 165 | "version": "3.7.6" 166 | } 167 | }, 168 | "nbformat": 4, 169 | "nbformat_minor": 4 170 | } 171 | -------------------------------------------------------------------------------- /07장_3절_파이프라인.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 파이프 라인" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 8, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from sklearn import datasets\n", 17 | "from sklearn.pipeline import Pipeline\n", 18 | "from sklearn.preprocessing import StandardScaler\n", 19 | "from sklearn.linear_model import LinearRegression\n", 20 | "from sklearn.model_selection import train_test_split\n", 21 | "from sklearn.metrics import mean_squared_error" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 9, 27 | "metadata": {}, 28 | "outputs": [ 29 | { 30 | "data": { 31 | "text/plain": [ 32 | "29.515137790197567" 33 | ] 34 | }, 35 | "execution_count": 9, 36 | "metadata": {}, 37 | "output_type": "execute_result" 38 | } 39 | ], 40 | "source": [ 41 | "raw_boston = datasets.load_boston()\n", 42 | "\n", 43 | "X = raw_boston.data\n", 44 | "y = raw_boston.target\n", 45 | "\n", 46 | "# 트레이닝 / 테스트 데이터 분할\n", 47 | "X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=7)\n", 48 | "\n", 49 | "# 표준화 스케일링\n", 50 | 
"std_scale = StandardScaler()\n", 51 | "X_tn_std = std_scale.fit_transform(X_tn)\n", 52 | "X_te_std = std_scale.transform(X_te)\n", 53 | "\n", 54 | "# 학습\n", 55 | "clf_linear = LinearRegression()\n", 56 | "clf_linear.fit(X_tn_std, y_tn)\n", 57 | "\n", 58 | "# 예측\n", 59 | "pred_linear = clf_linear.predict(X_te_std)\n", 60 | "\n", 61 | "# 평가\n", 62 | "mean_squared_error(y_te, pred_linear)" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 10, 68 | "metadata": {}, 69 | "outputs": [ 70 | { 71 | "data": { 72 | "text/plain": [ 73 | "29.515137790197567" 74 | ] 75 | }, 76 | "execution_count": 10, 77 | "metadata": {}, 78 | "output_type": "execute_result" 79 | } 80 | ], 81 | "source": [ 82 | "# 트레이닝 / 테스트 데이터 분할\n", 83 | "X_tn, X_te, y_tn, y_te = train_test_split(X,y,random_state=7)\n", 84 | "\n", 85 | "# 파이프라인\n", 86 | "linear_pipline = Pipeline([\n", 87 | " ('scaler',StandardScaler()), \n", 88 | " ('linear_regression', LinearRegression()) \n", 89 | "])\n", 90 | "\n", 91 | "# 학습\n", 92 | "linear_pipline.fit(X_tn, y_tn)\n", 93 | "\n", 94 | "# 예측\n", 95 | "pred_linear = linear_pipline.predict(X_te)\n", 96 | "\n", 97 | "# 평가\n", 98 | "mean_squared_error(y_te, pred_linear)" 99 | ] 100 | } 101 | ], 102 | "metadata": { 103 | "kernelspec": { 104 | "display_name": "Python 3", 105 | "language": "python", 106 | "name": "python3" 107 | }, 108 | "language_info": { 109 | "codemirror_mode": { 110 | "name": "ipython", 111 | "version": 3 112 | }, 113 | "file_extension": ".py", 114 | "mimetype": "text/x-python", 115 | "name": "python", 116 | "nbconvert_exporter": "python", 117 | "pygments_lexer": "ipython3", 118 | "version": "3.7.6" 119 | } 120 | }, 121 | "nbformat": 4, 122 | "nbformat_minor": 4 123 | } 124 | -------------------------------------------------------------------------------- /07장_4절_그리드서치.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | 
"source": [ 7 | "# 그리드 서치" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 7, 13 | "metadata": { 14 | "scrolled": true 15 | }, 16 | "outputs": [ 17 | { 18 | "name": "stdout", 19 | "output_type": "stream", 20 | "text": [ 21 | "{'k': 3}\n", 22 | "0.9736842105263158\n" 23 | ] 24 | } 25 | ], 26 | "source": [ 27 | "from sklearn import datasets\n", 28 | "from sklearn.preprocessing import StandardScaler\n", 29 | "from sklearn.neighbors import KNeighborsClassifier\n", 30 | "from sklearn.model_selection import train_test_split\n", 31 | "\n", 32 | "from sklearn.metrics import accuracy_score\n", 33 | "from sklearn.metrics import confusion_matrix\n", 34 | "from sklearn.metrics import classification_report\n", 35 | "\n", 36 | "# 꽃 데이터 불러오기\n", 37 | "raw_iris = datasets.load_iris()\n", 38 | "\n", 39 | "# 피쳐 / 타겟\n", 40 | "X = raw_iris.data\n", 41 | "y = raw_iris.target\n", 42 | "\n", 43 | "# 트레이닝 / 테스트 데이터 분할\n", 44 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 45 | "\n", 46 | "# 표준화 스케일\n", 47 | "std_scale = StandardScaler()\n", 48 | "std_scale.fit(X_tn)\n", 49 | "X_tn_std = std_scale.transform(X_tn)\n", 50 | "X_te_std = std_scale.transform(X_te)\n", 51 | "\n", 52 | "best_accuracy = 0\n", 53 | "\n", 54 | "for k in [1,2,3,4,5,6,7,8,9,10]:\n", 55 | " clf_knn = KNeighborsClassifier(n_neighbors=k)\n", 56 | " clf_knn.fit(X_tn_std, y_tn)\n", 57 | " knn_pred = clf_knn.predict(X_te_std)\n", 58 | " accuracy = accuracy_score(y_te, knn_pred)\n", 59 | " if accuracy > best_accuracy:\n", 60 | " best_accuracy = accuracy\n", 61 | " final_k = {'k': k}\n", 62 | " \n", 63 | "print(final_k)\n", 64 | "print(accuracy)" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": null, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [] 73 | } 74 | ], 75 | "metadata": { 76 | "kernelspec": { 77 | "display_name": "Python 3", 78 | "language": "python", 79 | "name": "python3" 80 | }, 81 | "language_info": { 82 | "codemirror_mode": { 83 | 
"name": "ipython", 84 | "version": 3 85 | }, 86 | "file_extension": ".py", 87 | "mimetype": "text/x-python", 88 | "name": "python", 89 | "nbconvert_exporter": "python", 90 | "pygments_lexer": "ipython3", 91 | "version": "3.7.6" 92 | } 93 | }, 94 | "nbformat": 4, 95 | "nbformat_minor": 4 96 | } 97 | -------------------------------------------------------------------------------- /07장_모형평가_6절_분류_회귀_군집.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Classification" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "## Accuracy" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 3, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | "name": "stdout", 26 | "output_type": "stream", 27 | "text": [ 28 | "0.5\n", 29 | "2\n" 30 | ] 31 | } 32 | ], 33 | "source": [ 34 | "#import numpy as np\n", 35 | "from sklearn.metrics import accuracy_score\n", 36 | "y_pred = [0, 2, 1, 3]\n", 37 | "y_true = [0, 1, 2, 3]\n", 38 | "print(accuracy_score(y_true, y_pred))\n", 39 | "print(accuracy_score(y_true, y_pred, normalize=False))" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 3, 45 | "metadata": {}, 46 | "outputs": [], 47 | "source": [ 48 | "## confusion matrix" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 4, 54 | "metadata": {}, 55 | "outputs": [ 56 | { 57 | "data": { 58 | "text/plain": [ 59 | "array([[2, 0, 0],\n", 60 | "       [0, 0, 1],\n", 61 | "       [1, 0, 2]])" 62 | ] 63 | }, 64 | "execution_count": 4, 65 | "metadata": {}, 66 | "output_type": "execute_result" 67 | } 68 | ], 69 | "source": [ 70 | "from sklearn.metrics import confusion_matrix\n", 71 | "y_true = [2, 0, 2, 2, 0, 1]\n", 72 | "y_pred = [0, 0, 2, 2, 0, 2]\n", 73 | "confusion_matrix(y_true, y_pred)" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | 
"execution_count": 5, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "## classification report " 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 6, 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "name": "stdout", 92 | "output_type": "stream", 93 | "text": [ 94 | "              precision    recall  f1-score   support\n", 95 | "\n", 96 | "     class 0       0.67      1.00      0.80         2\n", 97 | "     class 1       0.00      0.00      0.00         1\n", 98 | "     class 2       1.00      0.50      0.67         2\n", 99 | "\n", 100 | "    accuracy                           0.60         5\n", 101 | "   macro avg       0.56      0.50      0.49         5\n", 102 | "weighted avg       0.67      0.60      0.59         5\n", 103 | "\n" 104 | ] 105 | } 106 | ], 107 | "source": [ 108 | "from sklearn.metrics import classification_report\n", 109 | "y_true = [0, 1, 2, 2, 0]\n", 110 | "y_pred = [0, 0, 2, 1, 0]\n", 111 | "target_names = ['class 0', 'class 1', 'class 2']\n", 112 | "print(classification_report(y_true, y_pred, target_names=target_names))" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "# Regression" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": null, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [ 128 | "# mean absolute error" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 5, 134 | "metadata": {}, 135 | "outputs": [ 136 | { 137 | "name": "stdout", 138 | "output_type": "stream", 139 | "text": [ 140 | "0.5\n" 141 | ] 142 | } 143 | ], 144 | "source": [ 145 | "from sklearn.metrics import mean_absolute_error\n", 146 | "y_true = [3, -0.5, 2, 7]\n", 147 | "y_pred = [2.5, 0.0, 2, 8]\n", 148 | "\n", 149 | "print(mean_absolute_error(y_true, y_pred))" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": 6, 155 | "metadata": {}, 156 | "outputs": [], 157 | "source": [ 158 | "# mean squared error" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 7, 164 | "metadata": {}, 165 | "outputs": [ 166 | { 167 | "data": { 168 | "text/plain": [ 169 | "0.375" 
170 | ] 171 | }, 172 | "execution_count": 7, 173 | "metadata": {}, 174 | "output_type": "execute_result" 175 | } 176 | ], 177 | "source": [ 178 | "from sklearn.metrics import mean_squared_error\n", 179 | "y_true = [3, -0.5, 2, 7]\n", 180 | "y_pred = [2.5, 0.0, 2, 8]\n", 181 | "print(mean_squared_error(y_true, y_pred))" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 8, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "# R2" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": 10, 196 | "metadata": {}, 197 | "outputs": [ 198 | { 199 | "name": "stdout", 200 | "output_type": "stream", 201 | "text": [ 202 | "0.9486081370449679\n" 203 | ] 204 | } 205 | ], 206 | "source": [ 207 | "from sklearn.metrics import r2_score\n", 208 | "y_true = [3, -0.5, 2, 7]\n", 209 | "y_pred = [2.5, 0.0, 2, 8]\n", 210 | "print(r2_score(y_true, y_pred))" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "# Clustering" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": null, 223 | "metadata": {}, 224 | "outputs": [], 225 | "source": [ 226 | "# adjusted rand index" 227 | ] 228 | }, 229 | { 230 | "cell_type": "code", 231 | "execution_count": 2, 232 | "metadata": {}, 233 | "outputs": [ 234 | { 235 | "name": "stdout", 236 | "output_type": "stream", 237 | "text": [ 238 | "0.24242424242424246\n" 239 | ] 240 | } 241 | ], 242 | "source": [ 243 | "from sklearn.metrics import adjusted_rand_score\n", 244 | "labels_true = [0, 0, 0, 1, 1, 1]\n", 245 | "labels_pred = [0, 0, 1, 1, 2, 2]\n", 246 | "\n", 247 | "print(adjusted_rand_score(labels_true, labels_pred))" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 3, 253 | "metadata": {}, 254 | "outputs": [], 255 | "source": [ 256 | "# silhouette score" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": 2, 262 | "metadata": {}, 263 | "outputs": [ 264 | { 265 | "name": 
"stdout", 266 | "output_type": "stream", 267 | "text": [ 268 | "0.5789497702625118\n" 269 | ] 270 | } 271 | ], 272 | "source": [ 273 | "from sklearn.metrics import silhouette_score\n", 274 | "X = [[1, 2], [4, 5], [2, 1], [6, 7], [2, 3]]\n", 275 | "labels = [0, 1, 0, 1, 0] \n", 276 | "sil_score = silhouette_score(X, labels)\n", 277 | "print(sil_score)" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": null, 283 | "metadata": {}, 284 | "outputs": [], 285 | "source": [] 286 | } 287 | ], 288 | "metadata": { 289 | "kernelspec": { 290 | "display_name": "Python 3", 291 | "language": "python", 292 | "name": "python3" 293 | }, 294 | "language_info": { 295 | "codemirror_mode": { 296 | "name": "ipython", 297 | "version": 3 298 | }, 299 | "file_extension": ".py", 300 | "mimetype": "text/x-python", 301 | "name": "python", 302 | "nbconvert_exporter": "python", 303 | "pygments_lexer": "ipython3", 304 | "version": "3.7.6" 305 | } 306 | }, 307 | "nbformat": 4, 308 | "nbformat_minor": 4 309 | } 310 | -------------------------------------------------------------------------------- /08장_지도학습_3절_k최근접이웃.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별 코드" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# 데이터 불러오기\n", 17 | "from sklearn import datasets\n", 18 | "raw_iris = datasets.load_iris()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# 피쳐/타겟\n", 28 | "X = raw_iris.data\n", 29 | "y = raw_iris.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# 트레이닝/테스트 데이터 분할\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, 
y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "#데이터 표준화\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n", 66 | " metric_params=None, n_jobs=None, n_neighbors=2, p=2,\n", 67 | " weights='uniform')" 68 | ] 69 | }, 70 | "execution_count": 5, 71 | "metadata": {}, 72 | "output_type": "execute_result" 73 | } 74 | ], 75 | "source": [ 76 | "# 학습\n", 77 | "from sklearn.neighbors import KNeighborsClassifier\n", 78 | "clf_knn = KNeighborsClassifier(n_neighbors=2)\n", 79 | "clf_knn.fit(X_tn_std, y_tn)" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 8, 85 | "metadata": {}, 86 | "outputs": [ 87 | { 88 | "name": "stdout", 89 | "output_type": "stream", 90 | "text": [ 91 | "[2 1 0 2 0 2 0 1 1 1 1 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0\n", 92 | " 2]\n" 93 | ] 94 | } 95 | ], 96 | "source": [ 97 | "# 예측\n", 98 | "knn_pred = clf_knn.predict(X_te_std)\n", 99 | "print(knn_pred)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 9, 105 | "metadata": {}, 106 | "outputs": [ 107 | { 108 | "name": "stdout", 109 | "output_type": "stream", 110 | "text": [ 111 | "0.9473684210526315\n" 112 | ] 113 | } 114 | ], 115 | "source": [ 116 | "# 정확도\n", 117 | "from sklearn.metrics import accuracy_score\n", 118 | "accuracy = accuracy_score(y_te, knn_pred)\n", 119 | "print(accuracy)" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 10, 125 | "metadata": {}, 126 | 
"outputs": [ 127 | { 128 | "name": "stdout", 129 | "output_type": "stream", 130 | "text": [ 131 | "[[13 0 0]\n", 132 | " [ 0 15 1]\n", 133 | " [ 0 1 8]]\n" 134 | ] 135 | } 136 | ], 137 | "source": [ 138 | "# confusion matrix 확인 \n", 139 | "from sklearn.metrics import confusion_matrix\n", 140 | "conf_matrix = confusion_matrix(y_te, knn_pred)\n", 141 | "print(conf_matrix)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 22, 147 | "metadata": {}, 148 | "outputs": [ 149 | { 150 | "name": "stdout", 151 | "output_type": "stream", 152 | "text": [ 153 | " precision recall f1-score support\n", 154 | "\n", 155 | " 0 1.00 1.00 1.00 13\n", 156 | " 1 0.94 0.94 0.94 16\n", 157 | " 2 0.89 0.89 0.89 9\n", 158 | "\n", 159 | " accuracy 0.95 38\n", 160 | " macro avg 0.94 0.94 0.94 38\n", 161 | "weighted avg 0.95 0.95 0.95 38\n", 162 | "\n" 163 | ] 164 | } 165 | ], 166 | "source": [ 167 | "# 분류 레포트 확인\n", 168 | "from sklearn.metrics import classification_report\n", 169 | "class_report = classification_report(y_te, knn_pred)\n", 170 | "print(class_report)" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "# 통합 코드" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 11, 183 | "metadata": {}, 184 | "outputs": [ 185 | { 186 | "name": "stdout", 187 | "output_type": "stream", 188 | "text": [ 189 | "[2 1 0 2 0 2 0 1 1 1 1 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0\n", 190 | " 2]\n", 191 | "0.9473684210526315\n", 192 | "[[13 0 0]\n", 193 | " [ 0 15 1]\n", 194 | " [ 0 1 8]]\n", 195 | " precision recall f1-score support\n", 196 | "\n", 197 | " 0 1.00 1.00 1.00 13\n", 198 | " 1 0.94 0.94 0.94 16\n", 199 | " 2 0.89 0.89 0.89 9\n", 200 | "\n", 201 | " accuracy 0.95 38\n", 202 | " macro avg 0.94 0.94 0.94 38\n", 203 | "weighted avg 0.95 0.95 0.95 38\n", 204 | "\n" 205 | ] 206 | } 207 | ], 208 | "source": [ 209 | "from sklearn import datasets\n", 210 | "from sklearn.preprocessing 
import StandardScaler\n", 211 | "from sklearn.neighbors import KNeighborsClassifier\n", 212 | "from sklearn.model_selection import train_test_split\n", 213 | "\n", 214 | "from sklearn.metrics import accuracy_score\n", 215 | "from sklearn.metrics import confusion_matrix\n", 216 | "from sklearn.metrics import classification_report\n", 217 | "\n", 218 | "# 꽃 데이터 불러오기\n", 219 | "raw_iris = datasets.load_iris()\n", 220 | "\n", 221 | "# 피쳐 / 타겟\n", 222 | "X = raw_iris.data\n", 223 | "y = raw_iris.target\n", 224 | "\n", 225 | "# 트레이닝 / 테스트 데이터 분할\n", 226 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 227 | "\n", 228 | "\n", 229 | "# 표준화 스케일\n", 230 | "std_scale = StandardScaler()\n", 231 | "std_scale.fit(X_tn)\n", 232 | "X_tn_std = std_scale.transform(X_tn)\n", 233 | "X_te_std = std_scale.transform(X_te)\n", 234 | "\n", 235 | "#학습\n", 236 | "clf_knn = KNeighborsClassifier(n_neighbors=2)\n", 237 | "clf_knn.fit(X_tn_std, y_tn)\n", 238 | "\n", 239 | "# 예측\n", 240 | "knn_pred = clf_knn.predict(X_te_std)\n", 241 | "print(knn_pred)\n", 242 | "\n", 243 | "# 정확도\n", 244 | "accuracy = accuracy_score(y_te, knn_pred)\n", 245 | "print(accuracy)\n", 246 | "\n", 247 | "# confusion matrix 확인 \n", 248 | "conf_matrix = confusion_matrix(y_te, knn_pred)\n", 249 | "print(conf_matrix)\n", 250 | "\n", 251 | "# 분류 레포트 확인\n", 252 | "class_report = classification_report(y_te, knn_pred)\n", 253 | "print(class_report)" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": null, 259 | "metadata": {}, 260 | "outputs": [], 261 | "source": [] 262 | } 263 | ], 264 | "metadata": { 265 | "kernelspec": { 266 | "display_name": "Python 3", 267 | "language": "python", 268 | "name": "python3" 269 | }, 270 | "language_info": { 271 | "codemirror_mode": { 272 | "name": "ipython", 273 | "version": 3 274 | }, 275 | "file_extension": ".py", 276 | "mimetype": "text/x-python", 277 | "name": "python", 278 | "nbconvert_exporter": "python", 279 | "pygments_lexer": "ipython3", 
280 | "version": "3.7.6" 281 | } 282 | }, 283 | "nbformat": 4, 284 | "nbformat_minor": 4 285 | } 286 | -------------------------------------------------------------------------------- /08장_지도학습_6절_나이브베이즈.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별 코드 " 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# 데이터 불러오기\n", 17 | "from sklearn import datasets\n", 18 | "raw_wine = datasets.load_wine()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# 피쳐, 타겟 데이터 지정\n", 28 | "X = raw_wine.data\n", 29 | "y = raw_wine.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# 트레이닝/테스트 데이터 분할\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "#데이터 표준화\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "GaussianNB(priors=None, var_smoothing=1e-09)" 66 | ] 67 | }, 68 | "execution_count": 5, 69 | "metadata": {}, 70 | "output_type": "execute_result" 71 | } 72 | ], 73 | "source": [ 74 | "# 나이브 베이즈 학습\n", 75 | "from sklearn.naive_bayes import GaussianNB\n", 76 | "clf_gnb = GaussianNB()\n", 77 | "clf_gnb.fit(X_tn_std, y_tn)" 78 | ] 79 
| }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 12, 83 | "metadata": {}, 84 | "outputs": [ 85 | { 86 | "name": "stdout", 87 | "output_type": "stream", 88 | "text": [ 89 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 0 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 90 | " 1 1 2 0 0 1 1 1]\n" 91 | ] 92 | } 93 | ], 94 | "source": [ 95 | "# 예측\n", 96 | "pred_gnb = clf_gnb.predict(X_te_std)\n", 97 | "print(pred_gnb)" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 15, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "name": "stdout", 107 | "output_type": "stream", 108 | "text": [ 109 | "0.9523809523809524\n" 110 | ] 111 | } 112 | ], 113 | "source": [ 114 | "# 리콜\n", 115 | "from sklearn.metrics import recall_score\n", 116 | "recall = recall_score(y_te, pred_gnb, average='macro')\n", 117 | "print(recall)" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": 26, 123 | "metadata": {}, 124 | "outputs": [ 125 | { 126 | "name": "stdout", 127 | "output_type": "stream", 128 | "text": [ 129 | "[[16 0 0]\n", 130 | " [ 2 18 1]\n", 131 | " [ 0 0 8]]\n" 132 | ] 133 | } 134 | ], 135 | "source": [ 136 | "# confusion matrix 확인 \n", 137 | "from sklearn.metrics import confusion_matrix\n", 138 | "conf_matrix = confusion_matrix(y_te, pred_gnb)\n", 139 | "print(conf_matrix)" 140 | ] 141 | }, 142 | { 143 | "cell_type": "code", 144 | "execution_count": 27, 145 | "metadata": {}, 146 | "outputs": [ 147 | { 148 | "name": "stdout", 149 | "output_type": "stream", 150 | "text": [ 151 | " precision recall f1-score support\n", 152 | "\n", 153 | " 0 0.89 1.00 0.94 16\n", 154 | " 1 1.00 0.86 0.92 21\n", 155 | " 2 0.89 1.00 0.94 8\n", 156 | "\n", 157 | " accuracy 0.93 45\n", 158 | " macro avg 0.93 0.95 0.94 45\n", 159 | "weighted avg 0.94 0.93 0.93 45\n", 160 | "\n" 161 | ] 162 | } 163 | ], 164 | "source": [ 165 | "# 분류 레포트 확인\n", 166 | "from sklearn.metrics import classification_report\n", 167 | "class_report = classification_report(y_te, 
pred_gnb)\n", 168 | "print(class_report)" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "# 통합코드" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 1, 181 | "metadata": {}, 182 | "outputs": [ 183 | { 184 | "name": "stdout", 185 | "output_type": "stream", 186 | "text": [ 187 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 0 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 188 | " 1 1 2 0 0 1 1 1]\n", 189 | "0.9523809523809524\n", 190 | "[[16 0 0]\n", 191 | " [ 2 18 1]\n", 192 | " [ 0 0 8]]\n", 193 | " precision recall f1-score support\n", 194 | "\n", 195 | " 0 0.89 1.00 0.94 16\n", 196 | " 1 1.00 0.86 0.92 21\n", 197 | " 2 0.89 1.00 0.94 8\n", 198 | "\n", 199 | " accuracy 0.93 45\n", 200 | " macro avg 0.93 0.95 0.94 45\n", 201 | "weighted avg 0.94 0.93 0.93 45\n", 202 | "\n" 203 | ] 204 | } 205 | ], 206 | "source": [ 207 | "from sklearn import datasets\n", 208 | "from sklearn.preprocessing import StandardScaler\n", 209 | "from sklearn.model_selection import train_test_split\n", 210 | "\n", 211 | "from sklearn.naive_bayes import GaussianNB\n", 212 | "\n", 213 | "from sklearn.metrics import recall_score\n", 214 | "from sklearn.metrics import confusion_matrix\n", 215 | "from sklearn.metrics import classification_report\n", 216 | "\n", 217 | "\n", 218 | "# 데이터 불러오기\n", 219 | "raw_wine = datasets.load_wine()\n", 220 | "\n", 221 | "# 피쳐, 타겟 데이터 지정\n", 222 | "X = raw_wine.data\n", 223 | "y = raw_wine.target\n", 224 | "\n", 225 | "# 트레이닝/테스트 데이터 분할\n", 226 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 227 | "\n", 228 | "# 데이터 표준화\n", 229 | "std_scale = StandardScaler()\n", 230 | "std_scale.fit(X_tn)\n", 231 | "X_tn_std = std_scale.transform(X_tn)\n", 232 | "X_te_std = std_scale.transform(X_te)\n", 233 | "\n", 234 | "# 나이브 베이즈 학습\n", 235 | "clf_gnb = GaussianNB()\n", 236 | "clf_gnb.fit(X_tn_std, y_tn)\n", 237 | "\n", 238 | "# 예측\n", 239 | "pred_gnb = clf_gnb.predict(X_te_std)\n", 240 | 
"print(pred_gnb)\n", 241 | "\n", 242 | "# 리콜\n", 243 | "recall = recall_score(y_te, pred_gnb, average='macro')\n", 244 | "print(recall)\n", 245 | "\n", 246 | "# confusion matrix 확인 \n", 247 | "conf_matrix = confusion_matrix(y_te, pred_gnb)\n", 248 | "print(conf_matrix)\n", 249 | "\n", 250 | "# 분류 레포트 확인\n", 251 | "class_report = classification_report(y_te, pred_gnb)\n", 252 | "print(class_report)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": null, 258 | "metadata": {}, 259 | "outputs": [], 260 | "source": [] 261 | } 262 | ], 263 | "metadata": { 264 | "kernelspec": { 265 | "display_name": "Python 3", 266 | "language": "python", 267 | "name": "python3" 268 | }, 269 | "language_info": { 270 | "codemirror_mode": { 271 | "name": "ipython", 272 | "version": 3 273 | }, 274 | "file_extension": ".py", 275 | "mimetype": "text/x-python", 276 | "name": "python", 277 | "nbconvert_exporter": "python", 278 | "pygments_lexer": "ipython3", 279 | "version": "3.7.6" 280 | } 281 | }, 282 | "nbformat": 4, 283 | "nbformat_minor": 4 284 | } 285 | -------------------------------------------------------------------------------- /08장_지도학습_7절_의사결정나무.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별 코드" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# 데이터 불러오기\n", 17 | "from sklearn import datasets\n", 18 | "raw_wine = datasets.load_wine()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# 피쳐, 타겟 데이터 지정\n", 28 | "X = raw_wine.data\n", 29 | "y = raw_wine.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# 트레이닝/테스트 데이터 분할\n", 39 | "from 
sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# 데이터 표준화\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',\n", 66 | " max_depth=None, max_features=None, max_leaf_nodes=None,\n", 67 | " min_impurity_decrease=0.0, min_impurity_split=None,\n", 68 | " min_samples_leaf=1, min_samples_split=2,\n", 69 | " min_weight_fraction_leaf=0.0, presort='deprecated',\n", 70 | " random_state=0, splitter='best')" 71 | ] 72 | }, 73 | "execution_count": 5, 74 | "metadata": {}, 75 | "output_type": "execute_result" 76 | } 77 | ], 78 | "source": [ 79 | "# 의사결정나무 학습\n", 80 | "from sklearn import tree \n", 81 | "clf_tree = tree.DecisionTreeClassifier(random_state=0)\n", 82 | "clf_tree.fit(X_tn_std, y_tn)" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 6, 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "name": "stdout", 92 | "output_type": "stream", 93 | "text": [ 94 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 1 0 1 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 95 | " 1 1 2 1 0 1 1 1]\n" 96 | ] 97 | } 98 | ], 99 | "source": [ 100 | "# 예측\n", 101 | "pred_tree = clf_tree.predict(X_te_std)\n", 102 | "print(pred_tree)" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 7, 108 | "metadata": {}, 109 | "outputs": [ 110 | { 111 | "name": "stdout", 112 | "output_type": "stream", 113 | "text": [ 114 | "0.9349141206870346\n" 115 | ] 116 | } 117 | ], 118 
| "source": [ 119 | "# f1 score\n", 120 | "from sklearn.metrics import f1_score\n", 121 | "f1 = f1_score(y_te, pred_tree, average='macro')\n", 122 | "print(f1)" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 8, 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "name": "stdout", 132 | "output_type": "stream", 133 | "text": [ 134 | "[[14 2 0]\n", 135 | " [ 0 20 1]\n", 136 | " [ 0 0 8]]\n" 137 | ] 138 | } 139 | ], 140 | "source": [ 141 | "# confusion matrix 확인 \n", 142 | "from sklearn.metrics import confusion_matrix\n", 143 | "conf_matrix = confusion_matrix(y_te, pred_tree)\n", 144 | "print(conf_matrix)" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 9, 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "name": "stdout", 154 | "output_type": "stream", 155 | "text": [ 156 | " precision recall f1-score support\n", 157 | "\n", 158 | " 0 1.00 0.88 0.93 16\n", 159 | " 1 0.91 0.95 0.93 21\n", 160 | " 2 0.89 1.00 0.94 8\n", 161 | "\n", 162 | " accuracy 0.93 45\n", 163 | " macro avg 0.93 0.94 0.93 45\n", 164 | "weighted avg 0.94 0.93 0.93 45\n", 165 | "\n" 166 | ] 167 | } 168 | ], 169 | "source": [ 170 | "# 분류 레포트 확인\n", 171 | "from sklearn.metrics import classification_report\n", 172 | "class_report = classification_report(y_te, pred_tree)\n", 173 | "print(class_report)" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "# 통합 코드" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 10, 186 | "metadata": {}, 187 | "outputs": [ 188 | { 189 | "name": "stdout", 190 | "output_type": "stream", 191 | "text": [ 192 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 1 0 1 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 193 | " 1 1 2 1 0 1 1 1]\n", 194 | "0.9349141206870346\n", 195 | "[[14 2 0]\n", 196 | " [ 0 20 1]\n", 197 | " [ 0 0 8]]\n", 198 | " precision recall f1-score support\n", 199 | "\n", 200 | " 0 1.00 0.88 0.93 16\n", 201 | " 1 0.91 0.95 0.93 21\n", 202 
| " 2 0.89 1.00 0.94 8\n", 203 | "\n", 204 | " accuracy 0.93 45\n", 205 | " macro avg 0.93 0.94 0.93 45\n", 206 | "weighted avg 0.94 0.93 0.93 45\n", 207 | "\n" 208 | ] 209 | } 210 | ], 211 | "source": [ 212 | "from sklearn import datasets\n", 213 | "from sklearn.preprocessing import StandardScaler\n", 214 | "from sklearn.model_selection import train_test_split\n", 215 | "\n", 216 | "from sklearn import tree \n", 217 | "\n", 218 | "from sklearn.metrics import f1_score\n", 219 | "from sklearn.metrics import confusion_matrix\n", 220 | "from sklearn.metrics import classification_report\n", 221 | "\n", 222 | "\n", 223 | "# 데이터 불러오기\n", 224 | "raw_wine = datasets.load_wine()\n", 225 | "\n", 226 | "# 피쳐, 타겟 데이터 지정\n", 227 | "X = raw_wine.data\n", 228 | "y = raw_wine.target\n", 229 | "\n", 230 | "# 트레이닝/테스트 데이터 분할\n", 231 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 232 | "\n", 233 | "# 데이터 표준화\n", 234 | "std_scale = StandardScaler()\n", 235 | "std_scale.fit(X_tn)\n", 236 | "X_tn_std = std_scale.transform(X_tn)\n", 237 | "X_te_std = std_scale.transform(X_te)\n", 238 | "\n", 239 | "# 의사결정나무 학습\n", 240 | "clf_tree = tree.DecisionTreeClassifier(random_state=0)\n", 241 | "clf_tree.fit(X_tn_std, y_tn)\n", 242 | "\n", 243 | "# 예측\n", 244 | "pred_tree = clf_tree.predict(X_te_std)\n", 245 | "print(pred_tree)\n", 246 | "\n", 247 | "# f1 score\n", 248 | "from sklearn.metrics import f1_score\n", 249 | "f1 = f1_score(y_te, pred_tree, average='macro')\n", 250 | "print(f1)\n", 251 | "\n", 252 | "# confusion matrix 확인 \n", 253 | "conf_matrix = confusion_matrix(y_te, pred_tree)\n", 254 | "print(conf_matrix)\n", 255 | "\n", 256 | "# 분류 레포트 확인\n", 257 | "class_report = classification_report(y_te, pred_tree)\n", 258 | "print(class_report)" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": null, 264 | "metadata": {}, 265 | "outputs": [], 266 | "source": [] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": null, 271 | 
"metadata": {}, 272 | "outputs": [], 273 | "source": [] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": null, 278 | "metadata": {}, 279 | "outputs": [], 280 | "source": [] 281 | } 282 | ], 283 | "metadata": { 284 | "kernelspec": { 285 | "display_name": "Python 3", 286 | "language": "python", 287 | "name": "python3" 288 | }, 289 | "language_info": { 290 | "codemirror_mode": { 291 | "name": "ipython", 292 | "version": 3 293 | }, 294 | "file_extension": ".py", 295 | "mimetype": "text/x-python", 296 | "name": "python", 297 | "nbconvert_exporter": "python", 298 | "pygments_lexer": "ipython3", 299 | "version": "3.7.6" 300 | } 301 | }, 302 | "nbformat": 4, 303 | "nbformat_minor": 4 304 | } 305 | -------------------------------------------------------------------------------- /08장_지도학습_8절_서포트벡터머신.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별코드 " 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# 데이터 불러오기\n", 17 | "from sklearn import datasets\n", 18 | "raw_wine = datasets.load_wine()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# 피쳐, 타겟 데이터 지정\n", 28 | "X = raw_wine.data\n", 29 | "y = raw_wine.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# 트레이닝/테스트 데이터 분할\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# 데이터 표준화\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = 
StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,\n", 66 | " decision_function_shape='ovr', degree=3, gamma='scale', kernel='linear',\n", 67 | " max_iter=-1, probability=False, random_state=0, shrinking=True, tol=0.001,\n", 68 | " verbose=False)" 69 | ] 70 | }, 71 | "execution_count": 5, 72 | "metadata": {}, 73 | "output_type": "execute_result" 74 | } 75 | ], 76 | "source": [ 77 | "# 서포트벡터머신 학습\n", 78 | "from sklearn import svm \n", 79 | "clf_svm_lr = svm.SVC(kernel='linear', random_state=0)\n", 80 | "clf_svm_lr.fit(X_tn_std, y_tn)" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 6, 86 | "metadata": {}, 87 | "outputs": [ 88 | { 89 | "name": "stdout", 90 | "output_type": "stream", 91 | "text": [ 92 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 1 0 1 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 93 | " 1 1 2 0 0 1 1 1]\n" 94 | ] 95 | } 96 | ], 97 | "source": [ 98 | "# 예측\n", 99 | "pred_svm = clf_svm_lr.predict(X_te_std)\n", 100 | "print(pred_svm)" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 7, 106 | "metadata": {}, 107 | "outputs": [ 108 | { 109 | "name": "stdout", 110 | "output_type": "stream", 111 | "text": [ 112 | "1.0\n" 113 | ] 114 | } 115 | ], 116 | "source": [ 117 | "# 정확도\n", 118 | "from sklearn.metrics import accuracy_score\n", 119 | "accuracy = accuracy_score(y_te, pred_svm)\n", 120 | "print(accuracy)" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 8, 126 | "metadata": {}, 127 | "outputs": [ 128 | { 129 | "name": "stdout", 130 | "output_type": "stream", 131 | "text": [ 132 | "[[16 0 0]\n", 133 | " [ 0 21 0]\n", 134 | " [ 0 0 8]]\n" 135 | ] 136 | } 137 | ], 138 | 
"source": [ 139 | "# confusion matrix 확인 \n", 140 | "from sklearn.metrics import confusion_matrix\n", 141 | "conf_matrix = confusion_matrix(y_te, pred_svm)\n", 142 | "print(conf_matrix)" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 9, 148 | "metadata": {}, 149 | "outputs": [ 150 | { 151 | "name": "stdout", 152 | "output_type": "stream", 153 | "text": [ 154 | " precision recall f1-score support\n", 155 | "\n", 156 | " 0 1.00 1.00 1.00 16\n", 157 | " 1 1.00 1.00 1.00 21\n", 158 | " 2 1.00 1.00 1.00 8\n", 159 | "\n", 160 | " accuracy 1.00 45\n", 161 | " macro avg 1.00 1.00 1.00 45\n", 162 | "weighted avg 1.00 1.00 1.00 45\n", 163 | "\n" 164 | ] 165 | } 166 | ], 167 | "source": [ 168 | "# 분류 레포트 확인\n", 169 | "from sklearn.metrics import classification_report\n", 170 | "class_report = classification_report(y_te, pred_svm)\n", 171 | "print(class_report)" 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "# 통합코드" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 1, 184 | "metadata": {}, 185 | "outputs": [ 186 | { 187 | "name": "stdout", 188 | "output_type": "stream", 189 | "text": [ 190 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 1 0 1 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 191 | " 1 1 2 0 0 1 1 1]\n", 192 | "1.0\n", 193 | "[[16 0 0]\n", 194 | " [ 0 21 0]\n", 195 | " [ 0 0 8]]\n", 196 | " precision recall f1-score support\n", 197 | "\n", 198 | " 0 1.00 1.00 1.00 16\n", 199 | " 1 1.00 1.00 1.00 21\n", 200 | " 2 1.00 1.00 1.00 8\n", 201 | "\n", 202 | " accuracy 1.00 45\n", 203 | " macro avg 1.00 1.00 1.00 45\n", 204 | "weighted avg 1.00 1.00 1.00 45\n", 205 | "\n" 206 | ] 207 | } 208 | ], 209 | "source": [ 210 | "from sklearn import datasets\n", 211 | "from sklearn.preprocessing import StandardScaler\n", 212 | "from sklearn.model_selection import train_test_split\n", 213 | "\n", 214 | "from sklearn import svm \n", 215 | "\n", 216 | "from sklearn.metrics import 
accuracy_score\n", 217 | "from sklearn.metrics import confusion_matrix\n", 218 | "from sklearn.metrics import classification_report\n", 219 | "\n", 220 | "# 데이터 불러오기\n", 221 | "raw_wine = datasets.load_wine()\n", 222 | "\n", 223 | "# 피쳐, 타겟 데이터 지정\n", 224 | "X = raw_wine.data\n", 225 | "y = raw_wine.target\n", 226 | "\n", 227 | "# 트레이닝/테스트 데이터 분할\n", 228 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 229 | "\n", 230 | "# 데이터 표준화\n", 231 | "std_scale = StandardScaler()\n", 232 | "std_scale.fit(X_tn)\n", 233 | "X_tn_std = std_scale.transform(X_tn)\n", 234 | "X_te_std = std_scale.transform(X_te)\n", 235 | "\n", 236 | "# 서포트벡터머신 학습\n", 237 | "clf_svm_lr = svm.SVC(kernel='linear', random_state=0)\n", 238 | "clf_svm_lr.fit(X_tn_std, y_tn)\n", 239 | "\n", 240 | "# 예측\n", 241 | "pred_svm = clf_svm_lr.predict(X_te_std)\n", 242 | "print(pred_svm)\n", 243 | "\n", 244 | "# 정확도\n", 245 | "accuracy = accuracy_score(y_te, pred_svm)\n", 246 | "print(accuracy)\n", 247 | "\n", 248 | "# confusion matrix 확인 \n", 249 | "conf_matrix = confusion_matrix(y_te, pred_svm)\n", 250 | "print(conf_matrix)\n", 251 | "\n", 252 | "# 분류 레포트 확인\n", 253 | "class_report = classification_report(y_te, pred_svm)\n", 254 | "print(class_report)" 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": null, 260 | "metadata": {}, 261 | "outputs": [], 262 | "source": [] 263 | } 264 | ], 265 | "metadata": { 266 | "kernelspec": { 267 | "display_name": "Python 3", 268 | "language": "python", 269 | "name": "python3" 270 | }, 271 | "language_info": { 272 | "codemirror_mode": { 273 | "name": "ipython", 274 | "version": 3 275 | }, 276 | "file_extension": ".py", 277 | "mimetype": "text/x-python", 278 | "name": "python", 279 | "nbconvert_exporter": "python", 280 | "pygments_lexer": "ipython3", 281 | "version": "3.7.6" 282 | } 283 | }, 284 | "nbformat": 4, 285 | "nbformat_minor": 4 286 | } 287 | -------------------------------------------------------------------------------- 
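Note on the standardization step used in the notebook above: the scaler is fitted on the training split only (`std_scale.fit(X_tn)`) and then applied to both splits. The following is a minimal sketch of that idea in plain Python, not the library's implementation. The helper names `fit_standardizer` and `apply_standardizer` and the toy data are hypothetical, introduced only for illustration. It assumes the population standard deviation (divide by n), which is what `StandardScaler` uses, and it leaves zero-variance columns unscaled.

```python
import math


def fit_standardizer(rows):
    """Return per-column means and standard deviations from the training rows only."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    # Population standard deviation (divide by n, not n - 1).
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
        for j in range(n_cols)
    ]
    return means, stds


def apply_standardizer(rows, means, stds):
    """Scale rows with statistics learned elsewhere; constant columns are left unscaled."""
    return [
        [(r[j] - means[j]) / (stds[j] or 1.0) for j in range(len(r))]
        for r in rows
    ]


# Hypothetical toy data standing in for X_tn / X_te.
X_tn_toy = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_te_toy = [[2.0, 40.0]]

means, stds = fit_standardizer(X_tn_toy)                  # statistics from training data only
X_tn_toy_std = apply_standardizer(X_tn_toy, means, stds)
X_te_toy_std = apply_standardizer(X_te_toy, means, stds)  # test data reuses the training statistics

print(means, stds)
print(X_te_toy_std)
```

Reusing the training statistics for the test split keeps information from the test data out of the preprocessing step, which is why the notebook calls `fit` once and `transform` twice.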
/09장_앙상블_2절_보팅.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별 코드" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# 데이터 불러오기\n", 17 | "from sklearn import datasets\n", 18 | "raw_iris = datasets.load_iris()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# 피쳐, 타겟 데이터 지정\n", 28 | "X = raw_iris.data\n", 29 | "y = raw_iris.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# 트레이닝/테스트 데이터 분할\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# 데이터 표준화\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "VotingClassifier(estimators=[('lr',\n", 66 | " LogisticRegression(C=1.0, class_weight=None,\n", 67 | " dual=False, fit_intercept=True,\n", 68 | " intercept_scaling=1,\n", 69 | " l1_ratio=None, max_iter=100,\n", 70 | " multi_class='multinomial',\n", 71 | " n_jobs=None, penalty='l2',\n", 72 | " random_state=1, solver='lbfgs',\n", 73 | " tol=0.0001, verbose=0,\n", 74 | " warm_start=False)),\n", 75 | " ('svm',\n", 76 | " SVC(C=1.0, break_ties=False, cache_size=200,\n", 77 | " class_weight=None, coef0=0.0,\n", 78 | " 
decision_function_shape='ovr', degree=3,\n", 79 | " gamma='scale', kernel='linear', max_iter=-1,\n", 80 | " probability=False, random_state=1,\n", 81 | " shrinking=True, tol=0.001, verbose=False)),\n", 82 | " ('gnb',\n", 83 | " GaussianNB(priors=None, var_smoothing=1e-09))],\n", 84 | " flatten_transform=True, n_jobs=None, voting='hard',\n", 85 | " weights=[1, 1, 1])" 86 | ] 87 | }, 88 | "execution_count": 5, 89 | "metadata": {}, 90 | "output_type": "execute_result" 91 | } 92 | ], 93 | "source": [ 94 | "# 보팅 학습\n", 95 | "from sklearn.linear_model import LogisticRegression\n", 96 | "from sklearn import svm\n", 97 | "from sklearn.naive_bayes import GaussianNB\n", 98 | "from sklearn.ensemble import VotingClassifier\n", 99 | "\n", 100 | "clf1 = LogisticRegression(multi_class='multinomial', \n", 101 | " random_state=1)\n", 102 | "clf2 = svm.SVC(kernel='linear', \n", 103 | " random_state=1) \n", 104 | "clf3 = GaussianNB()\n", 105 | "\n", 106 | "clf_voting = VotingClassifier(\n", 107 | " estimators=[\n", 108 | " ('lr', clf1), \n", 109 | " ('svm', clf2), \n", 110 | " ('gnb', clf3)\n", 111 | " ],\n", 112 | " voting='hard',\n", 113 | " weights=[1,1,1])\n", 114 | "clf_voting.fit(X_tn_std, y_tn)" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 6, 120 | "metadata": {}, 121 | "outputs": [ 122 | { 123 | "name": "stdout", 124 | "output_type": "stream", 125 | "text": [ 126 | "[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0\n", 127 | " 2]\n" 128 | ] 129 | } 130 | ], 131 | "source": [ 132 | "# 예측\n", 133 | "pred_voting = clf_voting.predict(X_te_std)\n", 134 | "print(pred_voting)" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 7, 140 | "metadata": {}, 141 | "outputs": [ 142 | { 143 | "name": "stdout", 144 | "output_type": "stream", 145 | "text": [ 146 | "0.9736842105263158\n" 147 | ] 148 | } 149 | ], 150 | "source": [ 151 | "# 정확도\n", 152 | "from sklearn.metrics import accuracy_score\n", 153 | 
"accuracy = accuracy_score(y_te, pred_voting)\n", 154 | "print(accuracy)" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 8, 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "name": "stdout", 164 | "output_type": "stream", 165 | "text": [ 166 | "[[13 0 0]\n", 167 | " [ 0 15 1]\n", 168 | " [ 0 0 9]]\n" 169 | ] 170 | } 171 | ], 172 | "source": [ 173 | "# confusion matrix 확인 \n", 174 | "from sklearn.metrics import confusion_matrix\n", 175 | "conf_matrix = confusion_matrix(y_te, pred_voting)\n", 176 | "print(conf_matrix)" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 9, 182 | "metadata": {}, 183 | "outputs": [ 184 | { 185 | "name": "stdout", 186 | "output_type": "stream", 187 | "text": [ 188 | " precision recall f1-score support\n", 189 | "\n", 190 | " 0 1.00 1.00 1.00 13\n", 191 | " 1 1.00 0.94 0.97 16\n", 192 | " 2 0.90 1.00 0.95 9\n", 193 | "\n", 194 | " accuracy 0.97 38\n", 195 | " macro avg 0.97 0.98 0.97 38\n", 196 | "weighted avg 0.98 0.97 0.97 38\n", 197 | "\n" 198 | ] 199 | } 200 | ], 201 | "source": [ 202 | "# 분류 레포트 확인\n", 203 | "from sklearn.metrics import classification_report\n", 204 | "class_report = classification_report(y_te, pred_voting)\n", 205 | "print(class_report)" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "# 통합 코드" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 11, 218 | "metadata": {}, 219 | "outputs": [ 220 | { 221 | "name": "stdout", 222 | "output_type": "stream", 223 | "text": [ 224 | "[2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0\n", 225 | " 2]\n", 226 | "0.9736842105263158\n", 227 | "[[13 0 0]\n", 228 | " [ 0 15 1]\n", 229 | " [ 0 0 9]]\n", 230 | " precision recall f1-score support\n", 231 | "\n", 232 | " 0 1.00 1.00 1.00 13\n", 233 | " 1 1.00 0.94 0.97 16\n", 234 | " 2 0.90 1.00 0.95 9\n", 235 | "\n", 236 | " accuracy 0.97 38\n", 237 | " macro avg 0.97 0.98 
0.97 38\n", 238 | "weighted avg 0.98 0.97 0.97 38\n", 239 | "\n" 240 | ] 241 | } 242 | ], 243 | "source": [ 244 | "from sklearn import datasets\n", 245 | "from sklearn.model_selection import train_test_split\n", 246 | "from sklearn.preprocessing import StandardScaler\n", 247 | "\n", 248 | "from sklearn.linear_model import LogisticRegression\n", 249 | "from sklearn import svm\n", 250 | "from sklearn.naive_bayes import GaussianNB\n", 251 | "from sklearn.ensemble import VotingClassifier\n", 252 | "\n", 253 | "from sklearn.metrics import accuracy_score\n", 254 | "from sklearn.metrics import confusion_matrix\n", 255 | "from sklearn.metrics import classification_report\n", 256 | "\n", 257 | "# 데이터 불러오기\n", 258 | "raw_iris = datasets.load_iris()\n", 259 | "\n", 260 | "# 피쳐, 타겟 데이터 지정\n", 261 | "X = raw_iris.data\n", 262 | "y = raw_iris.target\n", 263 | "\n", 264 | "# 트레이닝/테스트 데이터 분할\n", 265 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 266 | "\n", 267 | "# 데이터 표준화\n", 268 | "std_scale = StandardScaler()\n", 269 | "std_scale.fit(X_tn)\n", 270 | "X_tn_std = std_scale.transform(X_tn)\n", 271 | "X_te_std = std_scale.transform(X_te)\n", 272 | "\n", 273 | "# 보팅 학습\n", 274 | "clf1 = LogisticRegression(multi_class='multinomial', \n", 275 | " random_state=1)\n", 276 | "clf2 = svm.SVC(kernel='linear', \n", 277 | " random_state=1) \n", 278 | "clf3 = GaussianNB()\n", 279 | "\n", 280 | "clf_voting = VotingClassifier(\n", 281 | " estimators=[\n", 282 | " ('lr', clf1), \n", 283 | " ('svm', clf2), \n", 284 | " ('gnb', clf3)\n", 285 | " ],\n", 286 | " voting='hard',\n", 287 | " weights=[1,1,1])\n", 288 | "clf_voting.fit(X_tn_std, y_tn)\n", 289 | "\n", 290 | "# 예측\n", 291 | "pred_voting = clf_voting.predict(X_te_std)\n", 292 | "print(pred_voting)\n", 293 | "\n", 294 | "# 정확도\n", 295 | "accuracy = accuracy_score(y_te, pred_voting)\n", 296 | "print(accuracy)\n", 297 | "\n", 298 | "# confusion matrix 확인 \n", 299 | "conf_matrix = confusion_matrix(y_te, pred_voting)\n", 300 
| "print(conf_matrix)\n", 301 | "\n", 302 | "# 분류 레포트 확인\n", 303 | "class_report = classification_report(y_te, pred_voting)\n", 304 | "print(class_report)" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": null, 310 | "metadata": {}, 311 | "outputs": [], 312 | "source": [] 313 | }, 314 | { 315 | "cell_type": "code", 316 | "execution_count": null, 317 | "metadata": {}, 318 | "outputs": [], 319 | "source": [] 320 | } 321 | ], 322 | "metadata": { 323 | "kernelspec": { 324 | "display_name": "Python 3", 325 | "language": "python", 326 | "name": "python3" 327 | }, 328 | "language_info": { 329 | "codemirror_mode": { 330 | "name": "ipython", 331 | "version": 3 332 | }, 333 | "file_extension": ".py", 334 | "mimetype": "text/x-python", 335 | "name": "python", 336 | "nbconvert_exporter": "python", 337 | "pygments_lexer": "ipython3", 338 | "version": "3.7.6" 339 | } 340 | }, 341 | "nbformat": 4, 342 | "nbformat_minor": 4 343 | } 344 | -------------------------------------------------------------------------------- /09장_앙상블_3절_1_랜덤포레스트.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별 코드" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 2, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# 데이터 불러오기\n", 17 | "from sklearn import datasets\n", 18 | "raw_wine = datasets.load_wine()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 3, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# 피쳐, 타겟 데이터 지정\n", 28 | "X = raw_wine.data\n", 29 | "y = raw_wine.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 4, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# 트레이닝/테스트 데이터 분할\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | 
] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 5, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# 데이터 표준화\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 6, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,\n", 66 | " criterion='gini', max_depth=2, max_features='auto',\n", 67 | " max_leaf_nodes=None, max_samples=None,\n", 68 | " min_impurity_decrease=0.0, min_impurity_split=None,\n", 69 | " min_samples_leaf=1, min_samples_split=2,\n", 70 | " min_weight_fraction_leaf=0.0, n_estimators=100,\n", 71 | " n_jobs=None, oob_score=False, random_state=0, verbose=0,\n", 72 | " warm_start=False)" 73 | ] 74 | }, 75 | "execution_count": 6, 76 | "metadata": {}, 77 | "output_type": "execute_result" 78 | } 79 | ], 80 | "source": [ 81 | "from sklearn.ensemble import RandomForestClassifier\n", 82 | "clf_rf = RandomForestClassifier(max_depth=2, \n", 83 | " random_state=0)\n", 84 | "clf_rf.fit(X_tn_std, y_tn)" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 7, 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "name": "stdout", 94 | "output_type": "stream", 95 | "text": [ 96 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 97 | " 1 1 2 0 0 1 1 1]\n" 98 | ] 99 | } 100 | ], 101 | "source": [ 102 | "# 예측\n", 103 | "pred_rf = clf_rf.predict(X_te_std)\n", 104 | "print(pred_rf)" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 8, 110 | "metadata": {}, 111 | "outputs": [ 112 | { 113 | "name": "stdout", 114 | "output_type": "stream", 115 | "text": [ 116 | "0.9555555555555556\n" 117 | ] 118 | } 119 | ], 120 | "source": [ 
121 | "# 정확도\n", 122 | "from sklearn.metrics import accuracy_score\n", 123 | "accuracy = accuracy_score(y_te, pred_rf)\n", 124 | "print(accuracy)" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 9, 130 | "metadata": {}, 131 | "outputs": [ 132 | { 133 | "name": "stdout", 134 | "output_type": "stream", 135 | "text": [ 136 | "[[16 0 0]\n", 137 | " [ 1 19 1]\n", 138 | " [ 0 0 8]]\n" 139 | ] 140 | } 141 | ], 142 | "source": [ 143 | "# confusion matrix 확인 \n", 144 | "from sklearn.metrics import confusion_matrix\n", 145 | "conf_matrix = confusion_matrix(y_te, pred_rf)\n", 146 | "print(conf_matrix)" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 10, 152 | "metadata": {}, 153 | "outputs": [ 154 | { 155 | "name": "stdout", 156 | "output_type": "stream", 157 | "text": [ 158 | " precision recall f1-score support\n", 159 | "\n", 160 | " 0 0.94 1.00 0.97 16\n", 161 | " 1 1.00 0.90 0.95 21\n", 162 | " 2 0.89 1.00 0.94 8\n", 163 | "\n", 164 | " accuracy 0.96 45\n", 165 | " macro avg 0.94 0.97 0.95 45\n", 166 | "weighted avg 0.96 0.96 0.96 45\n", 167 | "\n" 168 | ] 169 | } 170 | ], 171 | "source": [ 172 | "# 분류 레포트 확인\n", 173 | "from sklearn.metrics import classification_report\n", 174 | "class_report = classification_report(y_te, pred_rf)\n", 175 | "print(class_report)" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "# 통합 코드" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 11, 188 | "metadata": {}, 189 | "outputs": [ 190 | { 191 | "name": "stdout", 192 | "output_type": "stream", 193 | "text": [ 194 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 195 | " 1 1 2 0 0 1 1 1]\n", 196 | "0.9555555555555556\n", 197 | "[[16 0 0]\n", 198 | " [ 1 19 1]\n", 199 | " [ 0 0 8]]\n", 200 | " precision recall f1-score support\n", 201 | "\n", 202 | " 0 0.94 1.00 0.97 16\n", 203 | " 1 1.00 0.90 0.95 21\n", 204 | " 2 0.89 1.00 
0.94 8\n", 205 | "\n", 206 | " accuracy 0.96 45\n", 207 | " macro avg 0.94 0.97 0.95 45\n", 208 | "weighted avg 0.96 0.96 0.96 45\n", 209 | "\n" 210 | ] 211 | } 212 | ], 213 | "source": [ 214 | "from sklearn import datasets\n", 215 | "from sklearn.model_selection import train_test_split\n", 216 | "from sklearn.preprocessing import StandardScaler\n", 217 | "\n", 218 | "from sklearn.ensemble import RandomForestClassifier\n", 219 | "\n", 220 | "from sklearn.metrics import accuracy_score\n", 221 | "from sklearn.metrics import confusion_matrix\n", 222 | "from sklearn.metrics import classification_report\n", 223 | "\n", 224 | "# 데이터 불러오기\n", 225 | "raw_wine = datasets.load_wine()\n", 226 | "\n", 227 | "# 피쳐, 타겟 데이터 지정\n", 228 | "X = raw_wine.data\n", 229 | "y = raw_wine.target\n", 230 | "\n", 231 | "# 트레이닝/테스트 데이터 분할\n", 232 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 233 | "\n", 234 | "# 데이터 표준화\n", 235 | "std_scale = StandardScaler()\n", 236 | "std_scale.fit(X_tn)\n", 237 | "X_tn_std = std_scale.transform(X_tn)\n", 238 | "X_te_std = std_scale.transform(X_te)\n", 239 | "\n", 240 | "# 랜덤포레스트 학습\n", 241 | "clf_rf = RandomForestClassifier(max_depth=2, \n", 242 | " random_state=0)\n", 243 | "clf_rf.fit(X_tn_std, y_tn)\n", 244 | "\n", 245 | "# 예측\n", 246 | "pred_rf = clf_rf.predict(X_te_std)\n", 247 | "print(pred_rf)\n", 248 | "\n", 249 | "# 정확도\n", 250 | "accuracy = accuracy_score(y_te, pred_rf)\n", 251 | "print(accuracy)\n", 252 | "\n", 253 | "# confusion matrix 확인 \n", 254 | "conf_matrix = confusion_matrix(y_te, pred_rf)\n", 255 | "print(conf_matrix)\n", 256 | "\n", 257 | "# 분류 레포트 확인\n", 258 | "class_report = classification_report(y_te, pred_rf)\n", 259 | "print(class_report)" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": null, 265 | "metadata": {}, 266 | "outputs": [], 267 | "source": [] 268 | } 269 | ], 270 | "metadata": { 271 | "kernelspec": { 272 | "display_name": "Python 3", 273 | "language": "python", 274 | 
"name": "python3" 275 | }, 276 | "language_info": { 277 | "codemirror_mode": { 278 | "name": "ipython", 279 | "version": 3 280 | }, 281 | "file_extension": ".py", 282 | "mimetype": "text/x-python", 283 | "name": "python", 284 | "nbconvert_exporter": "python", 285 | "pygments_lexer": "ipython3", 286 | "version": "3.7.6" 287 | } 288 | }, 289 | "nbformat": 4, 290 | "nbformat_minor": 4 291 | } 292 | -------------------------------------------------------------------------------- /09장_앙상블_3절_2_배깅.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별 코드" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# 데이터 불러오기\n", 17 | "from sklearn import datasets\n", 18 | "raw_wine = datasets.load_wine()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# 피쳐, 타겟 데이터 지정\n", 28 | "X = raw_wine.data\n", 29 | "y = raw_wine.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# 트레이닝/테스트 데이터 분할\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# 데이터 표준화\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 16, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | 
"BaggingClassifier(base_estimator=GaussianNB(priors=None, var_smoothing=1e-09),\n", 66 | " bootstrap=True, bootstrap_features=False, max_features=1.0,\n", 67 | " max_samples=1.0, n_estimators=10, n_jobs=None,\n", 68 | " oob_score=False, random_state=0, verbose=0, warm_start=False)" 69 | ] 70 | }, 71 | "execution_count": 16, 72 | "metadata": {}, 73 | "output_type": "execute_result" 74 | } 75 | ], 76 | "source": [ 77 | "# 배깅 학습\n", 78 | "from sklearn.naive_bayes import GaussianNB\n", 79 | "from sklearn.ensemble import BaggingClassifier\n", 80 | "clf_bagging = BaggingClassifier(base_estimator=GaussianNB(),\n", 81 | " n_estimators=10, \n", 82 | " random_state=0)\n", 83 | "clf_bagging.fit(X_tn_std, y_tn)" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 17, 89 | "metadata": {}, 90 | "outputs": [ 91 | { 92 | "name": "stdout", 93 | "output_type": "stream", 94 | "text": [ 95 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 96 | " 1 1 2 0 0 1 1 1]\n" 97 | ] 98 | } 99 | ], 100 | "source": [ 101 | "# 예측\n", 102 | "pred_bagging = clf_bagging.predict(X_te_std)\n", 103 | "print(pred_bagging)" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 18, 109 | "metadata": {}, 110 | "outputs": [ 111 | { 112 | "name": "stdout", 113 | "output_type": "stream", 114 | "text": [ 115 | "0.9555555555555556\n" 116 | ] 117 | } 118 | ], 119 | "source": [ 120 | "# 정확도\n", 121 | "from sklearn.metrics import accuracy_score\n", 122 | "accuracy = accuracy_score(y_te, pred_bagging)\n", 123 | "print(accuracy)" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 19, 129 | "metadata": {}, 130 | "outputs": [ 131 | { 132 | "name": "stdout", 133 | "output_type": "stream", 134 | "text": [ 135 | "[[16 0 0]\n", 136 | " [ 1 19 1]\n", 137 | " [ 0 0 8]]\n" 138 | ] 139 | } 140 | ], 141 | "source": [ 142 | "# confusion matrix 확인 \n", 143 | "from sklearn.metrics import confusion_matrix\n", 144 | "conf_matrix = 
confusion_matrix(y_te, pred_bagging)\n", 145 | "print(conf_matrix)" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 20, 151 | "metadata": {}, 152 | "outputs": [ 153 | { 154 | "name": "stdout", 155 | "output_type": "stream", 156 | "text": [ 157 | " precision recall f1-score support\n", 158 | "\n", 159 | " 0 0.94 1.00 0.97 16\n", 160 | " 1 1.00 0.90 0.95 21\n", 161 | " 2 0.89 1.00 0.94 8\n", 162 | "\n", 163 | " accuracy 0.96 45\n", 164 | " macro avg 0.94 0.97 0.95 45\n", 165 | "weighted avg 0.96 0.96 0.96 45\n", 166 | "\n" 167 | ] 168 | } 169 | ], 170 | "source": [ 171 | "# 분류 레포트 확인\n", 172 | "from sklearn.metrics import classification_report\n", 173 | "class_report = classification_report(y_te, pred_bagging)\n", 174 | "print(class_report)" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "# 통합 코드" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 21, 187 | "metadata": {}, 188 | "outputs": [ 189 | { 190 | "name": "stdout", 191 | "output_type": "stream", 192 | "text": [ 193 | "[0 2 1 0 1 1 0 2 1 1 2 2 0 1 2 1 0 0 2 0 0 0 0 1 1 1 1 1 1 2 0 0 1 0 0 0 2\n", 194 | " 1 1 2 0 0 1 1 1]\n", 195 | "0.9555555555555556\n", 196 | "[[16 0 0]\n", 197 | " [ 1 19 1]\n", 198 | " [ 0 0 8]]\n", 199 | " precision recall f1-score support\n", 200 | "\n", 201 | " 0 0.94 1.00 0.97 16\n", 202 | " 1 1.00 0.90 0.95 21\n", 203 | " 2 0.89 1.00 0.94 8\n", 204 | "\n", 205 | " accuracy 0.96 45\n", 206 | " macro avg 0.94 0.97 0.95 45\n", 207 | "weighted avg 0.96 0.96 0.96 45\n", 208 | "\n" 209 | ] 210 | } 211 | ], 212 | "source": [ 213 | "from sklearn import datasets\n", 214 | "from sklearn.model_selection import train_test_split\n", 215 | "from sklearn.preprocessing import StandardScaler\n", 216 | "\n", 217 | "from sklearn.naive_bayes import GaussianNB\n", 218 | "from sklearn.ensemble import BaggingClassifier\n", 219 | "\n", 220 | "from sklearn.metrics import accuracy_score\n", 221 | "from 
sklearn.metrics import confusion_matrix\n", 222 | "from sklearn.metrics import classification_report\n", 223 | "\n", 224 | "# 데이터 불러오기\n", 225 | "raw_wine = datasets.load_wine()\n", 226 | "\n", 227 | "# 피쳐, 타겟 데이터 지정\n", 228 | "X = raw_wine.data\n", 229 | "y = raw_wine.target\n", 230 | "\n", 231 | "# 트레이닝/테스트 데이터 분할\n", 232 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 233 | "\n", 234 | "# 데이터 표준화\n", 235 | "std_scale = StandardScaler()\n", 236 | "std_scale.fit(X_tn)\n", 237 | "X_tn_std = std_scale.transform(X_tn)\n", 238 | "X_te_std = std_scale.transform(X_te)\n", 239 | "\n", 240 | "# 배깅 학습\n", 241 | "clf_bagging = BaggingClassifier(base_estimator=GaussianNB(),\n", 242 | " n_estimators=10, \n", 243 | " random_state=0)\n", 244 | "clf_bagging.fit(X_tn_std, y_tn)\n", 245 | "\n", 246 | "# 예측\n", 247 | "pred_bagging = clf_bagging.predict(X_te_std)\n", 248 | "print(pred_bagging)\n", 249 | "\n", 250 | "# 정확도\n", 251 | "accuracy = accuracy_score(y_te, pred_bagging)\n", 252 | "print(accuracy)\n", 253 | "\n", 254 | "# confusion matrix 확인 \n", 255 | "conf_matrix = confusion_matrix(y_te, pred_bagging)\n", 256 | "print(conf_matrix)\n", 257 | "\n", 258 | "# 분류 레포트 확인\n", 259 | "class_report = classification_report(y_te, pred_bagging)\n", 260 | "print(class_report)" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "metadata": {}, 267 | "outputs": [], 268 | "source": [] 269 | } 270 | ], 271 | "metadata": { 272 | "kernelspec": { 273 | "display_name": "Python 3", 274 | "language": "python", 275 | "name": "python3" 276 | }, 277 | "language_info": { 278 | "codemirror_mode": { 279 | "name": "ipython", 280 | "version": 3 281 | }, 282 | "file_extension": ".py", 283 | "mimetype": "text/x-python", 284 | "name": "python", 285 | "nbconvert_exporter": "python", 286 | "pygments_lexer": "ipython3", 287 | "version": "3.7.6" 288 | } 289 | }, 290 | "nbformat": 4, 291 | "nbformat_minor": 4 292 | } 293 | 
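Note on reading the evaluation output of the bagging notebook above: the accuracy and the per-class rows of the classification report can be derived directly from the printed confusion matrix. The following is a minimal sketch in plain Python, assuming scikit-learn's convention that rows are true classes and columns are predicted classes. The function name `metrics_from_confusion` is hypothetical and used only for illustration. The matrix values are copied from the notebook output.

```python
# Confusion matrix printed by the bagging notebook
# (rows = true class, columns = predicted class).
conf_matrix = [
    [16, 0, 0],
    [1, 19, 1],
    [0, 0, 8],
]


def metrics_from_confusion(cm):
    """Return overall accuracy and (precision, recall, f1, support) for each class."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n_classes)) / total

    per_class = []
    for k in range(n_classes):
        predicted_k = sum(cm[i][k] for i in range(n_classes))  # column sum
        actual_k = sum(cm[k])                                   # row sum (support)
        precision = cm[k][k] / predicted_k if predicted_k else 0.0
        recall = cm[k][k] / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1, actual_k))
    return accuracy, per_class


accuracy, per_class = metrics_from_confusion(conf_matrix)
print(accuracy)
for label, (p, r, f1, support) in enumerate(per_class):
    print(label, round(p, 2), round(r, 2), round(f1, 2), support)
```

Rounded to two decimals, these values reproduce the report rows shown in the notebook output: class 1 has perfect precision because no other class was predicted as 1, while its recall drops to 19/21 because two of its samples were assigned to classes 0 and 2.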
-------------------------------------------------------------------------------- /09장_앙상블_4절_1_adaboost.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별 코드" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 11, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# 데이터 불러오기\n", 17 | "from sklearn import datasets\n", 18 | "raw_breast_cancer = datasets.load_breast_cancer()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 12, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# 피쳐, 타겟 데이터 지정\n", 28 | "X = raw_breast_cancer.data\n", 29 | "y = raw_breast_cancer.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 13, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# 트레이닝/테스트 데이터 분할\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 14, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# 데이터 표준화\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 15, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "AdaBoostClassifier(algorithm='SAMME.R', base_estimator=None, learning_rate=1.0,\n", 66 | " n_estimators=50, random_state=0)" 67 | ] 68 | }, 69 | "execution_count": 15, 70 | "metadata": {}, 71 | "output_type": "execute_result" 72 | } 73 | ], 74 | "source": [ 75 | "# 에이다 부스트 학습\n", 76 | "from sklearn.ensemble import AdaBoostClassifier\n", 77 | "clf_ada = 
AdaBoostClassifier(random_state=0)\n", 78 | "clf_ada.fit(X_tn_std, y_tn)" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 16, 84 | "metadata": {}, 85 | "outputs": [ 86 | { 87 | "name": "stdout", 88 | "output_type": "stream", 89 | "text": [ 90 | "[0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n", 91 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n", 92 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n", 93 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 0]\n" 94 | ] 95 | } 96 | ], 97 | "source": [ 98 | "# 예측\n", 99 | "pred_ada = clf_ada.predict(X_te_std)\n", 100 | "print(pred_ada)" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 17, 106 | "metadata": {}, 107 | "outputs": [ 108 | { 109 | "name": "stdout", 110 | "output_type": "stream", 111 | "text": [ 112 | "0.9790209790209791\n" 113 | ] 114 | } 115 | ], 116 | "source": [ 117 | "# 정확도\n", 118 | "from sklearn.metrics import accuracy_score\n", 119 | "accuracy = accuracy_score(y_te, pred_ada)\n", 120 | "print(accuracy)" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 18, 126 | "metadata": {}, 127 | "outputs": [ 128 | { 129 | "name": "stdout", 130 | "output_type": "stream", 131 | "text": [ 132 | "[[52 1]\n", 133 | " [ 2 88]]\n" 134 | ] 135 | } 136 | ], 137 | "source": [ 138 | "# confusion matrix 확인 \n", 139 | "from sklearn.metrics import confusion_matrix\n", 140 | "conf_matrix = confusion_matrix(y_te, pred_ada)\n", 141 | "print(conf_matrix)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 19, 147 | "metadata": {}, 148 | "outputs": [ 149 | { 150 | "name": "stdout", 151 | "output_type": "stream", 152 | "text": [ 153 | " precision recall f1-score support\n", 154 | "\n", 155 | " 0 0.96 0.98 0.97 53\n", 156 | " 1 0.99 0.98 0.98 90\n", 157 | "\n", 158 | " accuracy 0.98 143\n", 159 | " macro avg 0.98 0.98 0.98 
143\n", 160 | "weighted avg 0.98 0.98 0.98 143\n", 161 | "\n" 162 | ] 163 | } 164 | ], 165 | "source": [ 166 | "# 분류 레포트 확인\n", 167 | "from sklearn.metrics import classification_report\n", 168 | "class_report = classification_report(y_te, pred_ada)\n", 169 | "print(class_report)" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "# 통합 코드 " 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 20, 182 | "metadata": {}, 183 | "outputs": [ 184 | { 185 | "name": "stdout", 186 | "output_type": "stream", 187 | "text": [ 188 | "[0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n", 189 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n", 190 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n", 191 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 0 1 1 1 0]\n", 192 | "0.9790209790209791\n", 193 | "[[52 1]\n", 194 | " [ 2 88]]\n", 195 | " precision recall f1-score support\n", 196 | "\n", 197 | " 0 0.96 0.98 0.97 53\n", 198 | " 1 0.99 0.98 0.98 90\n", 199 | "\n", 200 | " accuracy 0.98 143\n", 201 | " macro avg 0.98 0.98 0.98 143\n", 202 | "weighted avg 0.98 0.98 0.98 143\n", 203 | "\n" 204 | ] 205 | } 206 | ], 207 | "source": [ 208 | "from sklearn import datasets\n", 209 | "from sklearn.model_selection import train_test_split\n", 210 | "from sklearn.preprocessing import StandardScaler\n", 211 | "\n", 212 | "from sklearn.ensemble import AdaBoostClassifier\n", 213 | "\n", 214 | "from sklearn.metrics import accuracy_score\n", 215 | "from sklearn.metrics import confusion_matrix\n", 216 | "from sklearn.metrics import classification_report\n", 217 | "\n", 218 | "# 데이터 불러오기\n", 219 | "raw_breast_cancer = datasets.load_breast_cancer()\n", 220 | "\n", 221 | "# 피쳐, 타겟 데이터 지정\n", 222 | "X = raw_breast_cancer.data\n", 223 | "y = raw_breast_cancer.target\n", 224 | "\n", 225 | "# 트레이닝/테스트 데이터 분할\n", 226 | "X_tn, X_te, y_tn, 
y_te=train_test_split(X,y,random_state=0)\n", 227 | "\n", 228 | "# 데이터 표준화\n", 229 | "std_scale = StandardScaler()\n", 230 | "std_scale.fit(X_tn)\n", 231 | "X_tn_std = std_scale.transform(X_tn)\n", 232 | "X_te_std = std_scale.transform(X_te)\n", 233 | "\n", 234 | "# 에이다 부스트 학습\n", 235 | "clf_ada = AdaBoostClassifier(random_state=0)\n", 236 | "clf_ada.fit(X_tn_std, y_tn)\n", 237 | "\n", 238 | "# 예측\n", 239 | "pred_ada = clf_ada.predict(X_te_std)\n", 240 | "print(pred_ada)\n", 241 | "\n", 242 | "# 정확도\n", 243 | "accuracy = accuracy_score(y_te, pred_ada)\n", 244 | "print(accuracy)\n", 245 | "\n", 246 | "# confusion matrix 확인 \n", 247 | "conf_matrix = confusion_matrix(y_te, pred_ada)\n", 248 | "print(conf_matrix)\n", 249 | "\n", 250 | "# 분류 레포트 확인\n", 251 | "class_report = classification_report(y_te, pred_ada)\n", 252 | "print(class_report)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": null, 258 | "metadata": {}, 259 | "outputs": [], 260 | "source": [] 261 | } 262 | ], 263 | "metadata": { 264 | "kernelspec": { 265 | "display_name": "Python 3", 266 | "language": "python", 267 | "name": "python3" 268 | }, 269 | "language_info": { 270 | "codemirror_mode": { 271 | "name": "ipython", 272 | "version": 3 273 | }, 274 | "file_extension": ".py", 275 | "mimetype": "text/x-python", 276 | "name": "python", 277 | "nbconvert_exporter": "python", 278 | "pygments_lexer": "ipython3", 279 | "version": "3.7.6" 280 | } 281 | }, 282 | "nbformat": 4, 283 | "nbformat_minor": 4 284 | } 285 | -------------------------------------------------------------------------------- /09장_앙상블_4절_2_gradient_boost.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별 코드" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# 데이터 불러오기\n", 17 | "from sklearn 
import datasets\n", 18 | "raw_breast_cancer = datasets.load_breast_cancer()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# 피쳐, 타겟 데이터 지정\n", 28 | "X = raw_breast_cancer.data\n", 29 | "y = raw_breast_cancer.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# 트레이닝/테스트 데이터 분할\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# 데이터 표준화\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "GradientBoostingClassifier(ccp_alpha=0.0, criterion='friedman_mse', init=None,\n", 66 | " learning_rate=0.01, loss='deviance', max_depth=2,\n", 67 | " max_features=None, max_leaf_nodes=None,\n", 68 | " min_impurity_decrease=0.0, min_impurity_split=None,\n", 69 | " min_samples_leaf=1, min_samples_split=2,\n", 70 | " min_weight_fraction_leaf=0.0, n_estimators=100,\n", 71 | " n_iter_no_change=None, presort='deprecated',\n", 72 | " random_state=0, subsample=1.0, tol=0.0001,\n", 73 | " validation_fraction=0.1, verbose=0,\n", 74 | " warm_start=False)" 75 | ] 76 | }, 77 | "execution_count": 5, 78 | "metadata": {}, 79 | "output_type": "execute_result" 80 | } 81 | ], 82 | "source": [ 83 | "# Gradient Boosting 학습\n", 84 | "from sklearn.ensemble import GradientBoostingClassifier\n", 85 | "clf_gbt = GradientBoostingClassifier(max_depth=2, \n", 86 | " 
learning_rate=0.01,\n", 87 | " random_state=0)\n", 88 | "clf_gbt.fit(X_tn_std, y_tn)" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 6, 94 | "metadata": {}, 95 | "outputs": [ 96 | { 97 | "name": "stdout", 98 | "output_type": "stream", 99 | "text": [ 100 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n", 101 | " 0 1 0 1 1 1 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 1\n", 102 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n", 103 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n" 104 | ] 105 | } 106 | ], 107 | "source": [ 108 | "# 예측\n", 109 | "pred_gboost = clf_gbt.predict(X_te_std)\n", 110 | "print(pred_gboost)" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 7, 116 | "metadata": {}, 117 | "outputs": [ 118 | { 119 | "name": "stdout", 120 | "output_type": "stream", 121 | "text": [ 122 | "0.965034965034965\n" 123 | ] 124 | } 125 | ], 126 | "source": [ 127 | "# 정확도\n", 128 | "from sklearn.metrics import accuracy_score\n", 129 | "accuracy = accuracy_score(y_te, pred_gboost)\n", 130 | "print(accuracy)" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 8, 136 | "metadata": {}, 137 | "outputs": [ 138 | { 139 | "name": "stdout", 140 | "output_type": "stream", 141 | "text": [ 142 | "[[49 4]\n", 143 | " [ 1 89]]\n" 144 | ] 145 | } 146 | ], 147 | "source": [ 148 | "# confusion matrix 확인 \n", 149 | "from sklearn.metrics import confusion_matrix\n", 150 | "conf_matrix = confusion_matrix(y_te, pred_gboost)\n", 151 | "print(conf_matrix)" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 9, 157 | "metadata": {}, 158 | "outputs": [ 159 | { 160 | "name": "stdout", 161 | "output_type": "stream", 162 | "text": [ 163 | " precision recall f1-score support\n", 164 | "\n", 165 | " 0 0.98 0.92 0.95 53\n", 166 | " 1 0.96 0.99 0.97 90\n", 167 | "\n", 168 | " accuracy 0.97 143\n", 169 | " 
macro avg 0.97 0.96 0.96 143\n", 170 | "weighted avg 0.97 0.97 0.96 143\n", 171 | "\n" 172 | ] 173 | } 174 | ], 175 | "source": [ 176 | "# 분류 레포트 확인\n", 177 | "from sklearn.metrics import classification_report\n", 178 | "class_report = classification_report(y_te, pred_gboost)\n", 179 | "print(class_report)" 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "# 통합 코드" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 1, 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "name": "stdout", 196 | "output_type": "stream", 197 | "text": [ 198 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n", 199 | " 0 1 0 1 1 1 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 1\n", 200 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 0 0 1\n", 201 | " 0 0 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n", 202 | "0.965034965034965\n", 203 | "[[49 4]\n", 204 | " [ 1 89]]\n", 205 | " precision recall f1-score support\n", 206 | "\n", 207 | " 0 0.98 0.92 0.95 53\n", 208 | " 1 0.96 0.99 0.97 90\n", 209 | "\n", 210 | " accuracy 0.97 143\n", 211 | " macro avg 0.97 0.96 0.96 143\n", 212 | "weighted avg 0.97 0.97 0.96 143\n", 213 | "\n" 214 | ] 215 | } 216 | ], 217 | "source": [ 218 | "from sklearn import datasets\n", 219 | "from sklearn.model_selection import train_test_split\n", 220 | "from sklearn.preprocessing import StandardScaler\n", 221 | "\n", 222 | "from sklearn.ensemble import GradientBoostingClassifier\n", 223 | "\n", 224 | "from sklearn.metrics import accuracy_score\n", 225 | "from sklearn.metrics import confusion_matrix\n", 226 | "from sklearn.metrics import classification_report\n", 227 | "\n", 228 | "# 데이터 불러오기\n", 229 | "raw_breast_cancer = datasets.load_breast_cancer()\n", 230 | "\n", 231 | "# 피쳐, 타겟 데이터 지정\n", 232 | "X = raw_breast_cancer.data\n", 233 | "y = raw_breast_cancer.target\n", 234 | "\n", 235 | "# 트레이닝/테스트 데이터 
분할\n", 236 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 237 | "\n", 238 | "# 데이터 표준화\n", 239 | "std_scale = StandardScaler()\n", 240 | "std_scale.fit(X_tn)\n", 241 | "X_tn_std = std_scale.transform(X_tn)\n", 242 | "X_te_std = std_scale.transform(X_te)\n", 243 | "\n", 244 | "# Gradient Boosting 학습\n", 245 | "clf_gbt = GradientBoostingClassifier(max_depth=2, \n", 246 | " learning_rate=0.01,\n", 247 | " random_state=0)\n", 248 | "clf_gbt.fit(X_tn_std, y_tn)\n", 249 | "\n", 250 | "# 예측\n", 251 | "pred_gboost = clf_gbt.predict(X_te_std)\n", 252 | "print(pred_gboost)\n", 253 | "\n", 254 | "# 정확도\n", 255 | "accuracy = accuracy_score(y_te, pred_gboost)\n", 256 | "print(accuracy)\n", 257 | "\n", 258 | "# confusion matrix 확인 \n", 259 | "conf_matrix = confusion_matrix(y_te, pred_gboost)\n", 260 | "print(conf_matrix)\n", 261 | "\n", 262 | "# 분류 레포트 확인\n", 263 | "class_report = classification_report(y_te, pred_gboost)\n", 264 | "print(class_report)" 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": null, 270 | "metadata": {}, 271 | "outputs": [], 272 | "source": [] 273 | } 274 | ], 275 | "metadata": { 276 | "kernelspec": { 277 | "display_name": "Python 3", 278 | "language": "python", 279 | "name": "python3" 280 | }, 281 | "language_info": { 282 | "codemirror_mode": { 283 | "name": "ipython", 284 | "version": 3 285 | }, 286 | "file_extension": ".py", 287 | "mimetype": "text/x-python", 288 | "name": "python", 289 | "nbconvert_exporter": "python", 290 | "pygments_lexer": "ipython3", 291 | "version": "3.7.6" 292 | } 293 | }, 294 | "nbformat": 4, 295 | "nbformat_minor": 4 296 | } 297 | -------------------------------------------------------------------------------- /09장_앙상블_5절_스태킹.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별코드" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | 
"execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# 데이터 불러오기\n", 17 | "from sklearn import datasets\n", 18 | "raw_breast_cancer = datasets.load_breast_cancer()" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "# 피쳐, 타겟 데이터 지정\n", 28 | "X = raw_breast_cancer.data\n", 29 | "y = raw_breast_cancer.target" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 3, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "# 트레이닝/테스트 데이터 분할\n", 39 | "from sklearn.model_selection import train_test_split\n", 40 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": 4, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "# 데이터 표준화\n", 50 | "from sklearn.preprocessing import StandardScaler\n", 51 | "std_scale = StandardScaler()\n", 52 | "std_scale.fit(X_tn)\n", 53 | "X_tn_std = std_scale.transform(X_tn)\n", 54 | "X_te_std = std_scale.transform(X_te)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 5, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "data": { 64 | "text/plain": [ 65 | "StackingClassifier(cv=None,\n", 66 | " estimators=[('svm',\n", 67 | " SVC(C=1.0, break_ties=False, cache_size=200,\n", 68 | " class_weight=None, coef0=0.0,\n", 69 | " decision_function_shape='ovr', degree=3,\n", 70 | " gamma='scale', kernel='linear', max_iter=-1,\n", 71 | " probability=False, random_state=1,\n", 72 | " shrinking=True, tol=0.001, verbose=False)),\n", 73 | " ('gnb',\n", 74 | " GaussianNB(priors=None, var_smoothing=1e-09))],\n", 75 | " final_estimator=LogisticRegression(C=1.0, class_weight=None,\n", 76 | " dual=False,\n", 77 | " fit_intercept=True,\n", 78 | " intercept_scaling=1,\n", 79 | " l1_ratio=None,\n", 80 | " max_iter=100,\n", 81 | " multi_class='auto',\n", 82 | " n_jobs=None, penalty='l2',\n", 83 | " 
random_state=None,\n", 84 | " solver='lbfgs',\n", 85 | " tol=0.0001, verbose=0,\n", 86 | " warm_start=False),\n", 87 | " n_jobs=None, passthrough=False, stack_method='auto',\n", 88 | " verbose=0)" 89 | ] 90 | }, 91 | "execution_count": 5, 92 | "metadata": {}, 93 | "output_type": "execute_result" 94 | } 95 | ], 96 | "source": [ 97 | "# 스태킹 학습\n", 98 | "from sklearn import svm\n", 99 | "from sklearn.naive_bayes import GaussianNB\n", 100 | "from sklearn.linear_model import LogisticRegression\n", 101 | "from sklearn.ensemble import StackingClassifier\n", 102 | "\n", 103 | "clf1 = svm.SVC(kernel='linear', random_state=1) \n", 104 | "clf2 = GaussianNB()\n", 105 | "\n", 106 | "clf_stkg = StackingClassifier(\n", 107 | " estimators=[\n", 108 | " ('svm', clf1), \n", 109 | " ('gnb', clf2)\n", 110 | " ],\n", 111 | " final_estimator=LogisticRegression())\n", 112 | "clf_stkg.fit(X_tn_std, y_tn)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 7, 118 | "metadata": {}, 119 | "outputs": [ 120 | { 121 | "name": "stdout", 122 | "output_type": "stream", 123 | "text": [ 124 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n", 125 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n", 126 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 1 0 1\n", 127 | " 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n" 128 | ] 129 | } 130 | ], 131 | "source": [ 132 | "# 예측\n", 133 | "pred_stkg = clf_stkg.predict(X_te_std)\n", 134 | "print(pred_stkg)" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 8, 140 | "metadata": {}, 141 | "outputs": [ 142 | { 143 | "name": "stdout", 144 | "output_type": "stream", 145 | "text": [ 146 | "0.965034965034965\n" 147 | ] 148 | } 149 | ], 150 | "source": [ 151 | "# 정확도\n", 152 | "from sklearn.metrics import accuracy_score\n", 153 | "accuracy = accuracy_score(y_te, pred_stkg)\n", 154 | "print(accuracy)" 155 | ] 156 | }, 
157 | { 158 | "cell_type": "code", 159 | "execution_count": 9, 160 | "metadata": {}, 161 | "outputs": [ 162 | { 163 | "name": "stdout", 164 | "output_type": "stream", 165 | "text": [ 166 | "[[50 3]\n", 167 | " [ 2 88]]\n" 168 | ] 169 | } 170 | ], 171 | "source": [ 172 | "# confusion matrix 확인 \n", 173 | "from sklearn.metrics import confusion_matrix\n", 174 | "conf_matrix = confusion_matrix(y_te, pred_stkg)\n", 175 | "print(conf_matrix)" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 10, 181 | "metadata": {}, 182 | "outputs": [ 183 | { 184 | "name": "stdout", 185 | "output_type": "stream", 186 | "text": [ 187 | " precision recall f1-score support\n", 188 | "\n", 189 | " 0 0.96 0.94 0.95 53\n", 190 | " 1 0.97 0.98 0.97 90\n", 191 | "\n", 192 | " accuracy 0.97 143\n", 193 | " macro avg 0.96 0.96 0.96 143\n", 194 | "weighted avg 0.96 0.97 0.96 143\n", 195 | "\n" 196 | ] 197 | } 198 | ], 199 | "source": [ 200 | "# 분류 레포트 확인\n", 201 | "from sklearn.metrics import classification_report\n", 202 | "class_report = classification_report(y_te, pred_stkg)\n", 203 | "print(class_report)" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "# 통합코드" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": 11, 216 | "metadata": {}, 217 | "outputs": [ 218 | { 219 | "name": "stdout", 220 | "output_type": "stream", 221 | "text": [ 222 | "[0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 0 1 0 1 0 1 0 1\n", 223 | " 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 0 1 0 0 0 1 1 0 1 0\n", 224 | " 0 1 1 1 1 1 0 0 0 1 0 1 1 1 0 0 1 0 0 0 1 1 0 1 1 1 1 1 1 1 0 1 0 1 1 0 1\n", 225 | " 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 0]\n", 226 | "0.965034965034965\n", 227 | "[[50 3]\n", 228 | " [ 2 88]]\n", 229 | " precision recall f1-score support\n", 230 | "\n", 231 | " 0 0.96 0.94 0.95 53\n", 232 | " 1 0.97 0.98 0.97 90\n", 233 | "\n", 234 | " accuracy 0.97 
143\n", 235 | " macro avg 0.96 0.96 0.96 143\n", 236 | "weighted avg 0.96 0.97 0.96 143\n", 237 | "\n" 238 | ] 239 | } 240 | ], 241 | "source": [ 242 | "from sklearn import datasets\n", 243 | "from sklearn.model_selection import train_test_split\n", 244 | "from sklearn.preprocessing import StandardScaler\n", 245 | "\n", 246 | "from sklearn import svm\n", 247 | "from sklearn.naive_bayes import GaussianNB\n", 248 | "from sklearn.linear_model import LogisticRegression\n", 249 | "from sklearn.ensemble import StackingClassifier\n", 250 | "\n", 251 | "from sklearn.metrics import accuracy_score\n", 252 | "from sklearn.metrics import confusion_matrix\n", 253 | "from sklearn.metrics import classification_report\n", 254 | "\n", 255 | "\n", 256 | "# 데이터 불러오기\n", 257 | "raw_breast_cancer = datasets.load_breast_cancer()\n", 258 | "\n", 259 | "# 피쳐, 타겟 데이터 지정\n", 260 | "X = raw_breast_cancer.data\n", 261 | "y = raw_breast_cancer.target\n", 262 | "\n", 263 | "# 트레이닝/테스트 데이터 분할\n", 264 | "X_tn, X_te, y_tn, y_te=train_test_split(X,y,random_state=0)\n", 265 | "\n", 266 | "# 데이터 표준화\n", 267 | "std_scale = StandardScaler()\n", 268 | "std_scale.fit(X_tn)\n", 269 | "X_tn_std = std_scale.transform(X_tn)\n", 270 | "X_te_std = std_scale.transform(X_te)\n", 271 | "\n", 272 | "# 스태킹 학습\n", 273 | "clf1 = svm.SVC(kernel='linear', random_state=1) \n", 274 | "clf2 = GaussianNB()\n", 275 | "\n", 276 | "clf_stkg = StackingClassifier(\n", 277 | " estimators=[\n", 278 | " ('svm', clf1), \n", 279 | " ('gnb', clf2)\n", 280 | " ],\n", 281 | " final_estimator=LogisticRegression())\n", 282 | "clf_stkg.fit(X_tn_std, y_tn)\n", 283 | "\n", 284 | "# 예측\n", 285 | "pred_stkg = clf_stkg.predict(X_te_std)\n", 286 | "print(pred_stkg)\n", 287 | "\n", 288 | "# 정확도\n", 289 | "accuracy = accuracy_score(y_te, pred_stkg)\n", 290 | "print(accuracy)\n", 291 | "\n", 292 | "# confusion matrix 확인 \n", 293 | "conf_matrix = confusion_matrix(y_te, pred_stkg)\n", 294 | "print(conf_matrix)\n", 295 | "\n", 296 | "# 분류 레포트 확인\n", 
297 | "class_report = classification_report(y_te, pred_stkg)\n", 298 | "print(class_report)" 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": null, 304 | "metadata": {}, 305 | "outputs": [], 306 | "source": [] 307 | } 308 | ], 309 | "metadata": { 310 | "kernelspec": { 311 | "display_name": "Python 3", 312 | "language": "python", 313 | "name": "python3" 314 | }, 315 | "language_info": { 316 | "codemirror_mode": { 317 | "name": "ipython", 318 | "version": 3 319 | }, 320 | "file_extension": ".py", 321 | "mimetype": "text/x-python", 322 | "name": "python", 323 | "nbconvert_exporter": "python", 324 | "pygments_lexer": "ipython3", 325 | "version": "3.7.6" 326 | } 327 | }, 328 | "nbformat": 4, 329 | "nbformat_minor": 4 330 | } 331 | -------------------------------------------------------------------------------- /12장_딥러닝_2절_1_퍼셉트론.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 개별코드" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "name": "stdout", 17 | "output_type": "stream", 18 | "text": [ 19 | "[[2 3]\n", 20 | " [5 1]]\n", 21 | "[2 3 5 1]\n" 22 | ] 23 | } 24 | ], 25 | "source": [ 26 | "import numpy as np\n", 27 | "\n", 28 | "# 입력층\n", 29 | "input_data = np.array([[2,3], [5,1]])\n", 30 | "print(input_data)\n", 31 | "x = input_data.reshape(-1)\n", 32 | "print(x)" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 2, 38 | "metadata": {}, 39 | "outputs": [], 40 | "source": [ 41 | "# 가중치 및 편향\n", 42 | "w1 = np.array([2,1,-3,3])\n", 43 | "w2 = np.array([1,-3,1,3])\n", 44 | "b1 = 3\n", 45 | "b2 = 3" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 3, 51 | "metadata": {}, 52 | "outputs": [ 53 | { 54 | "name": "stdout", 55 | "output_type": "stream", 56 | "text": [ 57 | "[[ 2 1 -3 3]\n", 58 | " [ 
1 -3 1 3]]\n", 59 | "[3 3]\n", 60 | "[-2 4]\n" 61 | ] 62 | } 63 | ], 64 | "source": [ 65 | "# 가중합\n", 66 | "W = np.array([w1, w2])\n", 67 | "print(W)\n", 68 | "b = np.array([b1, b2])\n", 69 | "print(b)\n", 70 | "weight_sum = np.dot(W, x) + b\n", 71 | "print(weight_sum)" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 5, 77 | "metadata": {}, 78 | "outputs": [ 79 | { 80 | "name": "stdout", 81 | "output_type": "stream", 82 | "text": [ 83 | "[0.11920292 0.98201379]\n" 84 | ] 85 | } 86 | ], 87 | "source": [ 88 | "# 출력층\n", 89 | "res = 1/(1+np.exp(-weight_sum))\n", 90 | "print(res)" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": {}, 96 | "source": [ 97 | "# 통합 코드" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 6, 103 | "metadata": {}, 104 | "outputs": [], 105 | "source": [ 106 | "import numpy as np\n", 107 | "\n", 108 | "# 입력층\n", 109 | "input_data = np.array([[2,3], [5,1]])\n", 110 | "x = input_data.reshape(-1)\n", 111 | "\n", 112 | "# 가중치 및 편향\n", 113 | "w1 = np.array([2,1,-3,3])\n", 114 | "w2 = np.array([1,-3,1,3])\n", 115 | "b1 = 3\n", 116 | "b2 = 3\n", 117 | "\n", 118 | "# 가중합\n", 119 | "W = np.array([w1, w2])\n", 120 | "b = np.array([b1, b2])\n", 121 | "weight_sum = np.dot(W, x) + b\n", 122 | "\n", 123 | "# 출력층\n", 124 | "res = 1/(1+np.exp(-weight_sum))" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": {}, 131 | "outputs": [], 132 | "source": [] 133 | } 134 | ], 135 | "metadata": { 136 | "kernelspec": { 137 | "display_name": "Python 3", 138 | "language": "python", 139 | "name": "python3" 140 | }, 141 | "language_info": { 142 | "codemirror_mode": { 143 | "name": "ipython", 144 | "version": 3 145 | }, 146 | "file_extension": ".py", 147 | "mimetype": "text/x-python", 148 | "name": "python", 149 | "nbconvert_exporter": "python", 150 | "pygments_lexer": "ipython3", 151 | "version": "3.7.6" 152 | } 153 | }, 154 | "nbformat": 4, 155 | 
"nbformat_minor": 4 156 | } 157 | -------------------------------------------------------------------------------- /12장_딥러닝_3절_7_텐서플로_소개.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Sequential API를 활용한 딥러닝 모형 생성" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from tensorflow.keras.models import Sequential\n", 17 | "from tensorflow.keras.layers import Dense\n", 18 | "\n", 19 | "model = Sequential()\n", 20 | "model.add(Dense(100, activation='relu', \n", 21 | " input_shape=(32,32,1)))\n", 22 | "model.add(Dense(50, activation='relu'))\n", 23 | "model.add(Dense(5, activation='softmax'))" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 2, 29 | "metadata": {}, 30 | "outputs": [ 31 | { 32 | "name": "stdout", 33 | "output_type": "stream", 34 | "text": [ 35 | "Model: \"sequential\"\n", 36 | "_________________________________________________________________\n", 37 | "Layer (type) Output Shape Param # \n", 38 | "=================================================================\n", 39 | "dense (Dense) (None, 32, 32, 100) 200 \n", 40 | "_________________________________________________________________\n", 41 | "dense_1 (Dense) (None, 32, 32, 50) 5050 \n", 42 | "_________________________________________________________________\n", 43 | "dense_2 (Dense) (None, 32, 32, 5) 255 \n", 44 | "=================================================================\n", 45 | "Total params: 5,505\n", 46 | "Trainable params: 5,505\n", 47 | "Non-trainable params: 0\n", 48 | "_________________________________________________________________\n" 49 | ] 50 | } 51 | ], 52 | "source": [ 53 | "model.summary()" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "# 함수형 API를 활용한 딥러닝 모형 생성" 61 | ] 62 | }, 63 | { 64 
| "cell_type": "code", 65 | "execution_count": 3, 66 | "metadata": {}, 67 | "outputs": [], 68 | "source": [ 69 | "from tensorflow.keras.layers import Input, Dense\n", 70 | "from tensorflow.keras.models import Model\n", 71 | "\n", 72 | "input_layer = Input(shape=(32,32,1))\n", 73 | "\n", 74 | "x = Dense(units=100, activation = 'relu')(input_layer)\n", 75 | "x = Dense(units=50, activation = 'relu')(x)\n", 76 | "\n", 77 | "output_layer = Dense(units=5, activation='softmax')(x)\n", 78 | "\n", 79 | "model2 = Model(input_layer, output_layer)\n" 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 4, 85 | "metadata": {}, 86 | "outputs": [ 87 | { 88 | "name": "stdout", 89 | "output_type": "stream", 90 | "text": [ 91 | "Model: \"model\"\n", 92 | "_________________________________________________________________\n", 93 | "Layer (type) Output Shape Param # \n", 94 | "=================================================================\n", 95 | "input_1 (InputLayer) [(None, 32, 32, 1)] 0 \n", 96 | "_________________________________________________________________\n", 97 | "dense_3 (Dense) (None, 32, 32, 100) 200 \n", 98 | "_________________________________________________________________\n", 99 | "dense_4 (Dense) (None, 32, 32, 50) 5050 \n", 100 | "_________________________________________________________________\n", 101 | "dense_5 (Dense) (None, 32, 32, 5) 255 \n", 102 | "=================================================================\n", 103 | "Total params: 5,505\n", 104 | "Trainable params: 5,505\n", 105 | "Non-trainable params: 0\n", 106 | "_________________________________________________________________\n" 107 | ] 108 | } 109 | ], 110 | "source": [ 111 | "model2.summary()" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "# 활성화 함수 사용" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "x = 
Dense(units=100)(x)\n", 128 | "x = Activation('relu')(x)" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": null, 134 | "metadata": {}, 135 | "outputs": [], 136 | "source": [ 137 | "x = Dense(units=100, activation='relu')(x)" 138 | ] 139 | } 140 | ], 141 | "metadata": { 142 | "kernelspec": { 143 | "display_name": "Python 3", 144 | "language": "python", 145 | "name": "python3" 146 | }, 147 | "language_info": { 148 | "codemirror_mode": { 149 | "name": "ipython", 150 | "version": 3 151 | }, 152 | "file_extension": ".py", 153 | "mimetype": "text/x-python", 154 | "name": "python", 155 | "nbconvert_exporter": "python", 156 | "pygments_lexer": "ipython3", 157 | "version": "3.7.6" 158 | } 159 | }, 160 | "nbformat": 4, 161 | "nbformat_minor": 4 162 | } 163 | -------------------------------------------------------------------------------- /12장_딥러닝_7절_1_자연어처리.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 단어의 토큰화" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 55, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from tensorflow.keras.preprocessing.text import Tokenizer\n", 17 | "\n", 18 | "paper = ['많은 것을 바꾸고 싶다면 많은 것을 받아들여라']\n", 19 | "\n", 20 | "tknz = Tokenizer()\n", 21 | "tknz.fit_on_texts(paper)" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 57, 27 | "metadata": {}, 28 | "outputs": [ 29 | { 30 | "name": "stdout", 31 | "output_type": "stream", 32 | "text": [ 33 | "{'많은': 1, '것을': 2, '바꾸고': 3, '싶다면': 4, '받아들여라': 5}\n", 34 | "OrderedDict([('많은', 2), ('것을', 2), ('바꾸고', 1), ('싶다면', 1), ('받아들여라', 1)])\n" 35 | ] 36 | } 37 | ], 38 | "source": [ 39 | "print(tknz.word_index)\n", 40 | "print(tknz.word_counts)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "# 원 핫 인코딩" 48 | ] 49 | }, 50 | { 51 | "cell_type": 
"code", 52 | "execution_count": 70, 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "from tensorflow.keras.utils import to_categorical\n", 57 | "from tensorflow.keras.preprocessing.text import Tokenizer\n", 58 | "\n", 59 | "\n", 60 | "paper = ['많은 것을 바꾸고 싶다면 많은 것을 받아들여라']\n", 61 | "tknz = Tokenizer()\n", 62 | "tknz.fit_on_texts(paper)\n", 63 | "\n", 64 | "idx_paper = tknz.texts_to_sequences(paper)\n", 65 | "n = len(tknz.word_index)+1\n", 66 | "idx_onehot = to_categorical(idx_paper, num_classes=n)" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 71, 72 | "metadata": {}, 73 | "outputs": [ 74 | { 75 | "name": "stdout", 76 | "output_type": "stream", 77 | "text": [ 78 | "[[1, 2, 3, 4, 1, 2, 5]]\n", 79 | "6\n", 80 | "[[[0. 1. 0. 0. 0. 0.]\n", 81 | " [0. 0. 1. 0. 0. 0.]\n", 82 | " [0. 0. 0. 1. 0. 0.]\n", 83 | " [0. 0. 0. 0. 1. 0.]\n", 84 | " [0. 1. 0. 0. 0. 0.]\n", 85 | " [0. 0. 1. 0. 0. 0.]\n", 86 | " [0. 0. 0. 0. 0. 1.]]]\n" 87 | ] 88 | } 89 | ], 90 | "source": [ 91 | "print(idx_paper)\n", 92 | "print(n)\n", 93 | "print(idx_onehot)" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "# 단어 임베딩" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 76, 106 | "metadata": {}, 107 | "outputs": [], 108 | "source": [ 109 | "from tensorflow.keras.models import Sequential\n", 110 | "from tensorflow.keras.layers import Embedding\n", 111 | "\n", 112 | "model = Sequential()\n", 113 | "model.add(Embedding(input_dim=n, output_dim=3))\n", 114 | "model.compile(optimizer='rmsprop', loss='mse')\n", 115 | "embedding = model.predict(idx_paper)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 77, 121 | "metadata": {}, 122 | "outputs": [ 123 | { 124 | "name": "stdout", 125 | "output_type": "stream", 126 | "text": [ 127 | "[[[-0.02796837 -0.03958071 -0.03936887]\n", 128 | " [-0.02087821 -0.02005102 0.0131931 ]\n", 129 | " [-0.00142742 -0.03759698 
0.02437944]\n", 130 | " [ 0.01546348 -0.00769221 -0.01694027]\n", 131 | " [-0.02796837 -0.03958071 -0.03936887]\n", 132 | " [-0.02087821 -0.02005102 0.0131931 ]\n", 133 | " [ 0.024049 -0.03488786 0.02603838]]]\n" 134 | ] 135 | } 136 | ], 137 | "source": [ 138 | "print(embedding)" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [] 147 | } 148 | ], 149 | "metadata": { 150 | "kernelspec": { 151 | "display_name": "Python 3", 152 | "language": "python", 153 | "name": "python3" 154 | }, 155 | "language_info": { 156 | "codemirror_mode": { 157 | "name": "ipython", 158 | "version": 3 159 | }, 160 | "file_extension": ".py", 161 | "mimetype": "text/x-python", 162 | "name": "python", 163 | "nbconvert_exporter": "python", 164 | "pygments_lexer": "ipython3", 165 | "version": "3.7.6" 166 | } 167 | }, 168 | "nbformat": 4, 169 | "nbformat_minor": 4 170 | } 171 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 선형대수와 통계학으로 배우는 머신러닝 with 파이썬 (Machine Learning with Python, Learned through Linear Algebra and Statistics) 2 | ![선형대수와_통계학_머신러닝_파이썬](https://user-images.githubusercontent.com/21074282/105607749-043d8780-5de4-11eb-8811-73d80cf8f3e5.jpg) 3 | 4 | - Subtitle: From optimization concepts to deep learning with TensorFlow 5 | - Author: 장철원 6 | - Publication date: January 26, 2021 7 | - Pages: 624 8 | 9 |


10 | ## Errata 11 | https://cafe.naver.com/aifromstat/28 12 | 13 |


14 | ## Online bookstore links 15 | - [yes24](http://www.yes24.com/Product/Goods/97032765) 16 | - [Kyobo Book Centre (교보문고)](http://www.kyobobook.co.kr/product/detailViewKor.laf?ejkGb=KOR&mallGb=KOR&barcode=9791165920395&orderClick=LAG&Kc=) 17 | - [Aladin (알라딘)](https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=262038358) 18 | 19 |


20 | ## About the book 21 |

From the linear algebra, statistics, and optimization theory that machine learning requires

22 |

to hands-on practice with Python, scikit-learn, and TensorFlow

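 23 | 24 | 『선형대수와 통계학으로 배우는 머신러닝 with 파이썬』 covers not only the basic use of machine learning but also the background theory it depends on, including statistics, linear algebra, and optimization theory. Rather than stopping at an introduction to machine learning algorithms, it explains the parts that require theoretical understanding in detail through mathematical formulas, so that readers can see how each algorithm works. 25 | The programming exercises use the scikit-learn library in the machine learning part and the TensorFlow library in the deep learning part. Line-by-line explanations clarify the role of each piece of code, and the complete code at the end of each chapter shows the overall flow. 26 | By practicing on top of an understanding of the background theory, readers can go beyond building machine learning fundamentals and apply them to their own fields. 27 | 28 |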


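29 | ## Features of this book 30 | - Presents the derivations of machine learning formulas in detail. 31 | - Explains machine learning algorithm concepts with easy-to-follow figures. 32 | - Explains complex mathematical formulas and programming code in detail. 33 | 34 |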


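35 | ## Who this book is for 36 | - Readers who are interested in machine learning and want to learn it 37 | - Readers who have studied machine learning but find it hard to use in practice 38 | - Readers who want to understand the principles behind machine learning algorithms 39 | 40 |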


41 | ## About the author: 장철원 42 | 43 |

A freelancer who enjoys recording and sharing what has been learned

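 44 | 45 | The author majored in statistics at Chungbuk National University and earned a master's degree in statistics from Korea University, then took a leave of absence from the statistics Ph.D. program at Florida State University to enter the job market. A fan of games since childhood, the author worked in the data analytics office at Krafton (formerly Bluehole), mainly on churn prediction and customer segmentation using machine learning. Work related to cheat programs ("hacks") in Battlegrounds sparked an interest in IT security, which led to a position in the NHN IT security office, where the author developed an automatic macro detection system based on machine learning and filed a patent for it. The author is now a freelancer who writes books and teaches courses on machine learning, and runs a blog and an online cafe (community) because sharing what has been learned is rewarding. Areas of interest include artificial intelligence, machine learning, statistics, linear algebra, kernels, embedded systems, IT security, the Internet of Things, physics, and philosophy. 46 | 47 | - Freelancer 48 | - External professor, Telecommunications Technology Association of Korea (한국정보통신기술협회) 49 | - Instructor, Fast Campus 50 | - Formerly: NHN IT security office 51 | - Formerly: Krafton (formerly Bluehole) data analytics office 52 | 53 | Cafe run by the author: https://cafe.naver.com/aifromstat 54 | 55 |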


56 | ## Publisher's review 57 |

‘Mathematics’, inseparable from machine learning

58 |

A must-have book for anyone who finds formulas difficult!

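 59 | 60 | To understand machine learning, one has to start from the concepts that fundamentally underpin it: linear algebra, statistics, and optimization. 『선형대수와 통계학으로 배우는 머신러닝 with 파이썬』 uses mathematical notation when covering these concepts and presents the mathematics before the code, with the aim of conveying the principle behind each machine learning algorithm. A table of the mathematical symbols used in the book also lowers the barrier to entry for readers who struggle with formulas. Readers with limited background in linear algebra or statistics should therefore find it a great help in understanding the mathematical principles and building a solid foundation. 61 | 62 |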


63 | ## Related books 64 | - [알고리즘 구현으로 배우는 선형대수 with 파이썬](http://www.yes24.com/Product/Goods/105772247) 65 | 66 |


67 | ## Errata 68 | https://cafe.naver.com/aifromstat/28 69 | 70 |


71 | ![선형대수와 통계학으로 배우는 머신러닝 with 파이썬](https://user-images.githubusercontent.com/21074282/105607959-6a2a0f00-5de4-11eb-827c-b08e197ab7f4.jpg) 72 | 73 | 74 | -------------------------------------------------------------------------------- /data/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bjpublic/MachineLearning/f6e04995802b49c5841a917a178dbf43f58a9f44/data/.DS_Store -------------------------------------------------------------------------------- /data/eng-fra/_about.txt: -------------------------------------------------------------------------------- 1 | ** Info ** 2 | 3 | Check for newest version here: 4 | http://www.manythings.org/anki/ 5 | Date of this file: 6 | 2020-08-23 7 | 8 | This data is from the sentences_detailed.csv file from tatoeba.org. 9 | http://tatoeba.org/files/downloads/sentences_detailed.csv 10 | 11 | 12 | 13 | ** Terms of Use ** 14 | 15 | See the terms of use. 16 | These files have been released under the same license as the 17 | source. 18 | 19 | http://tatoeba.org/eng/terms_of_use 20 | http://creativecommons.org/licenses/by/2.0 21 | 22 | Attribution: www.manythings.org/anki and tatoeba.org 23 | 24 | 25 | 26 | ** Warnings ** 27 | 28 | The data from the Tatoeba Project contains errors. 29 | 30 | To lower the number of errors you are likely to see, only 31 | sentences by native speakers and proofread sentences have 32 | been included. 33 | 34 | For the non-English language, I made these (possibly wrong) 35 | assumptions. 36 | Assumption 1: Sentences written by native speakers can be 37 | trusted. 38 | Assumption 2: Contributors to the Tatoeba Project are honest 39 | about what their native language is. 40 | 41 | For English, I used the sentences that I have proofread 42 | and thought were OK. 43 | Of course, I may have missed a few errors. 
44 | 45 | 46 | 47 | ** Downloading Anki ** 48 | 49 | See http://ankisrs.net/ 50 | 51 | 52 | 53 | ** Importing into Anki ** 54 | 55 | Information is at http://ankisrs.net/docs/manual.html#importing 56 | 57 | Of particular interest may be about "duplicates" at http://ankisrs.net/docs/manual.html#duplicates-and-updating. 58 | You can choose: 59 | 1. not to allow duplicates (alternate translations) as cards. 60 | 2. allow duplicates (alternate translations) as cards. 61 | -------------------------------------------------------------------------------- /소스코드전체.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/bjpublic/MachineLearning/f6e04995802b49c5841a917a178dbf43f58a9f44/소스코드전체.zip --------------------------------------------------------------------------------