├── .gitignore
├── Dockerfile
├── docker-compose.yml
├── input
│   ├── submit-files
│   │   └── README.md
│   ├── titanic
│   │   └── README.md
│   └── home-credit-default-risk
│       └── README.md
├── errata.md
├── README_EN.md
├── README.md
├── ch03
│   ├── ch03_03.ipynb
│   ├── ch03_02.ipynb
│   └── ch03_01.ipynb
├── footnote.md
└── ch02
    ├── ch02_08.ipynb
    ├── ch02_05.ipynb
    ├── ch02_01.ipynb
    └── ch02_02.ipynb
/.gitignore:
--------------------------------------------------------------------------------
1 | *.csv
2 | .ipynb_checkpoints
3 | data
4 |
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM gcr.io/kaggle-images/python:v68
2 |
--------------------------------------------------------------------------------
/docker-compose.yml:
--------------------------------------------------------------------------------
1 | version: "3"
2 | services:
3 | jupyter:
4 | build: .
5 | volumes:
6 | - $PWD:/tmp/working
7 | working_dir: /tmp/working
8 | ports:
9 | - 8888:8888
10 | command: jupyter notebook --ip=0.0.0.0 --allow-root --no-browser
11 |
--------------------------------------------------------------------------------
/input/submit-files/README.md:
--------------------------------------------------------------------------------
1 | # Input data
2 | 
3 | Download the [data](https://www.kaggle.com/sishihara/submit-files) and place it in this directory, as shown below. A download sketch using the Kaggle API follows the listing.
4 | 
5 | ```
6 | input
7 | └── submit-files
8 |     ├── submission_lightgbm_holdout.csv
9 |     ├── submission_lightgbm_skfold.csv
10 |     └── submission_randomforest.csv
11 | ```
12 |
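If you prefer to fetch the files programmatically, here is a minimal sketch using the official `kaggle` Python package (an assumption of this example: the package is installed and your API credentials are stored in `~/.kaggle/kaggle.json`).

```python
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads the credentials in ~/.kaggle/kaggle.json

# Download the dataset and unzip it into input/submit-files/.
api.dataset_download_files('sishihara/submit-files',
                           path='input/submit-files', unzip=True)
```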
--------------------------------------------------------------------------------
/input/titanic/README.md:
--------------------------------------------------------------------------------
1 | # Input data
2 | 
3 | Download the [data](https://www.kaggle.com/c/titanic/data) from Kaggle's [Titanic: Machine Learning from Disaster](https://www.kaggle.com/c/titanic) competition and place it in this directory, as shown below. A Kaggle API sketch follows the listing.
4 | 
5 | ```
6 | input
7 | └── titanic
8 |     ├── train.csv
9 |     ├── test.csv
10 |     └── gender_submission.csv
11 | ```
12 |
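As one possible shortcut, the data can also be fetched with the official `kaggle` Python package. This is only a sketch: it assumes the package and your API credentials are already set up, and that you have accepted the competition rules on the Kaggle website.

```python
import zipfile

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads the credentials in ~/.kaggle/kaggle.json

# Downloads titanic.zip into input/titanic/, then unpacks the three CSV files.
api.competition_download_files('titanic', path='input/titanic')
with zipfile.ZipFile('input/titanic/titanic.zip') as zf:
    zf.extractall('input/titanic')
```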
--------------------------------------------------------------------------------
/input/home-credit-default-risk/README.md:
--------------------------------------------------------------------------------
1 | # Input data
2 | 
3 | Download the [data](https://www.kaggle.com/c/home-credit-default-risk/data) from Kaggle's [Home Credit Default Risk](https://www.kaggle.com/c/home-credit-default-risk) competition and place it in this directory.
4 | The downloaded files have the extension ".csv.zip".
5 | Unzip them into plain CSV files, for example with the sketch below.
6 | 
7 | ```
8 | input
9 | └── home-credit-default-risk
10 |     ├── application_train.csv
11 |     └── bureau.csv
12 | ```
13 |
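A minimal sketch for the unzipping step, assuming the `.csv.zip` archives have already been placed in this directory:

```python
import glob
import zipfile

# Unpack every downloaded .csv.zip archive into a plain CSV file in place.
for path in glob.glob('input/home-credit-default-risk/*.csv.zip'):
    with zipfile.ZipFile(path) as zf:
        zf.extractall('input/home-credit-default-risk')
```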
--------------------------------------------------------------------------------
/errata.md:
--------------------------------------------------------------------------------
1 | # Errata
2 | 
3 | This page lists the errors identified so far.
4 | They are scheduled to be corrected in the next reprint.
5 | 
6 | | Location | Incorrect | Correct | Status |
7 | | -- | -- | -- | -- |
8 | | p. 13 | Kaggleのコンペの概要は図1.1の通りで,次のような流れになっています. | Kaggleのコンペの概要は図1.1の通りで,次のような流れになっています.図1.1は参考文献[16]を参考に作成しました. | Fixed in the 3rd printing |
9 | | p. 13 | 「R」[16] | 「R」 | Fixed in the 3rd printing |
10 | | p. 13 | [16] R: The R Project for Statistical Computing, https://www.r-project.org/ (Accessed: 30 November 2019). | [16] Kaggleで描く成長戦略 〜個人編・組織編〜, https://www2.slideshare.net/HaradaKei/devsumi-2018summer (Accessed: 24 December 2020). | Fixed in the 3rd printing |
11 | | p. 27 | つまづき | つまずき | Fixed in the 3rd printing |
12 | | p. 48 | プログ | ブログ | Fixed in the 3rd printing |
13 | | p. 58 | 習う | 倣う | Fixed in the 3rd printing |
14 | | p. 71 | 深堀り | 深掘り | Fixed in the 3rd printing |
15 | | p. 84 | Fale | Fare | Fixed in the 3rd printing |
16 | | p. 138 | 脚注[138]の欠落 | 「日本語版Wikipediaで事前に学習済のモデル[103]を用いて」<br>[103] ja.text8 https://github.com/Hironsan/ja.text8 (Accessed: 30 November 2019). | Fixed in the 2nd printing |
17 | | p. 149 | ベンチーマーク | ベンチマーク | Fixed in the 3rd printing |
18 | | p. 149 | Disscussion | Discussion | Fixed in the 3rd printing |
19 | | p. 154 | [107] Kaggleの画像コンペのためのGCPインスタンス作成手順(2019年10月版), https://currypurin.qrunch.io/entries/T9iGWHdsiI6o2wke (Accessed: 30 November 2019). | [107] Kaggleの画像コンペのためのGCPインスタンス作成手順(2019年10月版), https://www.currypurin.com/entry/2019/10/10/094133 (Accessed: 24 December 2020). | Fixed in the 3rd printing |
20 | | p. 166, 168, 169 | Suvivied, Survivied, etc. | Survived | Fixed in the 3rd printing |
21 |
--------------------------------------------------------------------------------
/README_EN.md:
--------------------------------------------------------------------------------
1 | # Supplemental materials for *"Python Kaggle Start Book"*
2 |
3 | - This repository contains sample code for a Japanese book entitled "Python Kaggle Start Book (PythonではじめるKaggleスタートブック)".
4 | - You can also see the list of [footnotes](footnote.md) and the [errata](errata.md).
5 | - If you have any questions or comments, please create an [issue](https://github.com/upura/python-kaggle-start-book/issues).
6 |
7 | ## Directories
8 |
9 | | Directory | Description |
10 | |:----|:-------|
11 | | input | input data |
12 | | ch02 | sample code for chapter 2 |
13 | | ch03 | sample code for chapter 3 |
14 |
15 | ## Kaggle Notebooks
16 |
17 | - [ch02_01.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-01)
18 | - [ch02_02.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-02)
19 | - [ch02_03.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-03)
20 | - [ch02_04.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-04)
21 | - [ch02_05.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-05)
22 | - [ch02_06.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-06)
23 | - [ch02_07.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-07)
24 | - [ch02_08.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-08)
25 | - [ch03_01.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch03-01)
26 | - [ch03_02.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch03-02)
27 | - [ch03_03.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch03-03)
28 |
29 | ## FAQ
30 |
31 | You can refer to the [past questions](https://github.com/upura/python-kaggle-start-book/issues?q=is%3Aissue) as well as the FAQ below.
32 | 
33 | ### Is there an electronic version available?
34 | 
35 | An electronic version has been [available](https://bookclub.kodansha.co.jp/buy?item=0000325172) in reflowable formats, including Kindle and Kobo, since 26 May 2020.
36 | 
37 | ### p. 41: There is no "Commit" in the Kaggle Notebook
38 | 
39 | The design of the Kaggle Notebook is updated frequently; a [tutorial video](https://youtu.be/lU_VY79vJfk) on YouTube walks through the current workflow.
40 | 
41 | ### Are other language editions available?
42 | 
43 | The [Traditional Chinese](http://books.gotop.com.tw/v_ACD021100) and [Korean](https://jpub.tistory.com/1147) editions were published in April 2021.
44 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | [English](README_EN.md)
2 | 
3 | # Sample code, footnotes, and errata
4 | 
5 | - This repository contains the sample code for 『PythonではじめるKaggleスタートブック』 (Python Kaggle Start Book).
6 | - The footnotes from the book are collected [here](footnote.md).
7 | - The errata are [here](errata.md).
8 | - Please post comments and questions as an [issue](https://github.com/upura/python-kaggle-start-book/issues). Frequently asked questions are summarized in the [FAQ below](https://github.com/upura/python-kaggle-start-book#faq).
9 | 
10 | ## Contents of each directory
11 | 
12 | - The "Kaggle (py36)" notebooks were published together with the 1st printing of the book and use Python 3.6. The files in the `ch02` and `ch03` directories were uploaded as these notebooks.
13 | - The "Kaggle (py310)" notebooks were published together with the 5th printing of the book and use Python 3.10.
14 | 
15 | | Directory | Description | File | Kaggle (py36) | Kaggle (py310) |
16 | |:---|:---|:---|:---|:---|
17 | | input | input data | - | - | - |
18 | | ch02 | sample code for Chapter 2 | ch02_01.ipynb | [Open in Kaggle](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-01) | [Open in Kaggle](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-01) |
19 | | | | ch02_02.ipynb | [Open in Kaggle](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-02) | [Open in Kaggle](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-02) |
20 | | | | ch02_03.ipynb | [Open in Kaggle](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-03) | [Open in Kaggle](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-03) |
21 | | | | ch02_04.ipynb | [Open in Kaggle](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-04) | [Open in Kaggle](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-04) |
22 | | | | ch02_05.ipynb | [Open in Kaggle](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-05) | [Open in Kaggle](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-05) |
23 | | | | ch02_06.ipynb | [Open in Kaggle](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-06) | [Open in Kaggle](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-06) |
24 | | | | ch02_07.ipynb | [Open in Kaggle](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-07) | [Open in Kaggle](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-07) |
25 | | | | ch02_08.ipynb | [Open in Kaggle](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-08) | [Open in Kaggle](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-08) |
26 | | ch03 | sample code for Chapter 3 | ch03_01.ipynb | [Open in Kaggle](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch03-01) | [Open in Kaggle](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch03-01) |
27 | | | | ch03_02.ipynb | [Open in Kaggle](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch03-02) | [Open in Kaggle](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch03-02) |
28 | | | | ch03_03.ipynb | [Open in Kaggle](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch03-03) | [Open in Kaggle](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch03-03) |
29 | 
30 | ### Main changes in the "Kaggle (py310)" notebooks
31 | 
32 | - Note: due to changes in random number generation and library behavior, the execution results may not exactly match those of the files published with the 1st printing.
33 | - 5th printing: removed the `early_stopping_rounds` argument, following a LightGBM [API change](https://github.com/microsoft/LightGBM/pull/4908) (a migration sketch follows this list).
34 | - 5th printing: changed `dataiter.next()` to `next(dataiter)` in ch03_02.ipynb.
35 | - 5th printing: renamed the `size` argument of `word2vec.Word2Vec` to `vector_size` in ch03_03.ipynb.
36 | - 6th printing: changed the imported library in ch02_03.ipynb from `pandas_profiling` to `ydata_profiling`.
37 |
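For reference, a minimal sketch of what the LightGBM-related change looks like in code. It is illustrative only: it assumes LightGBM 4.x and scikit-learn are installed, and it uses a stand-in dataset rather than the Titanic features built in the notebooks.

```python
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Stand-in data; the ch02 notebooks build X/y from the Titanic CSV files instead.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_valid, y_valid, reference=lgb_train)
params = {'objective': 'binary'}

# py36 notebooks (older LightGBM):
#   lgb.train(params, lgb_train, valid_sets=[lgb_train, lgb_eval],
#             verbose_eval=10, num_boost_round=1000, early_stopping_rounds=10)
# py310 notebooks (LightGBM 4.x): early stopping and logging are passed as callbacks.
model = lgb.train(
    params,
    lgb_train,
    valid_sets=[lgb_train, lgb_eval],
    num_boost_round=1000,
    callbacks=[lgb.early_stopping(stopping_rounds=10), lgb.log_evaluation(period=10)],
)
y_pred = model.predict(X_valid, num_iteration=model.best_iteration)
```

The other items in the list above (`next(dataiter)`, `vector_size`, `ydata_profiling`) are simple renames and need no further code changes.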
38 | ## FAQ
39 | 
40 | All past questions and answers are available [here](https://github.com/upura/python-kaggle-start-book/issues?q=is%3Aissue).
41 | 
42 | ### Is an electronic version available?
43 | 
44 | Reflowable e-book editions (Kindle, Kobo, and others) have been on sale since 26 May 2020. Please check the [online bookstore](https://bookclub.kodansha.co.jp/buy?item=0000325172).
45 | 
46 | ### p. 41: There is no "Commit" in the Kaggle Notebook
47 | 
48 | The Kaggle Notebook interface is updated frequently. The commit procedure as of May 2023 is explained in this [video](https://youtu.be/u6Bc0jiWu38).
49 | 
50 | ### Are other language editions available?
51 | 
52 | The Traditional Chinese edition 『[Kaggle大師教您用Python玩資料科學,比賽拿獎金](http://books.gotop.com.tw/v_ACD021100)』 and the Korean edition 『[파이썬으로 시작하는 캐글: 입문에서 컴피티션까지](https://jpub.tistory.com/1147)』 were published in April 2021.
53 |
--------------------------------------------------------------------------------
/ch03/ch03_03.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "This notebook is a sample code with Japanese comments.\n",
8 | "\n",
9 | "# 3.3 Titanicの先へ行く③! テキストデータに触れてみよう"
10 | ]
11 | },
12 | {
13 | "cell_type": "code",
14 | "execution_count": 1,
15 | "metadata": {
16 | "_cell_guid": "79c7e3d0-c299-4dcb-8224-4455121ee9b0",
17 | "_uuid": "d629ff2d2480ee46fbb7e2d37f6b5fab8052498a"
18 | },
19 | "outputs": [],
20 | "source": [
21 | "import pandas as pd"
22 | ]
23 | },
24 | {
25 | "cell_type": "code",
26 | "execution_count": 2,
27 | "metadata": {},
28 | "outputs": [
29 | {
30 | "data": {
31 | "text/html": [
32 | "
\n",
33 | "\n",
46 | "
\n",
47 | " \n",
48 | " \n",
49 | " | \n",
50 | " text | \n",
51 | "
\n",
52 | " \n",
53 | " \n",
54 | " \n",
55 | " | 0 | \n",
56 | " I like kaggle very much | \n",
57 | "
\n",
58 | " \n",
59 | " | 1 | \n",
60 | " I do not like kaggle | \n",
61 | "
\n",
62 | " \n",
63 | " | 2 | \n",
64 | " I do really love machine learning | \n",
65 | "
\n",
66 | " \n",
67 | "
\n",
68 | "
"
69 | ],
70 | "text/plain": [
71 | " text\n",
72 | "0 I like kaggle very much\n",
73 | "1 I do not like kaggle\n",
74 | "2 I do really love machine learning"
75 | ]
76 | },
77 | "execution_count": 2,
78 | "metadata": {},
79 | "output_type": "execute_result"
80 | }
81 | ],
82 | "source": [
83 | "df = pd.DataFrame({'text': ['I like kaggle very much',\n",
84 | " 'I do not like kaggle',\n",
85 | " 'I do really love machine learning']})\n",
86 | "df"
87 | ]
88 | },
89 | {
90 | "cell_type": "markdown",
91 | "metadata": {},
92 | "source": [
93 | "# Bag of Words"
94 | ]
95 | },
96 | {
97 | "cell_type": "code",
98 | "execution_count": 3,
99 | "metadata": {},
100 | "outputs": [
101 | {
102 | "data": {
103 | "text/plain": [
104 | "array([[0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1],\n",
105 | " [1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0],\n",
106 | " [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0]], dtype=int64)"
107 | ]
108 | },
109 | "execution_count": 3,
110 | "metadata": {},
111 | "output_type": "execute_result"
112 | }
113 | ],
114 | "source": [
115 | "from sklearn.feature_extraction.text import CountVectorizer\n",
116 | "\n",
117 | "\n",
118 | "vectorizer = CountVectorizer(token_pattern=u'(?u)\\\\b\\\\w+\\\\b')\n",
119 | "bag = vectorizer.fit_transform(df['text'])\n",
120 | "bag.toarray()"
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": 4,
126 | "metadata": {},
127 | "outputs": [
128 | {
129 | "name": "stdout",
130 | "output_type": "stream",
131 | "text": [
132 | "{'i': 1, 'like': 4, 'kaggle': 2, 'very': 10, 'much': 7, 'do': 0, 'not': 8, 'really': 9, 'love': 5, 'machine': 6, 'learning': 3}\n"
133 | ]
134 | }
135 | ],
136 | "source": [
137 | "print(vectorizer.vocabulary_)"
138 | ]
139 | },
140 | {
141 | "cell_type": "markdown",
142 | "metadata": {},
143 | "source": [
144 | "# TF-IDF"
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": 5,
150 | "metadata": {},
151 | "outputs": [
152 | {
153 | "name": "stdout",
154 | "output_type": "stream",
155 | "text": [
156 | "[[0. 0.31544415 0.40619178 0. 0.40619178 0.\n",
157 | " 0. 0.53409337 0. 0. 0.53409337]\n",
158 | " [0.43306685 0.33631504 0.43306685 0. 0.43306685 0.\n",
159 | " 0. 0. 0.56943086 0. 0. ]\n",
160 | " [0.34261996 0.26607496 0. 0.45050407 0. 0.45050407\n",
161 | " 0.45050407 0. 0. 0.45050407 0. ]]\n"
162 | ]
163 | }
164 | ],
165 | "source": [
166 | "from sklearn.feature_extraction.text import CountVectorizer\n",
167 | "from sklearn.feature_extraction.text import TfidfTransformer\n",
168 | "\n",
169 | "\n",
170 | "vectorizer = CountVectorizer(token_pattern=u'(?u)\\\\b\\\\w+\\\\b')\n",
171 | "transformer = TfidfTransformer()\n",
172 | "\n",
173 | "tf = vectorizer.fit_transform(df['text'])\n",
174 | "tfidf = transformer.fit_transform(tf)\n",
175 | "print(tfidf.toarray())"
176 | ]
177 | },
178 | {
179 | "cell_type": "code",
180 | "execution_count": 6,
181 | "metadata": {},
182 | "outputs": [
183 | {
184 | "name": "stdout",
185 | "output_type": "stream",
186 | "text": [
187 | "{'i': 1, 'like': 4, 'kaggle': 2, 'very': 10, 'much': 7, 'do': 0, 'not': 8, 'really': 9, 'love': 5, 'machine': 6, 'learning': 3}\n"
188 | ]
189 | }
190 | ],
191 | "source": [
192 | "print(vectorizer.vocabulary_)"
193 | ]
194 | },
195 | {
196 | "cell_type": "markdown",
197 | "metadata": {},
198 | "source": [
199 | "# Word2vec"
200 | ]
201 | },
202 | {
203 | "cell_type": "code",
204 | "execution_count": 7,
205 | "metadata": {},
206 | "outputs": [],
207 | "source": [
208 | "from gensim.models import word2vec\n",
209 | "\n",
210 | "\n",
211 | "sentences = [d.split() for d in df['text']]\n",
212 | "model = word2vec.Word2Vec(sentences, size=10, min_count=1, window=2, seed=7)"
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": 8,
218 | "metadata": {},
219 | "outputs": [
220 | {
221 | "data": {
222 | "text/plain": [
223 | "array([-0.04932676, -0.01171829, 0.04239148, 0.01735417, -0.04764815,\n",
224 | " -0.03205363, -0.02873827, 0.04682567, 0.04185081, 0.00795709],\n",
225 | " dtype=float32)"
226 | ]
227 | },
228 | "execution_count": 8,
229 | "metadata": {},
230 | "output_type": "execute_result"
231 | }
232 | ],
233 | "source": [
234 | "model.wv['like']"
235 | ]
236 | },
237 | {
238 | "cell_type": "code",
239 | "execution_count": 9,
240 | "metadata": {},
241 | "outputs": [
242 | {
243 | "data": {
244 | "text/plain": [
245 | "[('much', 0.31108221411705017),\n",
246 | " ('really', 0.11813490092754364),\n",
247 | " ('not', 0.07177764177322388),\n",
248 | " ('learning', -0.014833025634288788),\n",
249 | " ('very', -0.03584161400794983),\n",
250 | " ('do', -0.11829414963722229),\n",
251 | " ('machine', -0.12069450318813324),\n",
252 | " ('kaggle', -0.532151997089386),\n",
253 | " ('love', -0.5468614101409912),\n",
254 | " ('I', -0.7641928195953369)]"
255 | ]
256 | },
257 | "execution_count": 9,
258 | "metadata": {},
259 | "output_type": "execute_result"
260 | }
261 | ],
262 | "source": [
263 | "model.wv.most_similar('like')"
264 | ]
265 | },
266 | {
267 | "cell_type": "code",
268 | "execution_count": 10,
269 | "metadata": {},
270 | "outputs": [
271 | {
272 | "data": {
273 | "text/plain": [
274 | "['I', 'like', 'kaggle', 'very', 'much']"
275 | ]
276 | },
277 | "execution_count": 10,
278 | "metadata": {},
279 | "output_type": "execute_result"
280 | }
281 | ],
282 | "source": [
283 | "df['text'][0].split()"
284 | ]
285 | },
286 | {
287 | "cell_type": "code",
288 | "execution_count": 11,
289 | "metadata": {},
290 | "outputs": [
291 | {
292 | "data": {
293 | "text/plain": [
294 | "array([[-0.00070634, 0.04390315, -0.03669089, 0.02026465, 0.04046954,\n",
295 | " 0.02365695, 0.020924 , -0.03109757, -0.04436051, -0.00691835],\n",
296 | " [-0.04932676, -0.01171829, 0.04239148, 0.01735417, -0.04764815,\n",
297 | " -0.03205363, -0.02873827, 0.04682567, 0.04185081, 0.00795709],\n",
298 | " [ 0.03825201, -0.04983004, -0.03085005, -0.0421443 , 0.04703034,\n",
299 | " -0.01274201, 0.00586073, -0.02872854, 0.01241979, -0.03893603],\n",
300 | " [-0.01407814, -0.03944685, 0.01979917, -0.00788147, 0.03230685,\n",
301 | " 0.04465036, -0.01564248, 0.04261149, -0.04766037, 0.03080159],\n",
302 | " [ 0.0160588 , -0.04853946, 0.02299253, 0.00940678, -0.04020066,\n",
303 | " 0.00423941, 0.00689822, 0.02838706, -0.02563218, 0.02724046]],\n",
304 | " dtype=float32)"
305 | ]
306 | },
307 | "execution_count": 11,
308 | "metadata": {},
309 | "output_type": "execute_result"
310 | }
311 | ],
312 | "source": [
313 | "import numpy as np\n",
314 | "\n",
315 | "\n",
316 | "wordvec = np.array([model.wv[word] for word in df['text'][0].split()])\n",
317 | "wordvec"
318 | ]
319 | },
320 | {
321 | "cell_type": "code",
322 | "execution_count": 12,
323 | "metadata": {},
324 | "outputs": [
325 | {
326 | "data": {
327 | "text/plain": [
328 | "array([-0.00196009, -0.0211263 , 0.00352845, -0.00060003, 0.00639158,\n",
329 | " 0.00555022, -0.00213956, 0.01159962, -0.01267649, 0.00402895],\n",
330 | " dtype=float32)"
331 | ]
332 | },
333 | "execution_count": 12,
334 | "metadata": {},
335 | "output_type": "execute_result"
336 | }
337 | ],
338 | "source": [
339 | "np.mean(wordvec, axis=0)"
340 | ]
341 | },
342 | {
343 | "cell_type": "code",
344 | "execution_count": 13,
345 | "metadata": {},
346 | "outputs": [
347 | {
348 | "data": {
349 | "text/plain": [
350 | "array([0.03825201, 0.04390315, 0.04239148, 0.02026465, 0.04703034,\n",
351 | " 0.04465036, 0.020924 , 0.04682567, 0.04185081, 0.03080159],\n",
352 | " dtype=float32)"
353 | ]
354 | },
355 | "execution_count": 13,
356 | "metadata": {},
357 | "output_type": "execute_result"
358 | }
359 | ],
360 | "source": [
361 | "np.max(wordvec, axis=0)"
362 | ]
363 | },
364 | {
365 | "cell_type": "code",
366 | "execution_count": null,
367 | "metadata": {},
368 | "outputs": [],
369 | "source": []
370 | }
371 | ],
372 | "metadata": {
373 | "file_extension": ".py",
374 | "kernelspec": {
375 | "display_name": "Python 3",
376 | "language": "python",
377 | "name": "python3"
378 | },
379 | "language_info": {
380 | "codemirror_mode": {
381 | "name": "ipython",
382 | "version": 3
383 | },
384 | "file_extension": ".py",
385 | "mimetype": "text/x-python",
386 | "name": "python",
387 | "nbconvert_exporter": "python",
388 | "pygments_lexer": "ipython3",
389 | "version": "3.6.6"
390 | },
391 | "mimetype": "text/x-python",
392 | "name": "python",
393 | "npconvert_exporter": "python",
394 | "pygments_lexer": "ipython3",
395 | "version": 3
396 | },
397 | "nbformat": 4,
398 | "nbformat_minor": 2
399 | }
400 |
--------------------------------------------------------------------------------
/footnote.md:
--------------------------------------------------------------------------------
1 | # Footnotes in the book
2 |
3 | ## Preface
4 |
5 | - [1] Welcome to Python.org, https://www.python.org/ (Accessed: 30 November 2019).
6 | - [2] Kaggle: Your Home for Data Science, https://www.kaggle.com/ (Accessed: 30 November 2019).
7 | - [3] Titanic: Machine Learning from Disaster, https://www.kaggle.com/c/titanic (Accessed: 30 November 2019).
8 | - [4] Qiita, https://qiita.com/ (Accessed: 30 November 2019).
9 | - [5] Kaggleに登録したら次にやること ~ これだけやれば十分闘える!Titanicの先へ行く入門 10 Kernel ~, https://qiita.com/upura/items/3c10ff6fed4e7c3d70f0 (Accessed: 30 November 2019).
10 | - [6] Kaggle - Qiita, https://qiita.com/tags/kaggle (Accessed: 30 November 2019).
11 | - [7] 村田秀樹, 『Kaggleのチュートリアル』, https://note.mu/currypurin/n/nf390914c721e (Accessed: 30 November 2019).
12 | - [8] GitHub, http://github.com (Accessed: 30 November 2019).
13 | - [9] Docker: Enterprise Container Platform, https://www.docker.com/ (Accessed: 30 November 2019).
14 | - [10] Container Registry - Google Cloud Platform, https://console.cloud.google.com/gcr/images/kaggle-images/GLOBAL/python (Accessed: 30 November 2019).
15 | - [11] PetFinder.my Adoption Prediction, https://www.kaggle.com/c/petfinder-adoption-prediction (Accessed: 30 November 2019).
16 | - [12] Kaggle Days Tokyo, https://www.kaggle.com/c/kaggle-days-tokyo (Accessed: 10 March 2024).
17 | - [13] 機械学習を用いた日経電子版Proのユーザ分析 データドリブンチームの知られざる取り組み, https://logmi.jp/tech/articles/321077 (Accessed: 30 November 2019).
18 | - [14] Santander Value Prediction Challenge, https://www.kaggle.com/c/santander-value-prediction-challenge (Accessed: 30 November 2019).
19 | - [15] LANL Earthquake Prediction, https://www.kaggle.com/c/LANL-Earthquake-Prediction (Accessed: 30 November 2019).
20 |
21 | ## Chapter 1
22 |
23 | - [16] Kaggleで描く成長戦略 〜個人編・組織編〜, https://www2.slideshare.net/HaradaKei/devsumi-2018summer (Accessed: 24 December 2020).
24 | - [17] Kaggle Progression System, https://www.kaggle.com/progression (Accessed: 30 November 2019).
25 | - [18] KaggleのGrandmasterやmasterの条件や人数について調べたので、詳細に書きとめます。, http://www.currypurin.com/entry/2018/02/21/011316 (Accessed: 30 November 2019).
26 | - [19] SIGNATE, https://signate.jp/ (Accessed: 30 November 2019).
27 | - [20] 杉山将, 『イラストで学ぶ 機械学習』, 講談社, 2013
28 | - [21] AtCoder:競技プログラミングコンテストを開催する国内最大のサイト, https://atcoder.jp/ (Accessed: 30 November 2019).
29 | - [22] AtCoder に登録したら次にやること ~ これだけ解けば十分闘える!過去問精選 10 問 ~, https://qiita.com/drken/items/fd4e5e3630d0f5859067 (Accessed: 30 November 2019).
30 | - [23] TalkingData AdTracking Fraud Detection Challenge, https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection (Accessed: 30 November 2019).
31 |
32 | ## 第2章
33 |
34 | - [24] Kaggle API, https://github.com/Kaggle/kaggle-api (Accessed: 30 November 2019).
35 | - [25] kaggle-apiというKaggle公式のapiの使い方をまとめます, http://www.currypurin.com/entry/2018/kaggle-api (Accessed: 30 November 2019).
36 | - [26] NumPy, https://numpy.org/ (Accessed: 30 November 2019).
37 | - [27] Pandas, https://pandas.pydata.org/ (Accessed: 30 November 2019).
38 | - [28] sklearn.preprocessing.StandardScaler, https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html (Accessed: 30 November 2019).
39 | - [29] sklearn.ensemble.RandomForestClassifier, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html (Accessed: 30 November 2019).
40 | - [30] LightGBM, https://lightgbm.readthedocs.io/en/latest/ (Accessed: 30 November 2019).
41 | - [31] Pandas Profiling, https://github.com/pandas-profiling/pandas-profiling (Accessed: 30 November 2019).
42 | - [32] Santander Customer Transaction Prediction, https://www.kaggle.com/c/santander-customer-transaction-prediction (Accessed: 30 November 2019).
43 | - [33] IEEE-CIS Fraud Detection, https://www.kaggle.com/c/ieee-fraud-detection (Accessed: 30 November 2019).
44 | - [34] Home Credit Default Risk, https://www.kaggle.com/c/home-credit-default-risk (Accessed: 30 November 2019).
45 | - [35] Deterministic neural networks using PyTorch, https://www.kaggle.com/bminixhofer/deterministic-neural-networks-using-pytorch (Accessed: 30 November 2019).
46 | - [36] 門脇大輔・阪田隆司・保坂桂佑・平松雄司,『Kaggleで勝つデータ分析の技術』, 技術評論社, 2019
47 | - [37] 著:Alice Zheng, Amanda Casari, 訳:株式会社ホクソエム, 『機械学習のための特徴量エンジニアリング』, オライリージャパン, 2019
48 | - [38] 最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング, https://www.slideshare.net/mlm_kansai/kaggle-138546659 (Accessed: 30 November 2019).
49 | - [39] 【随時更新】Kaggleテーブルデータコンペできっと役立つTipsまとめ, https://naotaka1128.hatenadiary.jp/entry/kaggle-compe-tips (Accessed: 30 November 2019).
50 | - [40] nejumi/kaggle_memo, https://github.com/nejumi/kaggle_memo (Accessed: 30 November 2019).
51 | - [41] 本橋智光,『前処理大全』, 技術評論社, 2018
52 | - [42] Instacart Market Basket Analysis, https://www.kaggle.com/c/instacart-market-basket-analysis (Accessed: 30 November 2019).
53 | - [43] 第2回:「Kaggle」の面白さとは--食品宅配サービスの購買予測コンペで考える -, https://japan.zdnet.com/article/35124706/ (Accessed: 30 November 2019).
54 | - [44] PLAsTiCC Astronomical Classification, https://www.kaggle.com/c/PLAsTiCC-2018 (Accessed: 30 November 2019).
55 | - [45] 半田利弘, 『基礎からわかる天文学』, 誠文堂新光社, 2011
56 | - [46] Python-package Introduction, https://lightgbm.readthedocs.io/en/latest/Python-Intro.html (Accessed: 30 November 2019).
57 | - [47] Supervised learning, https://scikit-learn.org/stable/supervised_learning.html (Accessed: 30 November 2019).
58 | - [48] lightgbm カテゴリカル変数と欠損値の扱いについて+α, https://tebasakisan.hatenadiary.com/entry/2019/01/27/222102 (Accessed: 30 November 2019).
59 | - [49] XGBoost, https://xgboost.readthedocs.io/en/latest/ (Accessed: 30 November 2019).
60 | - [50] CatBoost, https://catboost.ai/ (Accessed: 30 November 2019).
61 | - [51] PyTorch, https://pytorch.org/ (Accessed: 30 November 2019).
62 | - [52] TensorFlow, https://www.tensorflow.org/ (Accessed: 30 November 2019).
63 | - [53] LightGBM Parameters, https://lightgbm.readthedocs.io/en/latest/Parameters.html (Accessed: 30 November 2019).
64 | - [54] LightGBM Parameters-Tuning, https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html (Accessed: 30 November 2019).
65 | - [55] sklearn.model_selection.GridSearchCV, https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html (Accessed: 30 November 2019).
66 | - [56] Bayesian Optimization, https://github.com/fmfn/BayesianOptimization (Accessed: 30 November 2019).
67 | - [57] Hyperopt, https://github.com/hyperopt/hyperopt (Accessed: 30 November 2019).
68 | - [58] Optuna, https://optuna.org/ (Accessed: 30 November 2019).
69 | - [59] Optuna Trial, https://optuna.readthedocs.io/en/latest/reference/trial.html (Accessed: 30 November 2019).
70 | - [60] Optunaでrandomのseedを固定する方法, https://qiita.com/phorizon20/items/1b795beb202c2dc378ed (Accessed: 30 November 2019).
71 | - [61] 勾配ブースティングで大事なパラメータの気持ち, https://nykergoto.hatenablog.jp/entry/2019/03/29/勾配ブースティングで大事なパラメータの気持ち (Accessed: 30 November 2019).
72 | - [62] 有名ライブラリと比較したLightGBMの現在, https://alphaimpact.jp/downloads/pydata20190927.pdf (Accessed: 30 November 2019).
73 | - [63] Recruit Restaurant Visitor Forecasting, https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting (Accessed: 30 November 2019).
74 | - [64] Neko kin, https://www.slideshare.net/ShotaOkubo/neko-kin-96769953 (Accessed: 30 November 2019).
75 | - [65] sklearn.model_selection.TimeSeriesSplit, https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html (Accessed: 30 November 2019).
76 | - [66] State Farm Distracted Driver Detection, https://www.kaggle.com/c/state-farm-distracted-driver-detection (Accessed: 30 November 2019).
77 | - [67] Kaggle State Farm Distracted Driver Detection, https://speakerdeck.com/iwiwi/kaggle-state-farm-distracted-driver-detection (Accessed: 30 November 2019).
78 | - [68] sklearn.model_selection.GroupKFold, https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GroupKFold.html (Accessed: 30 November 2019).
79 | - [69] Profiling Top Kagglers: Bestfitting, Currently #1 in the World, https://medium.com/kaggle-blog/profiling-top-kagglers-bestfitting-currently-1-in-the-world-58cc0e187b (Accessed: 30 November 2019).
80 | - [70] Kaggle Ensembling Guide, http://web.archive.org/web/20210727094233/https://mlwave.com/kaggle-ensembling-guide/ (Accessed: 14 May 2023).
81 | - [71] Avito Demand Prediction Challenge, https://www.kaggle.com/c/avito-demand-prediction (Accessed: 30 November 2019).
82 | - [72] Kaggle Avito Demand Prediction Challenge 9th Place Solution, https://www.slideshare.net/JinZhan/kaggle-avito-demand-prediction-challenge-9th-place-solution-124500050 (Accessed: 30 November 2019).
83 | - [73] The BigChaos Solution to the Netflix Grand Prize, https://www.asc.ohio-state.edu/statistics/statgen/joul_aut2009/BigChaos.pdf (Accessed: 14 May 2023).
84 |
85 | ## Chapter 3
86 |
87 | - [74] Introduction to Manual Feature Engineering, https://www.kaggle.com/willkoehrsen/introduction-to-manual-feature-engineering (Accessed: 30 November 2019).
88 | - [75] 第9回:Kaggleの「画像コンペ」とは--取り組み方と面白さを読み解く, https://japan.zdnet.com/article/35140207/ (Accessed: 30 November 2019).
89 | - [76] Adversarial Example, https://arxiv.org/abs/1312.6199 (Accessed: 30 November 2019).
90 | - [77] Generative Adversarial Network(GAN), https://arxiv.org/abs/1406.2661 (Accessed: 30 November 2019).
91 | - [78] CS231n: Convolutional Neural Networks for Visual Recognition, http://cs231n.stanford.edu/ (Accessed: 30 November 2019).
92 | - [79] Lecture 11: Detection and Segmentation, http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture11.pdf (Accessed: 30 November 2019).
93 | - [80] Neural Information Processing Systems (NeurIPS), https://nips.cc/ (Accessed: 30 November 2019).
94 | - [81] NIPS 2017: Non-targeted Adversarial Attack, https://www.kaggle.com/c/nips-2017-non-targeted-adversarial-attack/ (Accessed: 30 November 2019).
95 | - [82] NIPS’17 Adversarial Learning Competition に参戦しました, https://research.preferred.jp/2018/04/nips17-adversarial-learning-competition/ (Accessed: 30 November 2019).
96 | - [83] Explaining and Harnessing Adversarial Examples, https://arxiv.org/abs/1412.6572 (Accessed: 30 November 2019).
97 | - [84] Generative Dog Images, https://www.kaggle.com/c/generative-dog-images (Accessed: 30 November 2019).
98 | - [85] An intuitive introduction to Generative Adversarial Networks (GANs), https://www.freecodecamp.org/news/an-intuitive-introduction-to-generative-adversarial-networks-gans-7a2264a81394/ (Accessed: 30 November 2019).
99 | - [86] Generative Dog Images, https://speakerdeck.com/hirune924/generative-dog-images (Accessed: 30 November 2019).
100 | - [87] TRAINING A CLASSIFIER, https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html (Accessed: 30 November 2019).
101 | - [88] CIFAR10, https://www.cs.toronto.edu/~kriz/cifar.html (Accessed: 30 November 2019).
102 | - [89] 原田達也, 『画像認識』, 講談社, 2017
103 | - [90] Distinctive Image Features from Scale-Invariant Keypoints, https://www.robots.ox.ac.uk/~vgg/research/affine/det_eval_files/lowe_ijcv2004.pdf (Accessed: 30 November 2019).
104 | - [91] iMet 7th place solution & my approach to image data competition, https://speakerdeck.com/phalanx/imet-7th-place-solution-and-my-approach-to-image-data-competition?slide=30 (Accessed: 30 November 2019).
105 | - [92] Convolutional Neural Network (CNN), https://www.deeplearningbook.org/front_matter.pdf (Accessed: 30 November 2019).
106 | - [93] APTOS 2019 Blindness Detection, https://www.kaggle.com/c/aptos2019-blindness-detection (Accessed: 30 November 2019).
107 | - [94] TensorFlow 2.0 Question Answering, https://www.kaggle.com/c/tensorflow2-question-answering (Accessed: 30 November 2019).
108 | - [95] Quora Insincere Questions Classification, https://www.kaggle.com/c/quora-insincere-questions-classification/ (Accessed: 30 November 2019).
109 | - [96] Jigsaw Unintended Bias in Toxicity Classification, https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification (Accessed: 30 November 2019).
110 | - [97] 絵で理解するWord2vecの仕組み, https://qiita.com/Hironsan/items/11b388575a058dc8a46a (Accessed: 30 November 2019).
111 | - [98] word2vec(Skip-Gram Model)の仕組みを恐らく日本一簡潔にまとめてみたつもり, https://www.randpy.tokyo/entry/word2vec_skip_gram_model (Accessed: 30 November 2019).
112 | - [99] Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms, https://arxiv.org/abs/1805.09843 (Accessed: 30 November 2019).
113 | - [100] Approaching (Almost) Any NLP Problem on Kaggle, https://www.kaggle.com/abhishek/approaching-almost-any-nlp-problem-on-kaggle (Accessed: 30 November 2019).
114 | - [101] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, https://arxiv.org/abs/1810.04805 (Accessed: 30 November 2019).
115 | - [102] XLNet: Generalized Autoregressive Pretraining for Language Understanding, https://arxiv.org/abs/1906.08237 (Accessed: 30 November 2019).
116 | - [103] ja.text8, https://github.com/Hironsan/ja.text8 (Accessed: 30 November 2019).
117 | - [104] 日本語版text8コーパスを作って分散表現を学習する, https://hironsan.hatenablog.com/entry/japanese-text8-corpus (Accessed: 30 November 2019).
118 |
119 | ## Chapter 4
120 |
121 | - [105] GCPとDockerでKaggle用計算環境構築, https://qiita.com/lain21/items/a33a39d465cd08b662f1 (Accessed: 30 November 2019).
122 | - [106] Kaggle用のGCP環境を手軽に構築, https://qiita.com/hiromu166/items/2a738f7be49d88d8b599 (Accessed: 30 November 2019).
123 | - [107] Kaggleの画像コンペのためのGCPインスタンス作成手順(2019年10月版), https://www.currypurin.com/entry/2019/10/10/094133 (Accessed: 24 December 2020).
124 |
125 | ### 4.4 Recommended materials, references, and links
126 | 
127 | - 4.4.1 kaggler-ja slack, https://yutori-datascience.hatenablog.com/entry/2017/08/23/143146
128 | - 4.4.2 kaggler-ja wiki, https://kaggler-ja.wiki/
129 | - 4.4.3 門脇大輔ら,『Kaggleで勝つデータ分析の技術』, 技術評論社, 2019, https://gihyo.jp/book/2019/978-4-297-10843-4
130 | - 4.4.4 Slides and videos from Kaggle Tokyo Meetup
131 | 
132 | | Meetup | URL |
133 | | -- | -- |
134 | | #1 | https://kaggler-ja.wiki/5e82184687ef5e0040104d40 |
135 | | #2 | https://kaggler-ja.wiki/5e82190787ef5e0040104d45 |
136 | | #3 | http://yutori-datascience.hatenablog.com/entry/2017/10/29/205433 |
137 | | #4 | https://connpass.com/event/82458/presentation/ |
138 | | #4 (video) | https://www.youtube.com/watch?v=VMjnhGW2MgU&list=PLkBjLQIGEjJlciM9lEz1AsuZZ8lDgyxDu |
139 | | #5 | https://connpass.com/event/105298/presentation/ |
140 | | #6 | https://connpass.com/event/132935/ |
141 |
--------------------------------------------------------------------------------
/ch02/ch02_08.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "_cell_guid": "e12020f7-4f94-4ecc-9007-9b7a6e7458a6",
7 | "_uuid": "1fecb0980d8d422ec0f005c4bfd6225385c2c60f"
8 | },
9 | "source": [
10 | "This notebook is a sample code with Japanese comments.\n",
11 | "\n",
12 | "# 2.8 三人寄れば文殊の知恵! アンサンブルを体験しよう"
13 | ]
14 | },
15 | {
16 | "cell_type": "code",
17 | "execution_count": 1,
18 | "metadata": {},
19 | "outputs": [
20 | {
21 | "name": "stdout",
22 | "output_type": "stream",
23 | "text": [
24 | "README.md submission_lightgbm_skfold.csv\r\n",
25 | "submission_lightgbm_holdout.csv submission_randomforest.csv\r\n"
26 | ]
27 | }
28 | ],
29 | "source": [
30 | "ls ../input/submit-files"
31 | ]
32 | },
33 | {
34 | "cell_type": "code",
35 | "execution_count": 2,
36 | "metadata": {},
37 | "outputs": [],
38 | "source": [
39 | "import pandas as pd\n",
40 | "\n",
41 | "\n",
42 | "sub_lgbm_sk = pd.read_csv('../input/submit-files/submission_lightgbm_skfold.csv')\n",
43 | "sub_lgbm_ho = pd.read_csv('../input/submit-files/submission_lightgbm_holdout.csv')\n",
44 | "sub_rf = pd.read_csv('../input/submit-files/submission_randomforest.csv')"
45 | ]
46 | },
47 | {
48 | "cell_type": "code",
49 | "execution_count": 3,
50 | "metadata": {
51 | "scrolled": true
52 | },
53 | "outputs": [
54 | {
55 | "data": {
56 | "text/html": [
57 | "\n",
58 | "\n",
71 | "
\n",
72 | " \n",
73 | " \n",
74 | " | \n",
75 | " PassengerId | \n",
76 | " Survived | \n",
77 | "
\n",
78 | " \n",
79 | " \n",
80 | " \n",
81 | " | 0 | \n",
82 | " 892 | \n",
83 | " 0 | \n",
84 | "
\n",
85 | " \n",
86 | " | 1 | \n",
87 | " 893 | \n",
88 | " 0 | \n",
89 | "
\n",
90 | " \n",
91 | " | 2 | \n",
92 | " 894 | \n",
93 | " 0 | \n",
94 | "
\n",
95 | " \n",
96 | " | 3 | \n",
97 | " 895 | \n",
98 | " 0 | \n",
99 | "
\n",
100 | " \n",
101 | " | 4 | \n",
102 | " 896 | \n",
103 | " 0 | \n",
104 | "
\n",
105 | " \n",
106 | "
\n",
107 | "
"
108 | ],
109 | "text/plain": [
110 | " PassengerId Survived\n",
111 | "0 892 0\n",
112 | "1 893 0\n",
113 | "2 894 0\n",
114 | "3 895 0\n",
115 | "4 896 0"
116 | ]
117 | },
118 | "execution_count": 3,
119 | "metadata": {},
120 | "output_type": "execute_result"
121 | }
122 | ],
123 | "source": [
124 | "sub_lgbm_sk.head()"
125 | ]
126 | },
127 | {
128 | "cell_type": "code",
129 | "execution_count": 4,
130 | "metadata": {
131 | "scrolled": true
132 | },
133 | "outputs": [
134 | {
135 | "data": {
136 | "text/html": [
137 | "\n",
138 | "\n",
151 | "
\n",
152 | " \n",
153 | " \n",
154 | " | \n",
155 | " sub_lgbm_sk | \n",
156 | " sub_lgbm_ho | \n",
157 | " sub_rf | \n",
158 | "
\n",
159 | " \n",
160 | " \n",
161 | " \n",
162 | " | 0 | \n",
163 | " 0 | \n",
164 | " 0 | \n",
165 | " 0 | \n",
166 | "
\n",
167 | " \n",
168 | " | 1 | \n",
169 | " 0 | \n",
170 | " 0 | \n",
171 | " 1 | \n",
172 | "
\n",
173 | " \n",
174 | " | 2 | \n",
175 | " 0 | \n",
176 | " 0 | \n",
177 | " 0 | \n",
178 | "
\n",
179 | " \n",
180 | " | 3 | \n",
181 | " 0 | \n",
182 | " 0 | \n",
183 | " 0 | \n",
184 | "
\n",
185 | " \n",
186 | " | 4 | \n",
187 | " 0 | \n",
188 | " 0 | \n",
189 | " 1 | \n",
190 | "
\n",
191 | " \n",
192 | "
\n",
193 | "
"
194 | ],
195 | "text/plain": [
196 | " sub_lgbm_sk sub_lgbm_ho sub_rf\n",
197 | "0 0 0 0\n",
198 | "1 0 0 1\n",
199 | "2 0 0 0\n",
200 | "3 0 0 0\n",
201 | "4 0 0 1"
202 | ]
203 | },
204 | "execution_count": 4,
205 | "metadata": {},
206 | "output_type": "execute_result"
207 | }
208 | ],
209 | "source": [
210 | "df = pd.DataFrame({'sub_lgbm_sk': sub_lgbm_sk['Survived'].values,\n",
211 | " 'sub_lgbm_ho': sub_lgbm_ho['Survived'].values,\n",
212 | " 'sub_rf': sub_rf['Survived'].values})\n",
213 | "df.head()"
214 | ]
215 | },
216 | {
217 | "cell_type": "code",
218 | "execution_count": 5,
219 | "metadata": {},
220 | "outputs": [
221 | {
222 | "data": {
223 | "text/html": [
224 | "\n",
225 | "\n",
238 | "
\n",
239 | " \n",
240 | " \n",
241 | " | \n",
242 | " sub_lgbm_sk | \n",
243 | " sub_lgbm_ho | \n",
244 | " sub_rf | \n",
245 | "
\n",
246 | " \n",
247 | " \n",
248 | " \n",
249 | " | sub_lgbm_sk | \n",
250 | " 1.000000 | \n",
251 | " 0.883077 | \n",
252 | " 0.796033 | \n",
253 | "
\n",
254 | " \n",
255 | " | sub_lgbm_ho | \n",
256 | " 0.883077 | \n",
257 | " 1.000000 | \n",
258 | " 0.731329 | \n",
259 | "
\n",
260 | " \n",
261 | " | sub_rf | \n",
262 | " 0.796033 | \n",
263 | " 0.731329 | \n",
264 | " 1.000000 | \n",
265 | "
\n",
266 | " \n",
267 | "
\n",
268 | "
"
269 | ],
270 | "text/plain": [
271 | " sub_lgbm_sk sub_lgbm_ho sub_rf\n",
272 | "sub_lgbm_sk 1.000000 0.883077 0.796033\n",
273 | "sub_lgbm_ho 0.883077 1.000000 0.731329\n",
274 | "sub_rf 0.796033 0.731329 1.000000"
275 | ]
276 | },
277 | "execution_count": 5,
278 | "metadata": {},
279 | "output_type": "execute_result"
280 | }
281 | ],
282 | "source": [
283 | "df.corr()"
284 | ]
285 | },
286 | {
287 | "cell_type": "code",
288 | "execution_count": 6,
289 | "metadata": {
290 | "scrolled": true
291 | },
292 | "outputs": [
293 | {
294 | "data": {
295 | "text/html": [
296 | "\n",
297 | "\n",
310 | "
\n",
311 | " \n",
312 | " \n",
313 | " | \n",
314 | " PassengerId | \n",
315 | " Survived | \n",
316 | "
\n",
317 | " \n",
318 | " \n",
319 | " \n",
320 | " | 0 | \n",
321 | " 892 | \n",
322 | " 0 | \n",
323 | "
\n",
324 | " \n",
325 | " | 1 | \n",
326 | " 893 | \n",
327 | " 1 | \n",
328 | "
\n",
329 | " \n",
330 | " | 2 | \n",
331 | " 894 | \n",
332 | " 0 | \n",
333 | "
\n",
334 | " \n",
335 | " | 3 | \n",
336 | " 895 | \n",
337 | " 0 | \n",
338 | "
\n",
339 | " \n",
340 | " | 4 | \n",
341 | " 896 | \n",
342 | " 1 | \n",
343 | "
\n",
344 | " \n",
345 | "
\n",
346 | "
"
347 | ],
348 | "text/plain": [
349 | " PassengerId Survived\n",
350 | "0 892 0\n",
351 | "1 893 1\n",
352 | "2 894 0\n",
353 | "3 895 0\n",
354 | "4 896 1"
355 | ]
356 | },
357 | "execution_count": 6,
358 | "metadata": {},
359 | "output_type": "execute_result"
360 | }
361 | ],
362 | "source": [
363 | "sub = pd.read_csv('../input/titanic/gender_submission.csv')\n",
364 | "sub['Survived'] = sub_lgbm_sk['Survived'] + sub_lgbm_ho['Survived'] + sub_rf['Survived']\n",
365 | "sub.head()"
366 | ]
367 | },
368 | {
369 | "cell_type": "code",
370 | "execution_count": 7,
371 | "metadata": {
372 | "scrolled": true
373 | },
374 | "outputs": [
375 | {
376 | "data": {
377 | "text/html": [
378 | "\n",
379 | "\n",
392 | "
\n",
393 | " \n",
394 | " \n",
395 | " | \n",
396 | " PassengerId | \n",
397 | " Survived | \n",
398 | "
\n",
399 | " \n",
400 | " \n",
401 | " \n",
402 | " | 0 | \n",
403 | " 892 | \n",
404 | " 0 | \n",
405 | "
\n",
406 | " \n",
407 | " | 1 | \n",
408 | " 893 | \n",
409 | " 0 | \n",
410 | "
\n",
411 | " \n",
412 | " | 2 | \n",
413 | " 894 | \n",
414 | " 0 | \n",
415 | "
\n",
416 | " \n",
417 | " | 3 | \n",
418 | " 895 | \n",
419 | " 0 | \n",
420 | "
\n",
421 | " \n",
422 | " | 4 | \n",
423 | " 896 | \n",
424 | " 0 | \n",
425 | "
\n",
426 | " \n",
427 | "
\n",
428 | "
"
429 | ],
430 | "text/plain": [
431 | " PassengerId Survived\n",
432 | "0 892 0\n",
433 | "1 893 0\n",
434 | "2 894 0\n",
435 | "3 895 0\n",
436 | "4 896 0"
437 | ]
438 | },
439 | "execution_count": 7,
440 | "metadata": {},
441 | "output_type": "execute_result"
442 | }
443 | ],
444 | "source": [
445 | "sub['Survived'] = (sub['Survived'] >= 2).astype(int)\n",
446 | "sub.to_csv('submission_lightgbm_ensemble.csv', index=False)\n",
447 | "sub.head()"
448 | ]
449 | },
450 | {
451 | "cell_type": "code",
452 | "execution_count": null,
453 | "metadata": {},
454 | "outputs": [],
455 | "source": []
456 | }
457 | ],
458 | "metadata": {
459 | "kernelspec": {
460 | "display_name": "Python 3",
461 | "language": "python",
462 | "name": "python3"
463 | },
464 | "language_info": {
465 | "codemirror_mode": {
466 | "name": "ipython",
467 | "version": 3
468 | },
469 | "file_extension": ".py",
470 | "mimetype": "text/x-python",
471 | "name": "python",
472 | "nbconvert_exporter": "python",
473 | "pygments_lexer": "ipython3",
474 | "version": "3.6.6"
475 | }
476 | },
477 | "nbformat": 4,
478 | "nbformat_minor": 1
479 | }
480 |
--------------------------------------------------------------------------------
/ch02/ch02_05.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "_cell_guid": "e12020f7-4f94-4ecc-9007-9b7a6e7458a6",
7 | "_uuid": "1fecb0980d8d422ec0f005c4bfd6225385c2c60f"
8 | },
9 | "source": [
10 | "This notebook is a sample code with Japanese comments.\n",
11 | "\n",
12 | "# 2.5 勾配ブースティングが最強?! いろいろな機械学習アルゴリズムを使ってみよう"
13 | ]
14 | },
15 | {
16 | "cell_type": "code",
17 | "execution_count": 1,
18 | "metadata": {},
19 | "outputs": [],
20 | "source": [
21 | "import numpy as np\n",
22 | "import pandas as pd\n",
23 | "\n",
24 | "\n",
25 | "train = pd.read_csv('../input/titanic/train.csv')\n",
26 | "test = pd.read_csv('../input/titanic/test.csv')\n",
27 | "gender_submission = pd.read_csv('../input/titanic/gender_submission.csv')\n",
28 | "\n",
29 | "data = pd.concat([train, test], sort=False)\n",
30 | "\n",
31 | "data['Sex'].replace(['male', 'female'], [0, 1], inplace=True)\n",
32 | "data['Embarked'].fillna(('S'), inplace=True)\n",
33 | "data['Embarked'] = data['Embarked'].map({'S': 0, 'C': 1, 'Q': 2}).astype(int)\n",
34 | "data['Fare'].fillna(np.mean(data['Fare']), inplace=True)\n",
35 | "data['Age'].fillna(data['Age'].median(), inplace=True)\n",
36 | "data['FamilySize'] = data['Parch'] + data['SibSp'] + 1\n",
37 | "data['IsAlone'] = 0\n",
38 | "data.loc[data['FamilySize'] == 1, 'IsAlone'] = 1"
39 | ]
40 | },
41 | {
42 | "cell_type": "code",
43 | "execution_count": 2,
44 | "metadata": {},
45 | "outputs": [
46 | {
47 | "data": {
48 | "text/html": [
49 | "\n",
50 | "\n",
63 | "
\n",
64 | " \n",
65 | " \n",
66 | " | \n",
67 | " PassengerId | \n",
68 | " Survived | \n",
69 | " Pclass | \n",
70 | " Name | \n",
71 | " Sex | \n",
72 | " Age | \n",
73 | " SibSp | \n",
74 | " Parch | \n",
75 | " Ticket | \n",
76 | " Fare | \n",
77 | " Cabin | \n",
78 | " Embarked | \n",
79 | " FamilySize | \n",
80 | " IsAlone | \n",
81 | "
\n",
82 | " \n",
83 | " \n",
84 | " \n",
85 | " | 0 | \n",
86 | " 1 | \n",
87 | " 0.0 | \n",
88 | " 3 | \n",
89 | " Braund, Mr. Owen Harris | \n",
90 | " 0 | \n",
91 | " 22.0 | \n",
92 | " 1 | \n",
93 | " 0 | \n",
94 | " A/5 21171 | \n",
95 | " 7.2500 | \n",
96 | " NaN | \n",
97 | " 0 | \n",
98 | " 2 | \n",
99 | " 0 | \n",
100 | "
\n",
101 | " \n",
102 | " | 1 | \n",
103 | " 2 | \n",
104 | " 1.0 | \n",
105 | " 1 | \n",
106 | " Cumings, Mrs. John Bradley (Florence Briggs Th... | \n",
107 | " 1 | \n",
108 | " 38.0 | \n",
109 | " 1 | \n",
110 | " 0 | \n",
111 | " PC 17599 | \n",
112 | " 71.2833 | \n",
113 | " C85 | \n",
114 | " 1 | \n",
115 | " 2 | \n",
116 | " 0 | \n",
117 | "
\n",
118 | " \n",
119 | " | 2 | \n",
120 | " 3 | \n",
121 | " 1.0 | \n",
122 | " 3 | \n",
123 | " Heikkinen, Miss. Laina | \n",
124 | " 1 | \n",
125 | " 26.0 | \n",
126 | " 0 | \n",
127 | " 0 | \n",
128 | " STON/O2. 3101282 | \n",
129 | " 7.9250 | \n",
130 | " NaN | \n",
131 | " 0 | \n",
132 | " 1 | \n",
133 | " 1 | \n",
134 | "
\n",
135 | " \n",
136 | " | 3 | \n",
137 | " 4 | \n",
138 | " 1.0 | \n",
139 | " 1 | \n",
140 | " Futrelle, Mrs. Jacques Heath (Lily May Peel) | \n",
141 | " 1 | \n",
142 | " 35.0 | \n",
143 | " 1 | \n",
144 | " 0 | \n",
145 | " 113803 | \n",
146 | " 53.1000 | \n",
147 | " C123 | \n",
148 | " 0 | \n",
149 | " 2 | \n",
150 | " 0 | \n",
151 | "
\n",
152 | " \n",
153 | " | 4 | \n",
154 | " 5 | \n",
155 | " 0.0 | \n",
156 | " 3 | \n",
157 | " Allen, Mr. William Henry | \n",
158 | " 0 | \n",
159 | " 35.0 | \n",
160 | " 0 | \n",
161 | " 0 | \n",
162 | " 373450 | \n",
163 | " 8.0500 | \n",
164 | " NaN | \n",
165 | " 0 | \n",
166 | " 1 | \n",
167 | " 1 | \n",
168 | "
\n",
169 | " \n",
170 | "
\n",
171 | "
"
172 | ],
173 | "text/plain": [
174 | " PassengerId Survived Pclass \\\n",
175 | "0 1 0.0 3 \n",
176 | "1 2 1.0 1 \n",
177 | "2 3 1.0 3 \n",
178 | "3 4 1.0 1 \n",
179 | "4 5 0.0 3 \n",
180 | "\n",
181 | " Name Sex Age SibSp Parch \\\n",
182 | "0 Braund, Mr. Owen Harris 0 22.0 1 0 \n",
183 | "1 Cumings, Mrs. John Bradley (Florence Briggs Th... 1 38.0 1 0 \n",
184 | "2 Heikkinen, Miss. Laina 1 26.0 0 0 \n",
185 | "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 1 35.0 1 0 \n",
186 | "4 Allen, Mr. William Henry 0 35.0 0 0 \n",
187 | "\n",
188 | " Ticket Fare Cabin Embarked FamilySize IsAlone \n",
189 | "0 A/5 21171 7.2500 NaN 0 2 0 \n",
190 | "1 PC 17599 71.2833 C85 1 2 0 \n",
191 | "2 STON/O2. 3101282 7.9250 NaN 0 1 1 \n",
192 | "3 113803 53.1000 C123 0 2 0 \n",
193 | "4 373450 8.0500 NaN 0 1 1 "
194 | ]
195 | },
196 | "execution_count": 2,
197 | "metadata": {},
198 | "output_type": "execute_result"
199 | }
200 | ],
201 | "source": [
202 | "data.head()"
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": 3,
208 | "metadata": {},
209 | "outputs": [],
210 | "source": [
211 | "delete_columns = ['Name', 'PassengerId', 'Ticket', 'Cabin']\n",
212 | "data.drop(delete_columns, axis=1, inplace=True)\n",
213 | "\n",
214 | "train = data[:len(train)]\n",
215 | "test = data[len(train):]\n",
216 | "\n",
217 | "y_train = train['Survived']\n",
218 | "X_train = train.drop('Survived', axis=1)\n",
219 | "X_test = test.drop('Survived', axis=1)"
220 | ]
221 | },
222 | {
223 | "cell_type": "code",
224 | "execution_count": 4,
225 | "metadata": {},
226 | "outputs": [
227 | {
228 | "data": {
229 | "text/html": [
230 | "\n",
231 | "\n",
244 | "
\n",
245 | " \n",
246 | " \n",
247 | " | \n",
248 | " Pclass | \n",
249 | " Sex | \n",
250 | " Age | \n",
251 | " SibSp | \n",
252 | " Parch | \n",
253 | " Fare | \n",
254 | " Embarked | \n",
255 | " FamilySize | \n",
256 | " IsAlone | \n",
257 | "
\n",
258 | " \n",
259 | " \n",
260 | " \n",
261 | " | 0 | \n",
262 | " 3 | \n",
263 | " 0 | \n",
264 | " 22.0 | \n",
265 | " 1 | \n",
266 | " 0 | \n",
267 | " 7.2500 | \n",
268 | " 0 | \n",
269 | " 2 | \n",
270 | " 0 | \n",
271 | "
\n",
272 | " \n",
273 | " | 1 | \n",
274 | " 1 | \n",
275 | " 1 | \n",
276 | " 38.0 | \n",
277 | " 1 | \n",
278 | " 0 | \n",
279 | " 71.2833 | \n",
280 | " 1 | \n",
281 | " 2 | \n",
282 | " 0 | \n",
283 | "
\n",
284 | " \n",
285 | " | 2 | \n",
286 | " 3 | \n",
287 | " 1 | \n",
288 | " 26.0 | \n",
289 | " 0 | \n",
290 | " 0 | \n",
291 | " 7.9250 | \n",
292 | " 0 | \n",
293 | " 1 | \n",
294 | " 1 | \n",
295 | "
\n",
296 | " \n",
297 | " | 3 | \n",
298 | " 1 | \n",
299 | " 1 | \n",
300 | " 35.0 | \n",
301 | " 1 | \n",
302 | " 0 | \n",
303 | " 53.1000 | \n",
304 | " 0 | \n",
305 | " 2 | \n",
306 | " 0 | \n",
307 | "
\n",
308 | " \n",
309 | " | 4 | \n",
310 | " 3 | \n",
311 | " 0 | \n",
312 | " 35.0 | \n",
313 | " 0 | \n",
314 | " 0 | \n",
315 | " 8.0500 | \n",
316 | " 0 | \n",
317 | " 1 | \n",
318 | " 1 | \n",
319 | "
\n",
320 | " \n",
321 | "
\n",
322 | "
"
323 | ],
324 | "text/plain": [
325 | " Pclass Sex Age SibSp Parch Fare Embarked FamilySize IsAlone\n",
326 | "0 3 0 22.0 1 0 7.2500 0 2 0\n",
327 | "1 1 1 38.0 1 0 71.2833 1 2 0\n",
328 | "2 3 1 26.0 0 0 7.9250 0 1 1\n",
329 | "3 1 1 35.0 1 0 53.1000 0 2 0\n",
330 | "4 3 0 35.0 0 0 8.0500 0 1 1"
331 | ]
332 | },
333 | "execution_count": 4,
334 | "metadata": {},
335 | "output_type": "execute_result"
336 | }
337 | ],
338 | "source": [
339 | "X_train.head()"
340 | ]
341 | },
342 | {
343 | "cell_type": "markdown",
344 | "metadata": {},
345 | "source": [
346 | "# sklearn"
347 | ]
348 | },
349 | {
350 | "cell_type": "code",
351 | "execution_count": 5,
352 | "metadata": {},
353 | "outputs": [],
354 | "source": [
355 | "from sklearn.linear_model import LogisticRegression\n",
356 | "\n",
357 | "\n",
358 | "clf = LogisticRegression(penalty='l2', solver='sag', random_state=0)"
359 | ]
360 | },
361 | {
362 | "cell_type": "code",
363 | "execution_count": 6,
364 | "metadata": {},
365 | "outputs": [],
366 | "source": [
367 | "from sklearn.ensemble import RandomForestClassifier\n",
368 | "\n",
369 | "\n",
370 | "clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)"
371 | ]
372 | },
373 | {
374 | "cell_type": "code",
375 | "execution_count": 7,
376 | "metadata": {},
377 | "outputs": [],
378 | "source": [
379 | "clf.fit(X_train, y_train)\n",
380 | "y_pred = clf.predict(X_test)"
381 | ]
382 | },
383 | {
384 | "cell_type": "code",
385 | "execution_count": 8,
386 | "metadata": {},
387 | "outputs": [
388 | {
389 | "data": {
390 | "text/plain": [
391 | "array([0., 1., 0., 0., 1., 0., 1., 0., 1., 0.])"
392 | ]
393 | },
394 | "execution_count": 8,
395 | "metadata": {},
396 | "output_type": "execute_result"
397 | }
398 | ],
399 | "source": [
400 | "y_pred[:10]"
401 | ]
402 | },
403 | {
404 | "cell_type": "code",
405 | "execution_count": 9,
406 | "metadata": {},
407 | "outputs": [],
408 | "source": [
409 | "sub = pd.read_csv('../input/titanic/gender_submission.csv')"
410 | ]
411 | },
412 | {
413 | "cell_type": "code",
414 | "execution_count": 10,
415 | "metadata": {},
416 | "outputs": [
417 | {
418 | "data": {
419 | "text/html": [
420 | "\n",
421 | "\n",
434 | "
\n",
435 | " \n",
436 | " \n",
437 | " | \n",
438 | " PassengerId | \n",
439 | " Survived | \n",
440 | "
\n",
441 | " \n",
442 | " \n",
443 | " \n",
444 | " | 0 | \n",
445 | " 892 | \n",
446 | " 0 | \n",
447 | "
\n",
448 | " \n",
449 | " | 1 | \n",
450 | " 893 | \n",
451 | " 1 | \n",
452 | "
\n",
453 | " \n",
454 | " | 2 | \n",
455 | " 894 | \n",
456 | " 0 | \n",
457 | "
\n",
458 | " \n",
459 | " | 3 | \n",
460 | " 895 | \n",
461 | " 0 | \n",
462 | "
\n",
463 | " \n",
464 | " | 4 | \n",
465 | " 896 | \n",
466 | " 1 | \n",
467 | "
\n",
468 | " \n",
469 | "
\n",
470 | "
"
471 | ],
472 | "text/plain": [
473 | " PassengerId Survived\n",
474 | "0 892 0\n",
475 | "1 893 1\n",
476 | "2 894 0\n",
477 | "3 895 0\n",
478 | "4 896 1"
479 | ]
480 | },
481 | "execution_count": 10,
482 | "metadata": {},
483 | "output_type": "execute_result"
484 | }
485 | ],
486 | "source": [
487 | "sub['Survived'] = list(map(int, y_pred))\n",
488 | "sub.to_csv('submission_randomforest.csv', index=False)\n",
489 | "sub.head()"
490 | ]
491 | },
492 | {
493 | "cell_type": "markdown",
494 | "metadata": {},
495 | "source": [
496 | "# LightGBM"
497 | ]
498 | },
499 | {
500 | "cell_type": "code",
501 | "execution_count": 11,
502 | "metadata": {},
503 | "outputs": [],
504 | "source": [
505 | "from sklearn.model_selection import train_test_split\n",
506 | "\n",
507 | "\n",
508 | "X_train, X_valid, y_train, y_valid = \\\n",
509 | " train_test_split(X_train, y_train, test_size=0.3,\n",
510 | " random_state=0, stratify=y_train)"
511 | ]
512 | },
513 | {
514 | "cell_type": "code",
515 | "execution_count": 12,
516 | "metadata": {},
517 | "outputs": [],
518 | "source": [
519 | "categorical_features = ['Embarked', 'Pclass', 'Sex']"
520 | ]
521 | },
522 | {
523 | "cell_type": "code",
524 | "execution_count": 13,
525 | "metadata": {},
526 | "outputs": [
527 | {
528 | "name": "stdout",
529 | "output_type": "stream",
530 | "text": [
531 | "Training until validation scores don't improve for 10 rounds\n",
532 | "[10]\ttraining's binary_logloss: 0.425241\tvalid_1's binary_logloss: 0.478975\n",
533 | "[20]\ttraining's binary_logloss: 0.344972\tvalid_1's binary_logloss: 0.444039\n",
534 | "[30]\ttraining's binary_logloss: 0.301357\tvalid_1's binary_logloss: 0.436304\n",
535 | "[40]\ttraining's binary_logloss: 0.265535\tvalid_1's binary_logloss: 0.438139\n",
536 | "Early stopping, best iteration is:\n",
537 | "[38]\ttraining's binary_logloss: 0.271328\tvalid_1's binary_logloss: 0.435633\n"
538 | ]
539 | },
540 | {
541 | "name": "stderr",
542 | "output_type": "stream",
543 | "text": [
544 | "/opt/conda/lib/python3.6/site-packages/lightgbm/basic.py:1243: UserWarning: Using categorical_feature in Dataset.\n",
545 | " warnings.warn('Using categorical_feature in Dataset.')\n"
546 | ]
547 | }
548 | ],
549 | "source": [
550 | "import lightgbm as lgb\n",
551 | "\n",
552 | "\n",
553 | "lgb_train = lgb.Dataset(X_train, y_train,\n",
554 | " categorical_feature=categorical_features)\n",
555 | "lgb_eval = lgb.Dataset(X_valid, y_valid, reference=lgb_train,\n",
556 | " categorical_feature=categorical_features)\n",
557 | "\n",
558 | "params = {\n",
559 | " 'objective': 'binary'\n",
560 | "}\n",
561 | "\n",
562 | "model = lgb.train(params, lgb_train,\n",
563 | " valid_sets=[lgb_train, lgb_eval],\n",
564 | " verbose_eval=10,\n",
565 | " num_boost_round=1000,\n",
566 | " early_stopping_rounds=10)\n",
567 | "\n",
568 | "y_pred = model.predict(X_test, num_iteration=model.best_iteration)"
569 | ]
570 | },
571 | {
572 | "cell_type": "code",
573 | "execution_count": 14,
574 | "metadata": {},
575 | "outputs": [
576 | {
577 | "data": {
578 | "text/plain": [
579 | "array([0.0320592 , 0.34308916, 0.09903007, 0.05723199, 0.39919906,\n",
580 | " 0.22299318, 0.55036246, 0.0908458 , 0.78109016, 0.01881392])"
581 | ]
582 | },
583 | "execution_count": 14,
584 | "metadata": {},
585 | "output_type": "execute_result"
586 | }
587 | ],
588 | "source": [
589 | "y_pred[:10]"
590 | ]
591 | },
592 | {
593 | "cell_type": "code",
594 | "execution_count": 15,
595 | "metadata": {},
596 | "outputs": [
597 | {
598 | "data": {
599 | "text/plain": [
600 | "array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0])"
601 | ]
602 | },
603 | "execution_count": 15,
604 | "metadata": {},
605 | "output_type": "execute_result"
606 | }
607 | ],
608 | "source": [
609 | "y_pred = (y_pred > 0.5).astype(int)\n",
610 | "y_pred[:10]"
611 | ]
612 | },
613 | {
614 | "cell_type": "code",
615 | "execution_count": 16,
616 | "metadata": {
617 | "scrolled": true
618 | },
619 | "outputs": [
620 | {
621 | "data": {
622 | "text/html": [
623 | "\n",
624 | "\n",
637 | "
\n",
638 | " \n",
639 | " \n",
640 | " | \n",
641 | " PassengerId | \n",
642 | " Survived | \n",
643 | "
\n",
644 | " \n",
645 | " \n",
646 | " \n",
647 | " | 0 | \n",
648 | " 892 | \n",
649 | " 0 | \n",
650 | "
\n",
651 | " \n",
652 | " | 1 | \n",
653 | " 893 | \n",
654 | " 0 | \n",
655 | "
\n",
656 | " \n",
657 | " | 2 | \n",
658 | " 894 | \n",
659 | " 0 | \n",
660 | "
\n",
661 | " \n",
662 | " | 3 | \n",
663 | " 895 | \n",
664 | " 0 | \n",
665 | "
\n",
666 | " \n",
667 | " | 4 | \n",
668 | " 896 | \n",
669 | " 0 | \n",
670 | "
\n",
671 | " \n",
672 | "
\n",
673 | "
"
674 | ],
675 | "text/plain": [
676 | " PassengerId Survived\n",
677 | "0 892 0\n",
678 | "1 893 0\n",
679 | "2 894 0\n",
680 | "3 895 0\n",
681 | "4 896 0"
682 | ]
683 | },
684 | "execution_count": 16,
685 | "metadata": {},
686 | "output_type": "execute_result"
687 | }
688 | ],
689 | "source": [
690 | "sub['Survived'] = y_pred\n",
691 | "sub.to_csv('submission_lightgbm.csv', index=False)\n",
692 | "\n",
693 | "sub.head()"
694 | ]
695 | },
696 | {
697 | "cell_type": "code",
698 | "execution_count": null,
699 | "metadata": {},
700 | "outputs": [],
701 | "source": []
702 | }
703 | ],
704 | "metadata": {
705 | "file_extension": ".py",
706 | "kernelspec": {
707 | "display_name": "Python 3",
708 | "language": "python",
709 | "name": "python3"
710 | },
711 | "language_info": {
712 | "codemirror_mode": {
713 | "name": "ipython",
714 | "version": 3
715 | },
716 | "file_extension": ".py",
717 | "mimetype": "text/x-python",
718 | "name": "python",
719 | "nbconvert_exporter": "python",
720 | "pygments_lexer": "ipython3",
721 | "version": "3.6.6"
722 | },
723 | "mimetype": "text/x-python",
724 | "name": "python",
725 |   "nbconvert_exporter": "python",
726 | "pygments_lexer": "ipython3",
727 | "version": 3
728 | },
729 | "nbformat": 4,
730 | "nbformat_minor": 2
731 | }
732 |
--------------------------------------------------------------------------------
/ch03/ch03_02.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 |     "This notebook is sample code with Japanese comments.\n",
8 | "\n",
9 | "Ref: [TRAINING A CLASSIFIER](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html)\n",
10 | "\n",
11 |     "# 3.2 Beyond Titanic ②! Let's try working with image data"
12 | ]
13 | },
14 | {
15 | "cell_type": "code",
16 | "execution_count": 1,
17 | "metadata": {
18 | "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19",
19 | "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5"
20 | },
21 | "outputs": [],
22 | "source": [
23 | "import torch\n",
24 | "import torchvision\n",
25 | "import torchvision.transforms as transforms"
26 | ]
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": 2,
31 | "metadata": {
32 | "_cell_guid": "79c7e3d0-c299-4dcb-8224-4455121ee9b0",
33 | "_uuid": "d629ff2d2480ee46fbb7e2d37f6b5fab8052498a"
34 | },
35 | "outputs": [
36 | {
37 | "name": "stderr",
38 | "output_type": "stream",
39 | "text": [
40 | "\r",
41 | "0it [00:00, ?it/s]"
42 | ]
43 | },
44 | {
45 | "name": "stdout",
46 | "output_type": "stream",
47 | "text": [
48 | "Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz\n"
49 | ]
50 | },
51 | {
52 | "name": "stderr",
53 | "output_type": "stream",
54 | "text": [
55 | "170500096it [01:30, 4515126.28it/s] "
56 | ]
57 | },
58 | {
59 | "name": "stdout",
60 | "output_type": "stream",
61 | "text": [
62 | "Files already downloaded and verified\n"
63 | ]
64 | }
65 | ],
66 | "source": [
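    "# ToTensor scales pixel values to [0, 1]; Normalize(0.5, 0.5) then maps each channel to [-1, 1].\n",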
67 | "transform = transforms.Compose(\n",
68 | " [transforms.ToTensor(),\n",
69 | " transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])\n",
70 | "\n",
71 | "trainset = torchvision.datasets.CIFAR10(root='./data', train=True,\n",
72 | " download=True, transform=transform)\n",
73 | "trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,\n",
74 | " shuffle=True, num_workers=2)\n",
75 | "\n",
76 | "testset = torchvision.datasets.CIFAR10(root='./data', train=False,\n",
77 | " download=True, transform=transform)\n",
78 | "testloader = torch.utils.data.DataLoader(testset, batch_size=4,\n",
79 | " shuffle=False, num_workers=2)\n",
80 | "\n",
81 | "classes = ('plane', 'car', 'bird', 'cat',\n",
82 | " 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')"
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": 3,
88 | "metadata": {},
89 | "outputs": [
90 | {
91 | "data": {
92 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAB6CAYAAACvHqiXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAIABJREFUeJztfWmQZFd15ndzX6uy9q7qXepudQuhDUkgMEKBwIjFCGMwm7FmTFgRXsL2hCMwNjFhmHGE7ZgJbI/DZqyxMTDhYMdGgbExFqvNINQISa1Wt9TdUq+1r5mV+3Lnxzn3nZNVWdWlKtHVVb5fREdl3/vyvbu89/Kc853FWGvh4eHh4bH1EdrsAXh4eHh4vDjwL3QPDw+PbQL/Qvfw8PDYJvAvdA8PD49tAv9C9/Dw8Ngm8C90Dw8Pj20C/0L38PDw2CbY0AvdGHOvMeYZY8xpY8yHXqxBeXh4eHi8cJj1BhYZY8IAngXwegAXATwK4D3W2qdfvOF5eHh4eKwVkQ189w4Ap621zwGAMeazAO4DsOILPZVK2Vwut4FLenh4ePzHw9jY2LS1duByx23khb4TwAX1/4sAXr7aF3K5HB544IENXNLDw8PjPx4++tGPnlvLcRuxoZsObcvsN8aYB4wxR40xR0ul0gYu5+Hh4eGxGjbyQr8IYLf6/y4Ao0sPstY+aK29zVp7WyqV2sDlPDw8PDxWw0Ze6I8COGiM2W+MiQF4N4CHXpxheXh4eHi8UKzbhm6tbRhjfh3A1wGEAXzCWnv8hZ7nD58iqd1qC44Jtf+l/wAAWiFqs6rPsqXH6lOY5cdJn9Gn5HOsEUu/a5ePEWipE6vPK51SXTwYUqvTiORc//XgVFvPRz7ykeDz2KmTAIBoRCbYvXs/AKBmwkFbk7vDfN5YpRD0zU5PAwByA8NBWyQWAwBYdQ7LMoFhb6lOdrjOWH6k5XFox6twiI4L6b3ifr20Lfef4P5oP/PyK9L//uC//fdl4/js1/4KANCVSQRtE5NkLpwvlGVsvPeJKJ8xKvfCzp5+AMC+3r6g7dT58wCA0ZmFoG2xRePoSqcBAK94yaGgr1JvAABOnns+aCsUqwCAaFINOEHXjfDQatWmjJE/xrtiQVuT1+/OfdcEbXNlmt+Tl2iMqUg86MskswCAkZy0lStFulazEbS94d4PQGNu9KIMMUFrWa1WgrZGi/YsGovKeCORtuOTCdmDBB8XUjdDi5+TZlPm/PxzZ2jcKVrTO+64I+hbXFyk41tyfKlCbUk+HgDqNRrb5AQ9B+mUrB9QBwDYkOx3KErPxPETzwRtjz95GgAQi8WXzb2rq5vOmxarRTxO17j55luxXmyEFIW19msAvraRc3h4eHh4vDjY0Av9xUDT/cgZ+dV10rrRIpVxffxXHx9IzbZDm76aWfZdQQeJuONxpv1vB0nw8udoP9y2aQruHGb5gWu0kFXnZ+nv4mTQlusmSSDSMxi0hUFSSrRF0ln+knicPvcsOTC99NVvDNpiLC1pbarF8wvxJNYsoatJu1iIcNjNry5zqZI0W69Xg7ZImOaSiIsLbChE0pt1glcn7UurQlhZc8r0kKTUqssxpkTib0RJdiGWJg1rQsWyjHs+xMcPySPWlaL1m2cpEQDyJTrfwWHySNs9IBL9hVnaP9sSKbhRoWskE0qqjUV4njS/ek2Ob/HaxloylxBLltN50fLOjvI9w5JpMifz5Gkine5S56BrtApyraVIZbPqeKcOyj1s6jwXxa1lMhkAQJS1wbCSgqsVWtOmml+zSeMNqedspkDaw6nzl6ghIZL3wACtc0vtYyJOE1wsiQTdYs0pmaE5VNX9hxAdHw6L1B7i8WZ75fkyoed47jSHvr7+oG9wkI6zVu/L2vXbleBD/z08PDy2CfwL3cPDw2ObYNNNLjYSW97mTCNtfCO3OVVdmzICNa6TyqJOElhL1qjarMXk0hF2+WfboW01V/4NaF+ZJKnj5YtCSk18n0woqR5Rmw2TTHNlUjXnz48Ffclu8kiNRjvcImpdzFLz2BpTSTSUGSHKOn25OAcAOH78+0Hf2bNPAQBOnRayKWxIhb7u4C1B2+233wUAGNm9BwDQaqoxOnNMm1lvZaQSxDgmFBGW3UVjHJ0UQrPCRFxukMaTWhCTSyywEcqVwiH6nEqLueTI0A4AwA0HiKCsiyUAFSYeQxFl+nFzUA9HMk6EXNmRyk3pi4Spr6Weg1qNxnlqalbaKnThcJifs6rsT7E+DwAoZMU0UqhU2o7vhNHxieBzjE0SmYyYYSIxNuGF5B5znytsVjF6zwICVNa0XCFTiDbNJFJstmHz1JnzEv8Y5+v39qiIdcskpzLDNPlaCSZK+1JiLkGYr6+uWefvZrPdQdv+/eSIMDI8QteOC6lc4fUbGxNP76kpMrEdOnQY64WX0D08PDy2Ca5KCT0gL1cjCbT0HOogtXeSpN3H0Bp/x8yyD4rMW5uEboxzxevg3mgby0/lvtrRbXFtYI4MDTsftC2cJkl3ekxc4Jpu7VNEFDWMSE/pG0lCj0RFmgwSuXWYul2j26IjgUxIpM6nnnoMAHDs8X8HAJw5IxJ6uUhS3viYaA8z0yQlnzn5w6Dtx0f/GQDwrvf9EgDg0GHJQmEsayVWSYKrDDQdp73qSmaCtq40uW+mYzKOfKUGAFho5AEAB4aE0GyWmaCsi9TuBOeerEhqLz2wFwAw0EPfnS6I62ilQVJcvaXvJ/q7uFgL2uoR7mdN1RGnABBlLSNsFcnJH0slkcJtg8lQdkmNKg0ny0RloVQM2lqsnbRWWcewcn0MR2kcVUVohrhNuydH2cWvzBJso6Y1OToupaR8d96Gcp90nwOysyDPweilC219AGD5+EhY7vV4nLSRdIYkbmOFFG3WaWyZLtF2i7zmg/09QduO17wGAFAoEAm+qMhw55Z5/rxE9DupfSPwErqHh4fHNoF/oXt4eHhsE2y6yaUToSk+5B3MJejQ18E0E5B1HcwrnSJFO8IRsW2NfD5nfdC+70uHCoCyIgDxqkpMVid/2iqr9Fb9rkrEY4fhrNEKE+fIwWqoqNpIrawlRa1013LBeLWmqOo9vaTWhsISFepcmdstRGsxtcjAna/t+QvPBW0f+9gfAwCKefIbvvFGRVg1SV0e7hVf4mSIzA39A3LedJwi+r7+tb8FANQbYmK48abX0yiUyWU1YrzB67Kg9qw7TqRYQt1Pew4SefXk6VM0t6acv84mjnpTTEsNNp0M94mvco4jBReLfC21yY5oq+vA4xj1V+elscwRpdEs+0frZ4n7mipqOMEk8XCvrPPoLBGk1m1yVPa90GTicVGu2cV2vVJN7pmlSCj/8kaDjkulJcQ1kaTP4bCOPKZxZp3/d01MHc7k4nzVAWBhgcxvrbJE8LZ4Ds4sVFWmjNIimbQiYdmrNEdttprLSdEFl1CwqfzQm2wKU+Rsdw+ZWnqVH/qlS2QunJmZAQB0KRONG7c2w0QiG38dewndw8PDY5tg8yV0lgTa
vN2MizrU7nEEl5tFS94ub0snoSu0moR+OXQiWx2bZl3UqSKsXPShzs3Cv/TJ098O2iIlkoYat76dDo9IvopQa+XIu7VK6PVFIu4WxoVwmSqShDRy4EjQduw4ScnVSyS17OpV5FTU5ccR6clBE7zLyUWtsTBxpqIyQ9z26A+/E7R9/9//DQAwPEBSTvFaIZZChtZmZETcxopBRJ+sVa6LrnHy1BMAgO88/I9B3zX7bwMAJLNDMkzlHrgUoSjn3qgJ8ViN0fqlk0LIRXltbr2OJPViPh/0NViK6+2WuRzYSy6Ke3aMBG0VXsuZRZIAJy+Ji12tRG1xtQc1Xkt9Wxt2a3SugTaiCEKW8sMhISgbvGnDQ7Kmw4MkpT5xmnK5zJVl7m6xcla0u1KVxlZb5X6NxeV4x3tmu9T6MeGuycCF+bm2Ni29u1wnBU0c83FRRd5HmSits3YSUY4XqTRdP5EQTaHAUnKrJlK+0/BsnI6PR+UcjQqtd59av4OH6Lk6efJk0DY5OdU2trLSIp59lrQ6LaFnMhvPRusldA8PD49tAv9C9/Dw8Ngm2HSTS4h9P61OlhRwltrk4swqnAyqg8lFQ3jVVSJFdRRaQHIuJ2LbrEFLThGKaJKR/0L5s86SWaN/5mjQNtBPZoSTINJyMSLRZWY1//M1+qYXF6axdOS5XlLzQ4osjPPah2K0RueVuhi/5lkAQN/LRKU2rHIbq81MjhRdbsZqubTGahwXxsik8KPHZD0anBDq2DFaq5GdvUFff47OO9YUNXtqjgm2jE49SqRpiMf2b99+OOgb2UkRpW9/l6R3baxickGwj3LvlJpkgsimxGQwNjkOALhmD0Wn7t6/I+irVjjd7oKYLnYNEGG2f9fOoO3pixQd+NxFMnVcmBaTS5r91XNq2ysLtB/1kDSmMrwvbguUlay7i9NTqye9nCdzyVR+Jmjby2t+cA/FJDw3Oh30NdlEE1XOB+EYXSQT6hRHQkio1LcuSlKnuW00OMGXShzm4hlqbO7SRexTnNys7RHl/2ii1BXSceYMHaHpknnNzMj8ZsYponqH8iEvF+nZzFfZbBIXUj7B50smxWxTZRPU6dOngzYXg+DqKF9Ukdsl9unXyblcmt2NwEvoHh4eHtsEl5XQjTGfAPAWAJPW2hu4rRfA5wDsA3AWwM9ba+fWNQBXFECxayZwF9RtLKGHXQGD5YRpu7ugaftL53OXCi/rCwo0XIYwbbL0EyqRO1322Dfl/F0kgVWHbwjauuZPAAB250TSzXMq1mieJLxw9x65QGOVghhrjR6Nk3tUIybSZFeatICFM0KUdofpWnMFkkKKk5JutzRFBJ9pKAk95EQ/pR0F7CKT2x20qpo6xxm+/uio5BFxaWfn87Quzz0v0ZiRa0i6WVwQ4mx6ho5Lp0QCHBniNLE853pN5vLoI7RH97zhLTL3HincsRR1Hm9dRR/mGzTewR3XyXEtkiJn50jSHcqJhB5i0m0qL259Lj1rSJGcjz1FNWF+wBrLkQMyrniGoyZVjpFDB0mKnFORok57KJf53lEuis6FtbdXJNjxGj2qxaqQdKeZwNvB7qHXR4VAzpdoDvmiaEmOH+3qEsl1KSYmJJdLp+fRSc7anc8VfHBEoiY740yyaqI0m6Xvzs9LNKgjh53U3ElTmJ6S1MEp1jZ2qkjfCxdor2YX6Lxz80J4R5k0P3HiRNDmXBOLRXEVnuJrTPJzNaWu6Y7TFoQr5bb4SQD3Lmn7EICHrbUHATzM//fw8PDw2ERc9ifBWvtdY8y+Jc33AbibP38KwLcB/M66BhB2v9zK8OfKx2n3Pw4EcBnrtBzd4kTzbfkMOU+KCWkpnH+/Qu32eAAIuwINbbXO3B8VGMPjjYEDb5QtLnSM7LaxwX1BW40l+ePKpS3O+SES+6itHJHf1UawDp0Spixv6gSTIkkj1CVBDml215qPipSVZsGlWCNppAYlxY3TceWCSMaxLBd+0GvKknk44CD0QGhei4siCZ47R9nl5uelrcx5MFwQ0/S0rNWB/SQpNpWfYbVKF3n+rEhDfT30eW6Oznvo4H45f4ECli6cE4nKcQqd0LTL7496iKS9mbpInakBWq+ZMZK8Lo6LPbm3i/qMyqHiClv86w8fD9pOnGW7aoKkvoVFWe8Eu/Q2VPZEcPCXSrUSBB65QLGmypNTYpt1Vmt3rBXXrDxzDc6uOM+BQomq9BWLPCblrlrhQKGJuZUVc20vd1K1lq7dZ53vpsHkhrOr15TrqOW11HZ1199QWmB3Nz1fN998M4D2fZzlAKrJSdnH215KWtdeVZLP8DulFiEtOqY0Ipf10WkAerwjI3JfXbpE952zq7fPk8YbU+X3NJewXqzXhj5krR0DAP47eJnjPTw8PDx+wviJk6LGmAeMMUeNMUdLpdLlv+Dh4eHhsS6s1wo/YYwZttaOGWOGAUyudKC19kEADwLAyMjIMqNBjIkArWwEB4VEPYtNUSXv0InvAQDCDVGBmiPXAgAa10rBg3BuFx2vCwzwmVuOFFXXDDLraqI01OQ+lWuC1dRIikivzJv+U9C38OX/AQCoTUiK2uA3UyXxzw2RGcHsJvK0ERf3u2aD1LimValNpSgm1oLkAM19Jikkz+wcqY5F9RMeiRKh1UpQxJtR/nE/+C5Fb+65U0jf236GIlurytWqyfMLgmp16mBHiqpiCfNzZBqJxsTla9duStX7HOdEKZZFle3p59qLLaXejhIB9dx5UZujYbrG8CCTZHNiCuvrI2LVKtdHu4r9qlqleysWVYQV3xcLBTnvVI3U9ygf94NnTgV9/Wm6ZiKpcoakaGxzC0IIT02TSS7dS+uRb4laXl4gAahaFtOFdQSyqnEZ5mt099A5Gkq1dylyi4qcTXEul0RWyMImaM4VLhhRqekoT06pK1uGGt8DXau8QrS7oCMqdZsjBjXx6YS+6WlaZ20ucYSpNlO4aNCsql/qTBfuuRkeFrLaEaTz82IqGt5JTgkDw7uCtlEu/uFML1alz+3v7+PxiInSmX4SypXRmVVcZKt2c3Ru19rM4ojVjWC9EvpDAO7nz/cD+MqGR+Lh4eHhsSGsxW3xMyACtN8YcxHA7wP4IwCfN8Z8AMB5AO9c7wBisfbshfSRg4diQjJFp4lYKD/zXQBAVuXI2B8iKafZFOlpYg9J6+ERcSG07G7UDIoxyK9/kPNRSZ9RLjBgE6raOffb6bMAgOLZJ4I+49zLdKINlm5aygVucYZItMQT/wQAyN1wjxw/eJjnIk0u+KCtSMbKKTQQSdPaRDOSa2L+/JN07YYQjvMTlPGtzNLbpWlZP1MgaeHxLz4YtCWTNI59t79K2rpZureu8rwO+KK/xaIQoPk8SWVhJZXd/nLKtTIwQOt8/Lis6eg4Bdrs2ydSVppLocWUtFcq8Z6GqK9RExexZIwX0wrhuJqE7ta7WBAtMMpl73oGRSqbY/I2xK60kaScM84FFLq7JEiqO0Pz271DKKdjJ0jznOG9SO8QbS3EJGqzqvTXBl1L3x8uji2QdCsiTSZBz9ANw5JZcQ+XXwu
r5EAtVkIX+LsXJuQ+meKsk9msyomSpy9kdRTTEmjp033W+UwcqTg8LK6art9J6jpgyEm8WqJ3Lo9NldUyxM9ckwlkV2ACAJJJ2seBftmD6WmSxh/66jeCttlZuv8jXOxkevx80BdhxwyrCG/nCj2tpOzCImkBLq+PMaIRieahHDPCG7eAr8XL5T0rdN2zQruHh4eHxybAR4p6eHh4bBNsei6XNCfK75Qq1+ik/EdeBgDIzFC+j59Ni/r3EjYVNKYlV8L3uc7i98bOBG39t7wBAGC7iZRsqAg8R+rFrZBHyQUiM5qqJuHEcap7WWWf5mhZyJWpSRpH304xD1x7808BAMam5Lj8RTIfFU5RLc34RSn20PvqnwMAhA+9ImhrOXOGSqi/msnF+fc2m4uqlb7bUElMpmeJy770DEeKTog6fGQfqbKV558K2r7+J38IALjjHW8K2m5947sAAKmhwzwslYKX/+rcL/OcxhdRiTCMJ2gvX3kn1QHt7xXyyID2YFaZgwb72Yyh1sBytORCkdT4UEPGMT1D5oNqTUjR0CoEc5QjB0slMdEUOLo3V5Vx9w+Q6eLceVrHhIr66+L7s6hMDItM9h7ZI+p+qpuOG+NozOSimEsSrkaoMuFFEkzoK6LUlRQd5nXcPSj+1Ht7icAb6JZx17hohFszQFIzR1t0rfiArE+qygReXa5ZKpPpbLaxch1MnRbX+aRr8rKHi0LkVYyGM0X095MpT5OonXzZM1wIQ/u8Oz/1Vou+q+/5Jps/3LUBYG6ea9SeEmeGSoX2rbuHzCRVFVW7WKDx5lUa30xXz7I2Nxc3Xp1/yo1Xz6+uiO71wkvoHh4eHtsEmy6h98Q6uBAGeR9Esov0k2tR7nVk0k+e/UHQV1wkaWE+Lb9252coInFGHReZJGKj//bXAQCGrrk+6MsyYdZnxFc+1UVtExcl/8n5538MALgrR5LPvrSQWJ/j7IV77nl70NZ3hCp/Z1W0n10kEqbE7o2lC5KhLRYiSS0elxVptdi1U2kUWM2ln8tltSriHpfrpvFme/YGbfOzpDVMNCii7cCAkL89OdqXvHIbq16k4878y2eCtgQn/r/1536D5pYU8s0VuNDFSCbnaU2z/eIiNj1KGkqIx51MivY1NEARn/l5IZvGx0kirigJ2mUmHGU3wJ19Qgi3DBenUATbai6gjqjXZezqHDF7Mi9aYLaPix+w22BIEd9FJjTPTIiWNMsRl4mwnDfVxd+dpb/Jlowr2UX3k+ZEs0xyHt4hZKsrz3cta4a5lGTvbLFbn1WFKMoVR9LJed2IXF6YZEa0pJEcfY6pyMg4R+tOL0i07lLo6E1HinaKFI0p5wfnVphOp9u+p6HbnEugjh51/U761ZKxKyIRDsu9UOaydK94pZD9TouZnaO8QoM7hLg1hvZqsSRSu7vW/v0Sodys0UP6/HN0f+s8Ly73kc7f0t6/PngJ3cPDw2ObwL/QPTw8PLYJNt/kEnH1Q5dXhg+pytwhl2R/aB8A4HhdVPDvXCDi7vRZSbtaLLuEQqL2TV98GgCwbzeRUocGxUST4ARZ1bKoyJNskjh1UpI6VVlt3h0l9airKSTqK3/2/TSXa0V1q3Il+O6wjCPURwl8ugeo0IE9fGfQ5yIim20qpOG2tf3+hji3aSYhank2R9+N9Ujb7Dn6HOKiBk2VCCneS2sTa2nfajYZKFV66slHAADn9lFk6d5XSWJOy6ppWBUBCbOJxqUlBYAy2xRaTF7qFLzFC2ReyamoxmqDxpTJiWkBTGY7M8k5VUzg+uspJiESEUJuNUT4ntSRkfk5Wpt0XcY2ztGoiQGa396sjGeR/bkL+YWgLczyU0IVMkjxGsXYTDDQLWavDBeu6ErJczAQI1NBX1ZMBt1MwMZ53WqqNmbVJbtSBGiVTS6Vqty7UU4mZhN0zVZcmUb42cwZMY3EdpN5p1wSc+RS7N0r5j1XbEInqHKmEW1ycaYTd3ynPm1Ccf2agHVmDHe8Tjni/NZ1Sl2XbCup68Vy8rFsDxfLUPtY5XXTxW0SXPgk2y33wFwfP2ts2yopgjyZIJOSJoQrlY2nRvESuoeHh8c2waZL6N1RltCVR56T0KMql4aLonIFFbLXvizoGzJELl648PdB2wLnoogp18fuLiLKDh2gdJnVRfnVneFfz3PnJSLs5I9/CADIq7wPcS6qULj2IABg4NWvD/p6riNJu16XyaSDyFP5NXeRiC561EaVNG6X57ZpsaTRbK0clacRjnKRgLjkcinMUi6XdEqIlwyv71APE1AxVXU8S+OoVUWyW2RhKVQRCT1ZIk3pzPcp6nXw8I1BX3qAiOzunEidvfz56WefCdr6mMyu8zxzPYrU4/wr+UXZgxoTzJm4SFkNlpxf9UqKOh0fk30slOm8Xd1CxOrUzEsR5UjDpJJSF8Ls6leWR2YoQ9efXKB7Z8rK2u7hwg+DPSL1uVwkfUqzGFqg9bh9P41t/7DsWR/nC6pXRLJzUam5jJYwaY+KRfobV9J7kzWXWl32kRUc1FRkYp012hLnkimXVKk4djWdVm6zQ0w6V6IrL+ScSq2rJe2l0CSnIxc75X5xx+n8Lp3aHKHq2nR0qtMQQiFZo0TKEaW6mAatbyjKZfIU4R2J0DkiStNyOV/0OA4ePAQAOPv8WQDAM8+KC7VzI65WRbPQmsd64SV0Dw8Pj20C/0L38PDw2CbYdJNLX9z5oas0tyHno6kTPXHUFaehjTZFVckOE+n1tncKqfClf6AkXnMzKg0t++KeeJIqxoxPjAd9JY7+mlXJdWpMjGjf2WSSVOTFOymVTeWmVwd9PRVSU5u6eLddbiaxTPC6ZFvtPrQc0al8zoNKNGstaMLqXzQrdSFrs+QL21Sq4wCnFc2GaMAzBZl7qUHmg3RKxlZkX2+rK9HUufbi6UcBAJeOPRL0HbqbfHe7lK/+vp1ESP/jlFyrO0KkbIqPm5wSwnSEq9Fr9b3AUZLZLlFvK5xWtsDmjxEVrXv4Oqpc09O7L2izq1SHmZ4kM08kImaCHl6HmYLcY40SnSMTayfhCNSXUPdwks2LmqDczYmyBrleZiou18wkSe2vKH/xOKdgzXWrdLHs71x395Uad1+EzDslRbjNGI6cVWmKExHa2ygXza2WxXxULNN4ZyKy77Ewm2ZW8efXZKRLHevITkDMKfo4t4YuHa2OAI0xCZlJp5cdb1Wq7TqbCRdLNAdt7nEmGm3mcWMLGXkdhiNcLYpJ5aaKD3DPoYtSpeOjy84b4jW9804yxS7kZe7T03Nt56Iv64xr64OX0D08PDy2CTZfQufcFC0loUechK5IG5cmI8LSsq1J38VTFBUaD+0O2mIcUTc8KDkbjryE8o1cukCE2dnnpCCBkzpjiuhwrk2RqCzTDS+jHCt33PlaAEBTR8OlWOLWKWTtcgkmkCpcwQ3V5ySSRsMsP14JgMsrOepUqPTd+IDUN8xP0lxqJZXPhOdneoiIizdlns0FcgGtWiU98UjL6riWI+lYoh99QiJzd99M2kuyV6Tln34NSSuf+8I/BG2P/YC+c9
0hykEyslc0i0qNa0tqTSfMEYBhkbzqNZr/o4+Qa+pLbpR74T3vvYvmp6JYWy2Rkpfi1HFyR+zuF0lwmNPalvOiGbp0KlEu6mmUdBbjFK+xjNwLLmdN04gkNjhAGkh+psVzknk6CTasyDInoesoT8Mk+AwfVlFui7u5eEo+LoRfmSV4E5N7xhGlEY5KDqvw1Cg7KVR17VtXt2UVoVJL166+piY5ndat64Y6l8MgWlyfj48LpWRfcrnl+WDKHNFqzPJnz0nQOoo1wWPS421yAZ0Qp8WNqyI09bp7RuX4KLvjJlOqiAU/L13XUgGeV98lGtHD3/wOAGB2RqK5q2Vx6VwvvITu4eHhsU2wlgIXuwF8GsAOkDD5oLX2z4wxvQA+B2AfgLMAft5au3IJ8BWQ4wTyLSWnRvmXNRpWBShYEnTJ7cMq38f3jh8DAJw9LRL3oQNkN02lxL3LJa0/z9JCRCXKj7FHG4jnAAAfbklEQVSrl5aCW/wrrvMtjF8kF8n8JbrWNUekgEa5xJJB2+9kBykhkLidhK4kJf7Rb7U6SOhKAly+0Pp4npMqAtLgQIawcv9bZOm3yXkraqowQpOlTs0AGN6DppHWFkuP2Qzb4VV2y/kJKk6RUBL6oWvIlfFdb3tj0PaHxyg/zqOP/D8AwB1x4SX2XEOSdqksa2TdbavG0T9EbnSlIi3g4SOSrfLgdbe4L8pcOkhvAdhoXZkR7aTCLp59Ubmf5tnFb5HzpaRVEJaT0ENKm4nH+LtNuXaEtSTnqtlUbmyWg9YaKnitzvOLGpEYC3yvHFuY5muK9DnDxSlq6llyhS2SqjxexNB4CzWXF0meDZczR5dznM+TphfGylyEDt5pNpeL8q5Nc1SuDJtuc0ix9qy5ijxzX426zDlfJFt1iDVrF0wEiDag86Z00hScxO/s72nFA7mMkdr273LKRKPyzDkpf3aeOKG+XnFJvfvuuwEAp54R991zZ3XpyvVhLRJ6A8BvW2uPAHgFgF8zxlwP4EMAHrbWHgTwMP/fw8PDw2OTcNkXurV2zFr7GH8uADgBYCeA+wB8ig/7FIC3/aQG6eHh4eFxebwgUtQYsw/ALQAeATBkrR0D6KVvjBlc5asrojvhIkCVGsoMaFSpXa5+5Ph5UksWZ8XlMOuKZKifp3yJTAtPHhMzjFPLHMmUURF1DY6CqzdEvW2yG5GONBu9QO5/n/74xwAA7/ulXw36br39DjqXIktagRq+PLotIEXb3BbdX3W8ixRdo1eTU41DCVET072UN6YwKuvR4sjTZp3mV62KK6GrJF+eVhGGvH5dg6LCRpx7F5skTGky6CvPEVldV3JDmF0wf+Gd9wVtF8+cBAB87E//HAAwMS5EUXcfmWtKRVHtG2UaRygkt29vLxGeBw7cBAD4uXf+56AvmyFVt82t0KwcdZvmeqB1lS525gKp1+mkmBFq7HPmTHJhZX0IhV20s3KFCzuyX64dZdc2xMisUW/J/Vep0Ann8kJk5zhfjDOXAcDpEpkYFgoLfE0xl6BK50+nhIy0zGRWlPtkiAtVzJXYJKEKaFRjNJcu5f4XdvPSLrpLoCMfuznHiTa9uHVzboMAsGMH7ff4OD3f2jRS5IejqvLBODOkJpMdYxzmv53S+Orzzs7S/abvDzc2t5YLCxJV3ttLZhVtynGmGW3JG52kOUS4T59/9x4yJWbSMvdGXd4z68WaSVFjTAbAlwD8lrU2f7nj1fceMMYcNcYc1TYnDw8PD48XF2uS0I0xUdDL/O+stV/m5gljzDBL58MAJjt911r7IIAHAWBkZGRZ4occu3KFOmRb1O5a0Sj9Au/dRa54J+Yngr5rjxwBADz97LNB2+NHKdDlAOdTAIBbb6U8HyZEv7ajF48FfSdO0S+2zsPigpisFWmozmThuUuUze//PPjnQd8vlu4HALz2ta+TCfIctNRuW+0BRZ2S8zd1YJFztVpbKpeAFdXZKjPdFLwzuig/qgkmx5zLVVRl+quVC23XBiQnS1wVEjFMisVZYsvEVPEBLrVnmipxP2eC7FX5Wj74wd+mczCJ9qjSqhw5VVPFLCz72GUzkjnyrrvINfHu1/w0AGDHjj1Bn5uDu6/cyFdCvUoS3WJD9iWdocXvysiajs3RcU0O0ElVpa/KgW/xsMwzzPuhJVcXNCYuuiqAhcU9rZjNF7n8mRXpeopJ094MEd+ZjLj1uWtaVVqxWKL7v6JKnrUq1D86SVKiCamiE1zjzqigp1qDxp1UQT5LUVDl2JzEqx0MXJCRLkt3003kzDAz8zCdPyVapgso0hK3uPmqbKbBM0dtmuysstalg41cfpeUulYuRxqfk9D1vePOp4n1gQHOWKo0/F27KD9PgQXZybMXgr5njv4IAHD40LVy/O6d2CguK6EbGvXfADhhrf2Y6noIwP38+X4AX9nwaDw8PDw81o21SOivAvB+AMeMMY9z2+8B+CMAnzfGfADAeQDv/MkM0cPDw8NjLbjsC91a+29YWT+9Z6MD6OYINq0NBx9VY5ij1Hqy5G8cufmWoG9slMwfhw8fCdpe/zoa2hve9Oagbc/ufXRajn4cvfDdoO+hr3wOAPCNb0p9z+lZUud0pGic/d+PDFOekh5FQH3rK18AACyqtLzvePs7AADptCrQUG2PCNMmnRabYzQB6vJI6HwSq8GpgjpiNcHrFo2KWlmZPAsASHFOirrKARLn8e7cKdGmkxeJPKq1mcfoGhE2EWViMsapk/T7v3BOfNP791MdV12vc3CAxvbhD38QAHDijBRNOHacCNPZqamgLZ2kNb/9ZTcFbYcOUC3HNM9Fq+CdfJpXwyJH7KkysMhxLcqGMoBEU1yIhVN0lBflmnOctyOR0GmTqb9YkfspsUgXSYXoHqhUlT/1IudcUb7p5Qzt0UJIzAh19js3nCo6oXyyQ5Uan0MIt8os3Z/np4UEt7ymCNM8I8rDoBpEVwZNyMW5QENJ7vWlmJgUs2gsxnlYVF3XBkdcXlTFSGamaZ/zTEImEmLeK7h00+oamSyZAWMqlfIi1wits2lE778jQ7W5xJlVdJEMZ6ZzfY78B+Se18c704828ZZ4HNMc/zIxKev9/Bl6Jq7dL6bBYX6nYJlheu3wkaIeHh4e2wSbnssl5Sqmq19M9zmkXMtc5XgX5TYyIFXdd/LnLuWmt3sPERLDimhw5EfEkBvbwPDdQd+b30zk1aFrfxi0ff1fKXPg0adEgviZG6mAw11cXb41JRJKc5DG8fBjco5PLNCv8vvf+wtBW083XV+7SDoE0aMtkZabTZchbq0/3S5XjPxehxIsyaSEgJobJ7dC9NMapXolyi1So7Hlq9NyDtaYlLCCGLuMBiXgqsrNbIFcTCfPCMnZfw2tXyisIz9pfqkkSWO3vPRw0HfTDfS5pSR6V+wkrDQ4IZbprybfXigsE/BhpTlNz5NWF06pfB/smlhn0bWyKMefHyfpuntY3NKinJnS5XQBgDzndXHFJuqKhJ5oEoEX7hXiscDzujAmPghToyTVxliLLas9DvE9sKtH9jbZonVu5IUULbEra
LYn2XYuACiwZuEcE6jfFWJZ2Zc2pfKaONfEqpJqUxm6J6MqT8o8lyZMsWSuAlyxyFJ7TD3nZUOaR0lFObdaLi+Ty5goqoVzQdb3hyNltQOAI+OdRqGfPYlwFe1rcpKek6eflsjPnhyR9kl2GS0VxSFh/z6SzPV6OF48ofLuvFB4Cd3Dw8Njm8C/0D08PDy2CTbf5MIFLkKKd3X+50YlrXfmF+fCqyOyXJKtgwevCdqcyhTSCZP4fI4jyakK3TC3AwCuve7lQdOrXv2zAICPf+LTQVs/J0dKXCTCpwHlE8tqXF0VB/jiFz8PACirxP6/+iu/BgDoYTVYq4QCpeKxuaHRXKPJhdempX6vbYjVVCPq3PS5swCA8+fILLTnJS8N+noSPF6rIgZdEi+1pkmOGnVu0S11fKxBx5dnOyT7bSPB3flcmKyc37ApJRxSc3c1WVWNVWOW+nGvn1lyZ9Lpm+cXSTUOKyuZjbQnkQurGqQTBboHUhGJwRvmAiEhIwU8mhUyv7jEU0iJOabKZPxCWdTymXnal6kZMfVVXBpffkYmykKYztXou+msxBi02JySUpGOVb4H3Trqer4Jjh3Iq1qyjXkmqYsrRzcOZiUZlfO3T6tkea4mZ1OZzhq8f60IpwlWsRTJKJk/dMrqmTlay7yO5OwjH/JQKMpzUr79bC6pq2hT55uuI1Yd4ekiVnWE6zwn23J+5oD4sJ87J4R+JkPvl54EnVeTs3v37uFxyF45v/1EXExmLxReQvfw8PDYJth0CT3B7kzaIc+VoDNtRCm38U9QWElPYZaMY31SwKDMUnK1LEREl5bIAYTD4hLlJAeriNjePkpP83sfkmiuHz96FADwRPZfAADPPnk86Du/QG59zysXOyddf+2fvha0jXDpt1/+5Qfc7NSoXHk6IWiMcUTpGhPgs1Sry/pZXrieAUm5c/1LKfXvD39ErpqnnpDI2Z++l6Jqo90SjTlx6TFqU6SNS6dS4Qhbq3OXcARjqyBSqpOt2uTn4D+ODFfrEUjmWotZLoW3l3/bGJqsKej8Qi59s1VakiMwI90caauiSHu6SMqKq9wvTUdaGp1ylqu/szYTVXU3WhyVPDkvEZcVltYTSdmDZBddI85DU0oS4nyPTzVEko6yZtGl8oiEyi76kdMhK6k5UafPSZVMOQza20Ra3BCX4sAuRc6ypK2jv0ucg0ZHYR4YINe9IrtvzhVk3FMFmvvCgqxHs+7GrUhzPp+TwvV7pFOqXOeGqCV0R5q6dL66z+V10UU1hoaoKIvO73L+HBXSOX2aItgbDSFuDd/XYcX6FgobT43iJXQPDw+PbQL/Qvfw8PDYJth0k4tTr7XKLGSoNhmwGcGp41ot58/aDBNEXSksL1KjfWhZ3VcVWFqsXue6xJRz9z1US3T3tRSZOP7Jvw36zn71qwCAyqKQok6dqyoiZyHfnqyyk0po7fLf2ojOz7oKXMKskDLbuJS6kZio6t379wIA9i7QOpx8/MdBX5nzosYyomrmel09S5W+NMrXsGR2aDTFjFVhlbpakHS4TbfmLZXSlOuFukpPbdvklkGp6ljic/5iI2zdvaAu6VIeG3VP8m2azpLJI9On0sAOEiHoKjkBQIqjTcNqb11q5BqbM5o61qBIn3u61DVz5JOeTMp5m7yW8xyhbOsqPW+e+kqq6lGak2wl1DkafI5FTiqWKqkkdUyG9gyoSOIBMi2EaitH4UZDYhpJxGltKiolcS5G90JKPbdxrjFcidI1+1S8QoPNMPm6NrlQW6ZLJUFjs58zd+qIThf5qX3O3WdNWjqS0xGV2nHBRZvqFLzOhNPfPxC01TnFsTOvxNv8yzmyOisxBjpp23rhJXQPDw+PbYJNl9ADaUtL6Cx5GfWLFVQBZ8JDEymBVKt+dV09w7ZUpUEa1eWpSp3k34mIbVktGdPna/aQ29GvBMQmcNdP/RQA4HsPfytoW+Rf8b0HxKXybW+9r20cWlroVOsyFLiSrS2CzBG7tk3WZWK1JsRLjSNVm0xYJZXbneXiB/kFkW6KFS7yoPLS2DBJXhHnKqki9vKLdP5wQbnpNegc8YiQacF+O+1IV193+WuUu6VbItPhnrGrpMVdK0JRdwHlTsdSalvOIf5PeY6Iu6TSfqYjNOdSSaTgHo6+7R0Ujc/tqWHXwIhSGl2xi5Tig+N8jVxMzjs65qRIXkd1vzZZckyompgceIyikn7LHA1qqtQWy6r0sjXqG1WaZSpGhGdI1UeV2G1CMiktKUcqWpFq63w/VVQN1DK7e7papRGVRno4Q2NL7BKifny+yONX6W1Zard8H7Vp/x2eL/f86efQPZvuPaJTATtStKq0jd6+3mXXco4Nrr5su/sk9fX3yxrpfDHrhZfQPTw8PLYJNl1CD9nlduGws43qYBL+cXOSeZtnW2BfV79wgeC//Nd56d+ln6XNnbdtxADEJcqVzAKA/fvJrv76e6TARZ2lMm27dtqIs8tpLcJ21FiCAS0bYyc0neujdrtjXkIXDKhwzo/cdZTLZXZOJLAIZ/+zKvChzrbfekhsxcUyS8acC2RuQgI86izVnnzk34O2xl9RQZDb7pEsmD1DZHcMcdGBUFzsirGIXCuYn7tl9P4FfIhbt/XLKsbFYNXlHEl2a62WRCpzJRINu2xW5kWbmVskqb0/J657CdY25ipKI4vTNdJclGKgR9zeFjgoqFrTnAWtkQ7gKsxyAQXOL5RUkp6T+nKDYmN2JdxctkMAqLJ92rLdt6Bc7JxrcSspUvs8RzNFVgl2M0buNcscSzym94VLR6q3kOU51/ivCckYuzkbaCIh6xeP0f3xzAXhaUqOB2CX0aZy9+3kyuiChnRFNWcT1+6KDk5qbwMvQ1m5Sae5IIfLLVOtyrPkXDaffvpk0LabA5Vaa6PKOsJL6B4eHh7bBP6F7uHh4bFNcFmTizEmAeC7oPreEQBftNb+vjFmP4DPAugF8BiA91urCh2uEc7c0Mnk0eaV9gK5rk6Hr2ZyeaEIyCx1paDWoDKhxNhFTLs9LY1q7GzuUWYYR/itMRqy5Vww29wWWdVUbSVLxNrOV74VANCz52DQF5+imqzNBdnSaJbIvHxZ5cFoUJ6WBqeGLSiVs8Iml7lRyW/xyF/+GQAg/Pi/Bm27DpOpKtTDdRlTQ0FfI0b5bvbfeFvQ1r//Opqn0tVfDDLUIcduiM2KmC662eVwfFxU+y52ITQ894zKf1LnyMt55cLa103nyF+UvD4VcBGGDJmqpiOSFneMC4pUFHmZ5VS6cyoFb9Ux0k0ab70k91q6i6vRq/wuyQzduzNFMREVeE+jHIFaUXOpsOmzK6FqpnIq2L5VIkVLFZ3nhXO5KFfJMBdbiaq5OEK8xC6YiwUx4Tk3y1RamcLSNPeCMmM9f4miO5vx5S6KtZpLfSt7G9TxVWS8M80sBIU2lCNAh1rA7rM2n7po9U51SeNckKPZkLG5CNhstj2i/YVgLRJ6FcBrrbU3AbgZwL3GmFcA+GMAf2KtPQhgDsAH1j0KDw8PD48NYy0l6CwAJ1JE+Z8F8FoA7+X2
TwH4CICPr3cg+tcucC9UQpdL1O+EWauDjvhvm4wWeJ4tJz5Xk8wvJ7Uv7e9Eunb65e40DjdP3be0UMNKx606Rv5uuCXEVrNAWePqE1IOrlWlbY32ExnTp9wWCzNUPq6ugmDyTJhFQ0LwRpnwmZ7m/BYzIglenKO+c4os3N9N343Nnw/aKs/Q50Q3SZ/1hkhsYyygzTwtJQfvfO+vAwC6998QtNX5/ggHLpDrR18vSZ2TE6JtlEMkbe06KBkEZ8ZJoioukDRn07JnLlimXFe5OubY7TMu87O8p9NjRGiOFkX7abJkHMroYC0nYSpikPdtaCe5zrWUNJ7g7csoQrPi3DIVTz8wSORthPPNxGNKgmWp89KMFDsxHLQzlFtZQtfZPmssiSbVziSTtN8xJaFbdmGM8vpFdfm4CrsLqmymMSZF+3vkHBOc3LNQIEk9ndbZC7mco5LGFxddThkZr8vl4p5HTZi643Q+mHg8znMSEtUycd3p+XUSv3Z9nJigDK4/aQkdxpgwF4ieBPANAGcAzFtXJBG4CGDnCt99wBhz1BhzVC+Kh4eHh8eLizW90K21TWvtzQB2AbgDwJFOh63w3QettbdZa29LKZc5Dw8PD48XFy/ID91aO2+M+TaAVwDIGWMiLKXvAjC6ngG0OjhdOpXG6rwWpu1PGxnZWb1en9K9ZrNGh+M6mVccOvmadzLRdLzWCxyby8jZLAuhdO5H3wEAVJ/6ftAWHaRo1ySrf/PjosqWmNxsqQi8TC+l3t01IKRlaYzNJfOckyQnKmeMCbxbDkkl9MO7KX9MJqtMVQ2q2dqcodqjkanxoO8lI6TSn3j+aND26D9/CQDwul8+ELTZCAkLjhCOdDDJrRV9wxS9l89L1fpkmuwTDRW5WGb/8Dqv0cKimEsSTGSmh6WW59Bu+lwqiQlqYYbNXry3mS4ReqoJOseiqj1bZR/rpFUmkUU6rlkiEjKsU73yd21BiNgQ18JMpOUc8RjdAw02E8SVqFfkc7TU7Rdi3/Tn54QklvIoPC4lL7pVK1dVCmhXtCai0jHDmUWpLxoTMjLN5O/8jKxffp7MKtoMOMRFV46fontT53jq6iKTWU359rt3kDahOJOMM6VopwZHqGpi1X12xwNSb7dTPphOz/5GnDQcLiuhG2MGjDE5/pwE8DoAJwB8C8A7+LD7AXxlw6Px8PDw8Fg31iKhDwP4lKGfzBCAz1trv2qMeRrAZ40xfwDgxwD+ZiMD6Ri1qcnCgAx1RRD0r1mnX7aOVOllcblfzKVta/1VXS1itaOErk4bCMlrrUDHmk1LSQQNzrZXyIlr4tA1dwEA0jGSpCN7JN9M/SwRpcVRIW32vpQsbZGM5NKYSpA0u+OmfQCAlx8U98LMEEXRZgckA12EJa+6rsJgiWy7+PUvAABmit+WLs7qlzSSSyNfINarXhWp07nAuQjbjeRhnLpE12qpKMjSHK3fzJRoPa2YIxdZ41J5TaLO9VFlZyw1q/w9mXs8Q98Jp+lRjKho5xYTj5moyrHj+lVBB8uSYLjOJevmlDSeI7KwnpCxdadJioyrEo8NJlurvC86inSB86WoRKSIc6k8XcptKaJx0RSaHPlZUrl+Ak1Vaa/OrTHqImIbMo4Wzz2dklw4dS744QpRAEAiTf1Oq5pfEC0imSSC1EBH09JxmuMrl0nbcVJ7XBHZLnujNiG7ddAZGBfY5ddJ5joXk4soDak9eDGKtKzFy+VJALd0aH8OZE/38PDw8LgK4CNFPTw8PLYJNj0512qJsnSNB+uc0t1f3dnJNIKVz+v+diIqO6k9a008v5ofusZq1wqOWdMVO6PFJpdwQvxvd9/yagDAyA2vDNrC2REAQI3VvnBSfKy7r3sNf9gdtMX7qLZqqk+KhwzdTn+TGTKrRGLil+wi49rmyWOLKJXXhsls03cDXTOaE9I1kiFV91Bd1NVGnFRqR4QCQMPFLgSXWv8KnnniAgAg0ydJwpzqX11Q5Ji7PJN10YQQYn07aC1bKghy9AIlP8uoiMsEJ5AqF8kUEdZ5n9ytviBmh1QvRx6r4iyNhEt/TOPtzcq6JLrYTNErbS4ddLip6r+6mqlsV5nXkaV8H4XCihCucFreiJrgEhSKYvqps8klFVPXdIVpQmXVZngukbbrAECLj08ok0uCI0pbdSGwZ6co2rY7Q/fizJQknZubJ3//vl4xA7rkWZq0dKYWd+82mzralPYqk9HFKejv4qKqn9to90PXScWc/3l3TnzOdfGU9cJL6B4eHh7bBObFrJZ+OYyMjNgHHnjg8gd6eHh4eAT46Ec/+iNr7W2XO85L6B4eHh7bBP6F7uHh4bFN4F/oHh4eHtsE/oXu4eHhsU1wRUlRY8wUgCKA6csde5WjH1t7Dlt9/MDWn8NWHz+w9eewlca/11o7cLmDrugLHQCMMUfXwtZezdjqc9jq4we2/hy2+viBrT+HrT7+TvAmFw8PD49tAv9C9/Dw8Ngm2IwX+oObcM0XG1t9Dlt9/MDWn8NWHz+w9eew1ce/DFfchu7h4eHh8ZOBN7l4eHh4bBNc0Re6MeZeY8wzxpjTxpgPXclrrwfGmN3GmG8ZY04YY44bY36T23uNMd8wxpzivz2XO9dmgot8/9gY81X+/35jzCM8/s8ZY1ZOmXcVwBiTM8Z80Rhzkvfizi24B/+F76GnjDGfMcYkruZ9MMZ8whgzaYx5SrV1XHND+F/8XD9pjLl180YuWGEO/4PvoyeNMX/vqrFx3+/yHJ4xxrxhc0a9MVyxFzpXPPoLAG8EcD2A9xhjrr9S118nGgB+21p7BFRH9dd4zB8C8LC19iCAh/n/VzN+E1Q20OGPAfwJj38OwAc2ZVRrx58B+Gdr7WEAN4HmsmX2wBizE8BvALjNWnsDgDCAd+Pq3odPArh3SdtKa/5GAAf53wMAPn6Fxng5fBLL5/ANADdYa28E8CyA3wUAfq7fDeAl/J2/NEaVj9oiuJIS+h0ATltrn7PW1gB8FsB9V/D6LxjW2jFr7WP8uQB6kewEjftTfNinALxtc0Z4eRhjdgF4M4C/5v8bAK8F8EU+5GoffxeAu8AlDq21NWvtPLbQHjAiAJLGmAiAFIAxXMX7YK39LoDZJc0rrfl9AD5tCT8AFZAfxiaj0xystf/Che0B4AegAvcAzeGz1tqqtfZ5AKexBSuyXckX+k4AF9T/L3LbloAxZh+oFN8jAIastWMAvfQBDG7eyC6LPwXwQUhFyD4A8+qmvtr34RoAUwD+ls1Gf22MSWML7YG19hKA/wngPOhFvgDgR9ha+wCsvOZb9dn+JQD/xJ+36hzacCVf6KtVcr6qYYzJAPgSgN+y1uYvd/zVAmPMWwBMWmt/pJs7HHo170MEwK0APm6tvQWUOuKqNa90Atua7wOwH8AIgDTITLEUV/M+rIatdk/BGPNhkEn171xTh8Ou6jl0wpV8oV8EsFv9fxeA0St4/XXBGBMFvcz/zlr7ZW6ecCol/53crPFdBq8C8FZjzFmQieu1IIk9x6o/cPXvw0UAF621j/D/vwh6wW+VPQCA1wF
43lo7Za2tA/gygFdia+0DsPKab6ln2xhzP4C3AHifFb/tLTWHlXAlX+iPAjjIzH4MREA8dAWv/4LB9ua/AXDCWvsx1fUQgPv58/0AvnKlx7YWWGt/11q7y1q7D7Te37TWvg/AtwC8gw+7ascPANbacQAXjDHXcdM9AJ7GFtkDxnkArzDGpPiecnPYMvvAWGnNHwLwi+zt8goAC840c7XBGHMvgN8B8FZrbUl1PQTg3caYuDFmP4jg/eFmjHFDsNZesX8A3gRils8A+PCVvPY6x/tTILXrSQCP8783gezQDwM4xX97N3usa5jL3QC+yp+vAd2spwF8AUB8s8d3mbHfDOAo78M/AOjZansA4KMATgJ4CsD/BRC/mvcBwGdA9v46SHr9wEprDjJX/AU/18dA3jxX6xxOg2zl7nn+3+r4D/McngHwxs0e/3r++UhRDw8Pj20CHynq4eHhsU3gX+geHh4e2wT+he7h4eGxTeBf6B4eHh7bBP6F7uHh4bFN4F/oHh4eHtsE/oXu4eHhsU3gX+geHh4e2wT/H+LPG/IciHfZAAAAAElFTkSuQmCC\n",
93 | "text/plain": [
94 | ""
95 | ]
96 | },
97 | "metadata": {
98 | "needs_background": "light"
99 | },
100 | "output_type": "display_data"
101 | },
102 | {
103 | "name": "stdout",
104 | "output_type": "stream",
105 | "text": [
106 | "plane cat deer deer\n"
107 | ]
108 | }
109 | ],
110 | "source": [
111 | "import matplotlib.pyplot as plt\n",
112 | "import numpy as np\n",
113 | "\n",
114 | "\n",
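    "# Undo the [-1, 1] normalization before plotting a grid of sample training images.\n",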
115 | "def imshow(img):\n",
116 | " img = img / 2 + 0.5\n",
117 | " npimg = img.numpy()\n",
118 | " plt.imshow(np.transpose(npimg, (1, 2, 0)))\n",
119 | " plt.show()\n",
120 | "\n",
121 | "\n",
122 | "dataiter = iter(trainloader)\n",
123 | "images, labels = dataiter.next()\n",
124 | "imshow(torchvision.utils.make_grid(images))\n",
125 | "print(' '.join('%5s' % classes[labels[j]] for j in range(4)))"
126 | ]
127 | },
128 | {
129 | "cell_type": "code",
130 | "execution_count": 4,
131 | "metadata": {},
132 | "outputs": [
133 | {
134 | "data": {
135 | "text/plain": [
136 | "torch.Size([4, 3, 32, 32])"
137 | ]
138 | },
139 | "execution_count": 4,
140 | "metadata": {},
141 | "output_type": "execute_result"
142 | }
143 | ],
144 | "source": [
145 | "images.shape"
146 | ]
147 | },
148 | {
149 | "cell_type": "code",
150 | "execution_count": 5,
151 | "metadata": {},
152 | "outputs": [
153 | {
154 | "data": {
155 | "text/plain": [
156 | "tensor([[[-0.9451, -0.9529, -0.9529, ..., -0.9765, -0.9765, -0.9765],\n",
157 | " [-0.9686, -0.9686, -0.9765, ..., -0.9922, -0.9922, -0.9922],\n",
158 | " [-0.9765, -0.9765, -0.9843, ..., -0.9922, -0.9922, -0.9922],\n",
159 | " ...,\n",
160 | " [ 0.6863, 0.6863, 0.7020, ..., 0.8196, 0.8118, 0.7961],\n",
161 | " [ 0.6863, 0.6941, 0.7098, ..., 0.8196, 0.8039, 0.7961],\n",
162 | " [ 0.6941, 0.7020, 0.7176, ..., 0.8118, 0.8039, 0.7961]],\n",
163 | "\n",
164 | " [[-0.3490, -0.3490, -0.3412, ..., -0.2941, -0.2941, -0.2941],\n",
165 | " [-0.3333, -0.3255, -0.3255, ..., -0.2784, -0.2784, -0.2784],\n",
166 | " [-0.3020, -0.3020, -0.3020, ..., -0.2549, -0.2549, -0.2549],\n",
167 | " ...,\n",
168 | " [ 0.6392, 0.6392, 0.6549, ..., 0.8039, 0.7882, 0.7804],\n",
169 | " [ 0.6392, 0.6471, 0.6627, ..., 0.8118, 0.7961, 0.7804],\n",
170 | " [ 0.6471, 0.6549, 0.6706, ..., 0.8118, 0.7961, 0.7804]],\n",
171 | "\n",
172 | " [[ 0.0745, 0.0824, 0.0980, ..., 0.1765, 0.1765, 0.1608],\n",
173 | " [ 0.0745, 0.0902, 0.1059, ..., 0.1843, 0.1765, 0.1765],\n",
174 | " [ 0.1216, 0.1216, 0.1294, ..., 0.2078, 0.2078, 0.2078],\n",
175 | " ...,\n",
176 | " [ 0.6392, 0.6392, 0.6549, ..., 0.8118, 0.7961, 0.7882],\n",
177 | " [ 0.6392, 0.6471, 0.6627, ..., 0.8118, 0.7961, 0.7882],\n",
178 | " [ 0.6471, 0.6549, 0.6706, ..., 0.8118, 0.7961, 0.7882]]])"
179 | ]
180 | },
181 | "execution_count": 5,
182 | "metadata": {},
183 | "output_type": "execute_result"
184 | },
185 | {
186 | "name": "stderr",
187 | "output_type": "stream",
188 | "text": [
189 | "\r",
190 | "170500096it [01:50, 4515126.28it/s]"
191 | ]
192 | }
193 | ],
194 | "source": [
195 | "images[0]"
196 | ]
197 | },
198 | {
199 | "cell_type": "code",
200 | "execution_count": null,
201 | "metadata": {},
202 | "outputs": [],
203 | "source": []
204 | }
205 | ],
206 | "metadata": {
207 | "kernelspec": {
208 | "display_name": "Python 3",
209 | "language": "python",
210 | "name": "python3"
211 | },
212 | "language_info": {
213 | "codemirror_mode": {
214 | "name": "ipython",
215 | "version": 3
216 | },
217 | "file_extension": ".py",
218 | "mimetype": "text/x-python",
219 | "name": "python",
220 | "nbconvert_exporter": "python",
221 | "pygments_lexer": "ipython3",
222 | "version": "3.6.6"
223 | }
224 | },
225 | "nbformat": 4,
226 | "nbformat_minor": 1
227 | }
228 |
--------------------------------------------------------------------------------
/ch03/ch03_01.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 |     "This notebook is sample code with Japanese comments.\n",
8 | "\n",
9 | "Ref: [Introduction to Manual Feature Engineering](https://www.kaggle.com/willkoehrsen/introduction-to-manual-feature-engineering)\n",
10 | "\n",
11 |     "# 3.1 Beyond Titanic ①! Let's try joining multiple tables"
12 | ]
13 | },
14 | {
15 | "cell_type": "code",
16 | "execution_count": 1,
17 | "metadata": {
18 | "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19",
19 | "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5"
20 | },
21 | "outputs": [
22 | {
23 | "data": {
24 | "text/html": [
25 | "\n",
26 | "\n",
39 |        "<div>[HTML table rendering of application_train.head(), 5 rows × 122 columns; the same data appears in the text/plain output below]</div>"
191 | ],
192 | "text/plain": [
193 | " SK_ID_CURR TARGET NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR \\\n",
194 | "0 100002 1 Cash loans M N \n",
195 | "1 100003 0 Cash loans F N \n",
196 | "2 100004 0 Revolving loans M Y \n",
197 | "3 100006 0 Cash loans F N \n",
198 | "4 100007 0 Cash loans M N \n",
199 | "\n",
200 | " FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT AMT_ANNUITY \\\n",
201 | "0 Y 0 202500.0 406597.5 24700.5 \n",
202 | "1 N 0 270000.0 1293502.5 35698.5 \n",
203 | "2 Y 0 67500.0 135000.0 6750.0 \n",
204 | "3 Y 0 135000.0 312682.5 29686.5 \n",
205 | "4 Y 0 121500.0 513000.0 21865.5 \n",
206 | "\n",
207 | " ... FLAG_DOCUMENT_18 FLAG_DOCUMENT_19 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21 \\\n",
208 | "0 ... 0 0 0 0 \n",
209 | "1 ... 0 0 0 0 \n",
210 | "2 ... 0 0 0 0 \n",
211 | "3 ... 0 0 0 0 \n",
212 | "4 ... 0 0 0 0 \n",
213 | "\n",
214 | " AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_DAY \\\n",
215 | "0 0.0 0.0 \n",
216 | "1 0.0 0.0 \n",
217 | "2 0.0 0.0 \n",
218 | "3 NaN NaN \n",
219 | "4 0.0 0.0 \n",
220 | "\n",
221 | " AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON \\\n",
222 | "0 0.0 0.0 \n",
223 | "1 0.0 0.0 \n",
224 | "2 0.0 0.0 \n",
225 | "3 NaN NaN \n",
226 | "4 0.0 0.0 \n",
227 | "\n",
228 | " AMT_REQ_CREDIT_BUREAU_QRT AMT_REQ_CREDIT_BUREAU_YEAR \n",
229 | "0 0.0 1.0 \n",
230 | "1 0.0 0.0 \n",
231 | "2 0.0 0.0 \n",
232 | "3 NaN NaN \n",
233 | "4 0.0 0.0 \n",
234 | "\n",
235 | "[5 rows x 122 columns]"
236 | ]
237 | },
238 | "execution_count": 1,
239 | "metadata": {},
240 | "output_type": "execute_result"
241 | }
242 | ],
243 | "source": [
244 | "import pandas as pd\n",
245 | "\n",
246 | "\n",
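    "# Main Home Credit application table: one row per loan application (SK_ID_CURR).\n",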
247 | "application_train = \\\n",
248 | " pd.read_csv('../input/home-credit-default-risk/application_train.csv')\n",
249 | "application_train.head()"
250 | ]
251 | },
252 | {
253 | "cell_type": "code",
254 | "execution_count": 2,
255 | "metadata": {},
256 | "outputs": [
257 | {
258 | "data": {
259 | "text/html": [
260 | "\n",
261 | "\n",
274 |        "<div>[HTML table rendering of bureau.head(); the same data appears in the text/plain output below]</div>"
401 | ],
402 | "text/plain": [
403 | " SK_ID_CURR SK_ID_BUREAU CREDIT_ACTIVE CREDIT_CURRENCY DAYS_CREDIT \\\n",
404 | "0 215354 5714462 Closed currency 1 -497 \n",
405 | "1 215354 5714463 Active currency 1 -208 \n",
406 | "2 215354 5714464 Active currency 1 -203 \n",
407 | "3 215354 5714465 Active currency 1 -203 \n",
408 | "4 215354 5714466 Active currency 1 -629 \n",
409 | "\n",
410 | " CREDIT_DAY_OVERDUE DAYS_CREDIT_ENDDATE DAYS_ENDDATE_FACT \\\n",
411 | "0 0 -153.0 -153.0 \n",
412 | "1 0 1075.0 NaN \n",
413 | "2 0 528.0 NaN \n",
414 | "3 0 NaN NaN \n",
415 | "4 0 1197.0 NaN \n",
416 | "\n",
417 | " AMT_CREDIT_MAX_OVERDUE CNT_CREDIT_PROLONG AMT_CREDIT_SUM \\\n",
418 | "0 NaN 0 91323.0 \n",
419 | "1 NaN 0 225000.0 \n",
420 | "2 NaN 0 464323.5 \n",
421 | "3 NaN 0 90000.0 \n",
422 | "4 77674.5 0 2700000.0 \n",
423 | "\n",
424 | " AMT_CREDIT_SUM_DEBT AMT_CREDIT_SUM_LIMIT AMT_CREDIT_SUM_OVERDUE \\\n",
425 | "0 0.0 NaN 0.0 \n",
426 | "1 171342.0 NaN 0.0 \n",
427 | "2 NaN NaN 0.0 \n",
428 | "3 NaN NaN 0.0 \n",
429 | "4 NaN NaN 0.0 \n",
430 | "\n",
431 | " CREDIT_TYPE DAYS_CREDIT_UPDATE AMT_ANNUITY \n",
432 | "0 Consumer credit -131 NaN \n",
433 | "1 Credit card -20 NaN \n",
434 | "2 Consumer credit -16 NaN \n",
435 | "3 Credit card -16 NaN \n",
436 | "4 Consumer credit -21 NaN "
437 | ]
438 | },
439 | "execution_count": 2,
440 | "metadata": {},
441 | "output_type": "execute_result"
442 | }
443 | ],
444 | "source": [
445 | "bureau = pd.read_csv('../input/home-credit-default-risk/bureau.csv')\n",
446 | "bureau.head()"
447 | ]
448 | },
449 | {
450 | "cell_type": "code",
451 | "execution_count": 3,
452 | "metadata": {},
453 | "outputs": [
454 | {
455 | "data": {
456 | "text/html": [
457 | "\n",
458 | "\n",
471 |        "<div>[HTML table rendering of previous_loan_counts.head(); the same data appears in the text/plain output below]</div>"
508 | ],
509 | "text/plain": [
510 | " SK_ID_CURR previous_loan_counts\n",
511 | "0 100001 7\n",
512 | "1 100002 8\n",
513 | "2 100003 4\n",
514 | "3 100004 2\n",
515 | "4 100005 3"
516 | ]
517 | },
518 | "execution_count": 3,
519 | "metadata": {},
520 | "output_type": "execute_result"
521 | }
522 | ],
523 | "source": [
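    "# Count each applicant's records in bureau.csv as a new feature, previous_loan_counts.\n",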
524 | "previous_loan_counts = \\\n",
525 | " bureau.groupby('SK_ID_CURR', as_index=False)['SK_ID_BUREAU'].count().rename(\n",
526 | " columns={'SK_ID_BUREAU': 'previous_loan_counts'})\n",
527 | "previous_loan_counts.head()"
528 | ]
529 | },
530 | {
531 | "cell_type": "code",
532 | "execution_count": 4,
533 | "metadata": {
534 | "scrolled": true
535 | },
536 | "outputs": [
537 | {
538 | "data": {
539 | "text/html": [
540 | "\n",
541 | "\n",
554 |        "<div>[HTML table rendering of the merged application_train.head(), 5 rows × 123 columns; the same data appears in the text/plain output below]</div>"
706 | ],
707 | "text/plain": [
708 | " SK_ID_CURR TARGET NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR \\\n",
709 | "0 100002 1 Cash loans M N \n",
710 | "1 100003 0 Cash loans F N \n",
711 | "2 100004 0 Revolving loans M Y \n",
712 | "3 100006 0 Cash loans F N \n",
713 | "4 100007 0 Cash loans M N \n",
714 | "\n",
715 | " FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT AMT_ANNUITY \\\n",
716 | "0 Y 0 202500.0 406597.5 24700.5 \n",
717 | "1 N 0 270000.0 1293502.5 35698.5 \n",
718 | "2 Y 0 67500.0 135000.0 6750.0 \n",
719 | "3 Y 0 135000.0 312682.5 29686.5 \n",
720 | "4 Y 0 121500.0 513000.0 21865.5 \n",
721 | "\n",
722 | " ... FLAG_DOCUMENT_19 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21 \\\n",
723 | "0 ... 0 0 0 \n",
724 | "1 ... 0 0 0 \n",
725 | "2 ... 0 0 0 \n",
726 | "3 ... 0 0 0 \n",
727 | "4 ... 0 0 0 \n",
728 | "\n",
729 | " AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_DAY \\\n",
730 | "0 0.0 0.0 \n",
731 | "1 0.0 0.0 \n",
732 | "2 0.0 0.0 \n",
733 | "3 NaN NaN \n",
734 | "4 0.0 0.0 \n",
735 | "\n",
736 | " AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON \\\n",
737 | "0 0.0 0.0 \n",
738 | "1 0.0 0.0 \n",
739 | "2 0.0 0.0 \n",
740 | "3 NaN NaN \n",
741 | "4 0.0 0.0 \n",
742 | "\n",
743 | " AMT_REQ_CREDIT_BUREAU_QRT AMT_REQ_CREDIT_BUREAU_YEAR previous_loan_counts \n",
744 | "0 0.0 1.0 8.0 \n",
745 | "1 0.0 0.0 4.0 \n",
746 | "2 0.0 0.0 2.0 \n",
747 | "3 NaN NaN 0.0 \n",
748 | "4 0.0 0.0 1.0 \n",
749 | "\n",
750 | "[5 rows x 123 columns]"
751 | ]
752 | },
753 | "execution_count": 4,
754 | "metadata": {},
755 | "output_type": "execute_result"
756 | }
757 | ],
758 | "source": [
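    "# Left-join the counts onto application_train; applicants without bureau records get NaN, filled with 0.\n",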
759 | "application_train = \\\n",
760 | " pd.merge(application_train, previous_loan_counts, on='SK_ID_CURR', how='left')\n",
761 | "\n",
762 | "application_train['previous_loan_counts'].fillna(0, inplace=True)\n",
763 | "application_train.head()"
764 | ]
765 | },
766 | {
767 | "cell_type": "code",
768 | "execution_count": null,
769 | "metadata": {},
770 | "outputs": [],
771 | "source": []
772 | }
773 | ],
774 | "metadata": {
775 | "kernelspec": {
776 | "display_name": "Python 3",
777 | "language": "python",
778 | "name": "python3"
779 | },
780 | "language_info": {
781 | "codemirror_mode": {
782 | "name": "ipython",
783 | "version": 3
784 | },
785 | "file_extension": ".py",
786 | "mimetype": "text/x-python",
787 | "name": "python",
788 | "nbconvert_exporter": "python",
789 | "pygments_lexer": "ipython3",
790 | "version": "3.6.6"
791 | }
792 | },
793 | "nbformat": 4,
794 | "nbformat_minor": 1
795 | }
796 |
--------------------------------------------------------------------------------
/ch02/ch02_01.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "_cell_guid": "e12020f7-4f94-4ecc-9007-9b7a6e7458a6",
7 | "_uuid": "1fecb0980d8d422ec0f005c4bfd6225385c2c60f"
8 | },
9 | "source": [
10 |     "This notebook is sample code with Japanese comments.\n",
11 | "\n",
12 |     "# 2.1 First, submit! Let's get onto the leaderboard"
13 | ]
14 | },
15 | {
16 | "cell_type": "code",
17 | "execution_count": 1,
18 | "metadata": {
19 | "_uuid": "6413a13d8a260043bda237e211bd962582eb7ff2"
20 | },
21 | "outputs": [],
22 | "source": [
23 | "import numpy as np\n",
24 | "import pandas as pd"
25 | ]
26 | },
27 | {
28 | "cell_type": "markdown",
29 | "metadata": {
30 | "_cell_guid": "d49e43e8-0dc0-41b7-afd0-60acc96e9f07",
31 | "_uuid": "4ecd55c5bd48390d026eeb6ae8de0a7ace0d4ada"
32 | },
33 | "source": [
34 |     "## Loading the data"
35 | ]
36 | },
37 | {
38 | "cell_type": "code",
39 | "execution_count": 2,
40 | "metadata": {},
41 | "outputs": [
42 | {
43 | "name": "stdout",
44 | "output_type": "stream",
45 | "text": [
46 | "README.md gender_submission.csv test.csv train.csv\r\n"
47 | ]
48 | }
49 | ],
50 | "source": [
51 | "!ls ../input/titanic"
52 | ]
53 | },
54 | {
55 | "cell_type": "code",
56 | "execution_count": 3,
57 | "metadata": {
58 | "_cell_guid": "9c963eb3-04ac-422c-bc0c-4373bda6880e",
59 | "_uuid": "95f406c4d2f1dab6744ea248b80e3a535c652450"
60 | },
61 | "outputs": [],
62 | "source": [
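    "# Load the Titanic competition files placed under ../input/titanic.\n",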
63 | "train = pd.read_csv('../input/titanic/train.csv')\n",
64 | "test = pd.read_csv('../input/titanic/test.csv')\n",
65 | "gender_submission = pd.read_csv('../input/titanic/gender_submission.csv')"
66 | ]
67 | },
68 | {
69 | "cell_type": "code",
70 | "execution_count": 4,
71 | "metadata": {},
72 | "outputs": [
73 | {
74 | "data": {
75 | "text/html": [
76 | "\n",
77 | "\n",
90 |        "<div>[HTML table rendering of gender_submission.head(); the same data appears in the text/plain output below]</div>"
127 | ],
128 | "text/plain": [
129 | " PassengerId Survived\n",
130 | "0 892 0\n",
131 | "1 893 1\n",
132 | "2 894 0\n",
133 | "3 895 0\n",
134 | "4 896 1"
135 | ]
136 | },
137 | "execution_count": 4,
138 | "metadata": {},
139 | "output_type": "execute_result"
140 | }
141 | ],
142 | "source": [
143 | "gender_submission.head()"
144 | ]
145 | },
146 | {
147 | "cell_type": "code",
148 | "execution_count": 5,
149 | "metadata": {},
150 | "outputs": [
151 | {
152 | "data": {
153 | "text/html": [
154 | "\n",
155 | "\n",
168 |        "<div>[HTML table rendering of train.head(); the same data appears in the text/plain output below]</div>"
265 | ],
266 | "text/plain": [
267 | " PassengerId Survived Pclass \\\n",
268 | "0 1 0 3 \n",
269 | "1 2 1 1 \n",
270 | "2 3 1 3 \n",
271 | "3 4 1 1 \n",
272 | "4 5 0 3 \n",
273 | "\n",
274 | " Name Sex Age SibSp \\\n",
275 | "0 Braund, Mr. Owen Harris male 22.0 1 \n",
276 | "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
277 | "2 Heikkinen, Miss. Laina female 26.0 0 \n",
278 | "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
279 | "4 Allen, Mr. William Henry male 35.0 0 \n",
280 | "\n",
281 | " Parch Ticket Fare Cabin Embarked \n",
282 | "0 0 A/5 21171 7.2500 NaN S \n",
283 | "1 0 PC 17599 71.2833 C85 C \n",
284 | "2 0 STON/O2. 3101282 7.9250 NaN S \n",
285 | "3 0 113803 53.1000 C123 S \n",
286 | "4 0 373450 8.0500 NaN S "
287 | ]
288 | },
289 | "execution_count": 5,
290 | "metadata": {},
291 | "output_type": "execute_result"
292 | }
293 | ],
294 | "source": [
295 | "train.head()"
296 | ]
297 | },
298 | {
299 | "cell_type": "code",
300 | "execution_count": 6,
301 | "metadata": {},
302 | "outputs": [
303 | {
304 | "data": {
305 | "text/html": [
306 | "\n",
307 | "\n",
320 |        "<div>[HTML table rendering of test.head(); the same data appears in the text/plain output below]</div>"
411 | ],
412 | "text/plain": [
413 | " PassengerId Pclass Name Sex \\\n",
414 | "0 892 3 Kelly, Mr. James male \n",
415 | "1 893 3 Wilkes, Mrs. James (Ellen Needs) female \n",
416 | "2 894 2 Myles, Mr. Thomas Francis male \n",
417 | "3 895 3 Wirz, Mr. Albert male \n",
418 | "4 896 3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female \n",
419 | "\n",
420 | " Age SibSp Parch Ticket Fare Cabin Embarked \n",
421 | "0 34.5 0 0 330911 7.8292 NaN Q \n",
422 | "1 47.0 1 0 363272 7.0000 NaN S \n",
423 | "2 62.0 0 0 240276 9.6875 NaN Q \n",
424 | "3 27.0 0 0 315154 8.6625 NaN S \n",
425 | "4 22.0 1 1 3101298 12.2875 NaN S "
426 | ]
427 | },
428 | "execution_count": 6,
429 | "metadata": {},
430 | "output_type": "execute_result"
431 | }
432 | ],
433 | "source": [
434 | "test.head()"
435 | ]
436 | },
437 | {
438 | "cell_type": "code",
439 | "execution_count": 7,
440 | "metadata": {},
441 | "outputs": [],
442 | "source": [
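    "# Stack train and test so the same preprocessing can be applied to both; test rows have Survived = NaN.\n",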
443 | "data = pd.concat([train, test], sort=False)"
444 | ]
445 | },
446 | {
447 | "cell_type": "code",
448 | "execution_count": 8,
449 | "metadata": {},
450 | "outputs": [
451 | {
452 | "data": {
453 | "text/html": [
454 | "\n",
455 | "\n",
468 |        "<div>[HTML table rendering of data.head(); the same data appears in the text/plain output below]</div>"
565 | ],
566 | "text/plain": [
567 | " PassengerId Survived Pclass \\\n",
568 | "0 1 0.0 3 \n",
569 | "1 2 1.0 1 \n",
570 | "2 3 1.0 3 \n",
571 | "3 4 1.0 1 \n",
572 | "4 5 0.0 3 \n",
573 | "\n",
574 | " Name Sex Age SibSp \\\n",
575 | "0 Braund, Mr. Owen Harris male 22.0 1 \n",
576 | "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
577 | "2 Heikkinen, Miss. Laina female 26.0 0 \n",
578 | "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
579 | "4 Allen, Mr. William Henry male 35.0 0 \n",
580 | "\n",
581 | " Parch Ticket Fare Cabin Embarked \n",
582 | "0 0 A/5 21171 7.2500 NaN S \n",
583 | "1 0 PC 17599 71.2833 C85 C \n",
584 | "2 0 STON/O2. 3101282 7.9250 NaN S \n",
585 | "3 0 113803 53.1000 C123 S \n",
586 | "4 0 373450 8.0500 NaN S "
587 | ]
588 | },
589 | "execution_count": 8,
590 | "metadata": {},
591 | "output_type": "execute_result"
592 | }
593 | ],
594 | "source": [
595 | "data.head()"
596 | ]
597 | },
598 | {
599 | "cell_type": "code",
600 | "execution_count": 9,
601 | "metadata": {},
602 | "outputs": [
603 | {
604 | "name": "stdout",
605 | "output_type": "stream",
606 | "text": [
607 | "891 418 1309\n"
608 | ]
609 | }
610 | ],
611 | "source": [
612 | "print(len(train), len(test), len(data))"
613 | ]
614 | },
615 | {
616 | "cell_type": "code",
617 | "execution_count": 10,
618 | "metadata": {},
619 | "outputs": [
620 | {
621 | "data": {
622 | "text/plain": [
623 | "PassengerId 0\n",
624 | "Survived 418\n",
625 | "Pclass 0\n",
626 | "Name 0\n",
627 | "Sex 0\n",
628 | "Age 263\n",
629 | "SibSp 0\n",
630 | "Parch 0\n",
631 | "Ticket 0\n",
632 | "Fare 1\n",
633 | "Cabin 1014\n",
634 | "Embarked 2\n",
635 | "dtype: int64"
636 | ]
637 | },
638 | "execution_count": 10,
639 | "metadata": {},
640 | "output_type": "execute_result"
641 | }
642 | ],
643 | "source": [
644 | "data.isnull().sum()"
645 | ]
646 | },
647 | {
648 | "cell_type": "markdown",
649 | "metadata": {
650 | "_cell_guid": "687a06ef-2686-4772-ac24-5e413adbda6d",
651 | "_uuid": "3846eff13d723fa6ff10117fc3ebc46f266b210f"
652 | },
653 | "source": [
654 | "## 特徴量エンジニアリング"
655 | ]
656 | },
657 | {
658 | "cell_type": "markdown",
659 | "metadata": {
660 | "_cell_guid": "09253274-14c1-4ca9-a078-85229acba814",
661 | "_uuid": "234454857fff5bd61026c51cefd5eaee4e6a1879"
662 | },
663 | "source": [
664 | "### 1. Pclass"
665 | ]
666 | },
667 | {
668 | "cell_type": "markdown",
669 | "metadata": {
670 | "_cell_guid": "b84a4c4b-9db2-4626-8a28-9817084eb554",
671 | "_uuid": "610bb44d64b400e7bafbf4d6a3295c7a43f1df23"
672 | },
673 | "source": [
674 | "### 2. Sex"
675 | ]
676 | },
677 | {
678 | "cell_type": "code",
679 | "execution_count": 11,
680 | "metadata": {
681 | "_cell_guid": "27c06d9c-61e1-4ba8-9cfc-81a2dd27390a",
682 | "_uuid": "07b661b256360d39ec561f465735042f37eee257"
683 | },
684 | "outputs": [],
685 | "source": [
686 | "data['Sex'].replace(['male', 'female'], [0, 1], inplace=True)"
687 | ]
688 | },
689 | {
690 | "cell_type": "markdown",
691 | "metadata": {
692 | "_cell_guid": "2ab02454-4dfa-4aa3-ae94-7029b86ef69e",
693 | "_uuid": "2cb8d46258c4b14ec678543f99f4f5789d60b22f"
694 | },
695 | "source": [
696 | "### 3. Embarked"
697 | ]
698 | },
699 | {
700 | "cell_type": "code",
701 | "execution_count": 12,
702 | "metadata": {
703 | "_cell_guid": "1329072e-5fc0-4aea-bc7b-ec7c27aff260",
704 | "_uuid": "5268b97889ec90508f501697d9e8d497398e0c46"
705 | },
706 | "outputs": [],
707 | "source": [
708 | "data['Embarked'].fillna(('S'), inplace=True)\n",
709 | "data['Embarked'] = data['Embarked'].map({'S': 0, 'C': 1, 'Q': 2}).astype(int)"
710 | ]
711 | },
712 | {
713 | "cell_type": "markdown",
714 | "metadata": {
715 | "_cell_guid": "8c33619e-a180-4c12-9f63-e7b3023206cd",
716 | "_uuid": "382a7882e2ae3144513f8ffea8478b4e24e8df0f"
717 | },
718 | "source": [
719 | "### 4. Fare"
720 | ]
721 | },
722 | {
723 | "cell_type": "code",
724 | "execution_count": 13,
725 | "metadata": {
726 | "_cell_guid": "fd9b2edd-cf75-4ad8-a100-5ca06cb53f7b",
727 | "_uuid": "161a7a829ad6a45b7745655b1713888a6778818f"
728 | },
729 | "outputs": [],
730 | "source": [
731 | "data['Fare'].fillna(np.mean(data['Fare']), inplace=True)"
732 | ]
733 | },
734 | {
735 | "cell_type": "markdown",
736 | "metadata": {
737 | "_cell_guid": "1ea2fef1-ec32-4688-9030-63bafed9692c",
738 | "_uuid": "0c6cb694c63862e8f1d805fdcd54769312aae246"
739 | },
740 | "source": [
741 | "### 5. Age"
742 | ]
743 | },
744 | {
745 | "cell_type": "code",
746 | "execution_count": 14,
747 | "metadata": {
748 | "_cell_guid": "5717373d-91ce-4cfd-a579-ef7dab192771",
749 | "_uuid": "42f1ebda5705d5272ea350bfd00e66c2f946a66e"
750 | },
751 | "outputs": [],
752 | "source": [
753 | "age_avg = data['Age'].mean()\n",
754 | "age_std = data['Age'].std()\n",
755 | "\n",
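"# Note: fills every missing Age with a single random integer drawn from [age_avg - age_std, age_avg + age_std)\n",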
756 | "data['Age'].fillna(np.random.randint(age_avg - age_std, age_avg + age_std), inplace=True)"
757 | ]
758 | },
759 | {
760 | "cell_type": "code",
761 | "execution_count": 15,
762 | "metadata": {
763 | "_cell_guid": "d3f3527c-8758-41c2-bbe3-14b604b2d317",
764 | "_uuid": "f7341a6f089464180e94d5e09d1071e0350cff3d"
765 | },
766 | "outputs": [],
767 | "source": [
768 | "delete_columns = ['Name', 'PassengerId', 'SibSp', 'Parch', 'Ticket', 'Cabin']\n",
769 | "data.drop(delete_columns, axis=1, inplace=True)"
770 | ]
771 | },
772 | {
773 | "cell_type": "code",
774 | "execution_count": 16,
775 | "metadata": {},
776 | "outputs": [],
777 | "source": [
778 | "train = data[:len(train)]\n",
779 | "test = data[len(train):]"
780 | ]
781 | },
782 | {
783 | "cell_type": "code",
784 | "execution_count": 17,
785 | "metadata": {
786 | "_cell_guid": "03d91a2b-08da-4593-8c1e-840fb7bec469",
787 | "_uuid": "768050e7f210d95ba28226ada778e763d21c97f8",
788 | "scrolled": true
789 | },
790 | "outputs": [],
791 | "source": [
792 | "y_train = train['Survived']\n",
793 | "X_train = train.drop('Survived', axis=1)\n",
794 | "X_test = test.drop('Survived', axis=1)"
795 | ]
796 | },
797 | {
798 | "cell_type": "code",
799 | "execution_count": 18,
800 | "metadata": {},
801 | "outputs": [
802 | {
803 | "data": {
804 | "text/html": [
805 | "(HTML table rendering of X_train.head(); markup lost in extraction, the equivalent text/plain output follows)"
874 | ],
875 | "text/plain": [
876 | " Pclass Sex Age Fare Embarked\n",
877 | "0 3 0 22.0 7.2500 0\n",
878 | "1 1 1 38.0 71.2833 1\n",
879 | "2 3 1 26.0 7.9250 0\n",
880 | "3 1 1 35.0 53.1000 0\n",
881 | "4 3 0 35.0 8.0500 0"
882 | ]
883 | },
884 | "execution_count": 18,
885 | "metadata": {},
886 | "output_type": "execute_result"
887 | }
888 | ],
889 | "source": [
890 | "X_train.head()"
891 | ]
892 | },
893 | {
894 | "cell_type": "code",
895 | "execution_count": 19,
896 | "metadata": {},
897 | "outputs": [
898 | {
899 | "data": {
900 | "text/plain": [
901 | "0 0.0\n",
902 | "1 1.0\n",
903 | "2 1.0\n",
904 | "3 1.0\n",
905 | "4 0.0\n",
906 | "Name: Survived, dtype: float64"
907 | ]
908 | },
909 | "execution_count": 19,
910 | "metadata": {},
911 | "output_type": "execute_result"
912 | }
913 | ],
914 | "source": [
915 | "y_train.head()"
916 | ]
917 | },
918 | {
919 | "cell_type": "markdown",
920 | "metadata": {},
921 | "source": [
922 | "## 機械学習アルゴリズム"
923 | ]
924 | },
925 | {
926 | "cell_type": "code",
927 | "execution_count": 20,
928 | "metadata": {},
929 | "outputs": [],
930 | "source": [
931 | "from sklearn.linear_model import LogisticRegression"
932 | ]
933 | },
934 | {
935 | "cell_type": "code",
936 | "execution_count": 21,
937 | "metadata": {},
938 | "outputs": [],
939 | "source": [
940 | "clf = LogisticRegression(penalty='l2', solver='sag', random_state=0)"
941 | ]
942 | },
943 | {
944 | "cell_type": "code",
945 | "execution_count": 22,
946 | "metadata": {},
947 | "outputs": [
948 | {
949 | "name": "stderr",
950 | "output_type": "stream",
951 | "text": [
952 | "/opt/conda/lib/python3.6/site-packages/sklearn/linear_model/sag.py:337: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge\n",
953 | " \"the coef_ did not converge\", ConvergenceWarning)\n"
954 | ]
955 | },
956 | {
957 | "data": {
958 | "text/plain": [
959 | "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
960 | " intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
961 | " multi_class='warn', n_jobs=None, penalty='l2',\n",
962 | " random_state=0, solver='sag', tol=0.0001, verbose=0,\n",
963 | " warm_start=False)"
964 | ]
965 | },
966 | "execution_count": 22,
967 | "metadata": {},
968 | "output_type": "execute_result"
969 | }
970 | ],
971 | "source": [
972 | "clf.fit(X_train, y_train)"
973 | ]
974 | },
975 | {
976 | "cell_type": "code",
977 | "execution_count": 23,
978 | "metadata": {},
979 | "outputs": [],
980 | "source": [
981 | "y_pred = clf.predict(X_test)"
982 | ]
983 | },
984 | {
985 | "cell_type": "code",
986 | "execution_count": 24,
987 | "metadata": {},
988 | "outputs": [
989 | {
990 | "data": {
991 | "text/plain": [
992 | "array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,\n",
993 | " 0., 0., 0.])"
994 | ]
995 | },
996 | "execution_count": 24,
997 | "metadata": {},
998 | "output_type": "execute_result"
999 | }
1000 | ],
1001 | "source": [
1002 | "y_pred[:20]"
1003 | ]
1004 | },
1005 | {
1006 | "cell_type": "markdown",
1007 | "metadata": {
1008 | "_cell_guid": "a37e176c-3b55-43ab-b358-324dc384ceef",
1009 | "_uuid": "d4d6df3e6c40063309ea72f4d4cea51cf616fd80"
1010 | },
1011 | "source": [
1012 | "## 提出"
1013 | ]
1014 | },
1015 | {
1016 | "cell_type": "code",
1017 | "execution_count": 25,
1018 | "metadata": {
1019 | "_cell_guid": "8111500e-330c-411e-a742-66b9d4c5cb2c",
1020 | "_uuid": "40858051e4f458835f937275be4dfe3dfa68b25f"
1021 | },
1022 | "outputs": [],
1023 | "source": [
1024 | "sub = pd.read_csv('../input/titanic/gender_submission.csv')\n",
1025 | "sub['Survived'] = list(map(int, y_pred))\n",
1026 | "sub.to_csv('submission.csv', index=False)"
1027 | ]
1028 | },
1029 | {
1030 | "cell_type": "code",
1031 | "execution_count": null,
1032 | "metadata": {
1033 | "_uuid": "d51cef4a043bbab7560dc972a948d96a0b369760"
1034 | },
1035 | "outputs": [],
1036 | "source": []
1037 | }
1038 | ],
1039 | "metadata": {
1040 | "kernelspec": {
1041 | "display_name": "Python 3",
1042 | "language": "python",
1043 | "name": "python3"
1044 | },
1045 | "language_info": {
1046 | "codemirror_mode": {
1047 | "name": "ipython",
1048 | "version": 3
1049 | },
1050 | "file_extension": ".py",
1051 | "mimetype": "text/x-python",
1052 | "name": "python",
1053 | "nbconvert_exporter": "python",
1054 | "pygments_lexer": "ipython3",
1055 | "version": "3.6.6"
1056 | }
1057 | },
1058 | "nbformat": 4,
1059 | "nbformat_minor": 1
1060 | }
1061 |
--------------------------------------------------------------------------------
/ch02/ch02_02.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "_cell_guid": "e12020f7-4f94-4ecc-9007-9b7a6e7458a6",
7 | "_uuid": "1fecb0980d8d422ec0f005c4bfd6225385c2c60f"
8 | },
9 | "source": [
10 | "This notebook is a sample code with Japanese comments.\n",
11 | "\n",
12 | "# 2.2 全体像を把握! submitまでの処理の流れを見てみよう"
13 | ]
14 | },
15 | {
16 | "cell_type": "markdown",
17 | "metadata": {},
18 | "source": [
19 | "## パッケージの読み込み"
20 | ]
21 | },
22 | {
23 | "cell_type": "code",
24 | "execution_count": 1,
25 | "metadata": {},
26 | "outputs": [],
27 | "source": [
28 | "import numpy as np\n",
29 | "import pandas as pd"
30 | ]
31 | },
32 | {
33 | "cell_type": "markdown",
34 | "metadata": {
35 | "_cell_guid": "d49e43e8-0dc0-41b7-afd0-60acc96e9f07",
36 | "_uuid": "4ecd55c5bd48390d026eeb6ae8de0a7ace0d4ada"
37 | },
38 | "source": [
39 | "## データの読み込み"
40 | ]
41 | },
42 | {
43 | "cell_type": "code",
44 | "execution_count": 2,
45 | "metadata": {},
46 | "outputs": [
47 | {
48 | "name": "stdout",
49 | "output_type": "stream",
50 | "text": [
51 | "README.md gender_submission.csv test.csv train.csv\r\n"
52 | ]
53 | }
54 | ],
55 | "source": [
56 | "!ls ../input/titanic"
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": 3,
62 | "metadata": {
63 | "_cell_guid": "9c963eb3-04ac-422c-bc0c-4373bda6880e",
64 | "_uuid": "95f406c4d2f1dab6744ea248b80e3a535c652450"
65 | },
66 | "outputs": [],
67 | "source": [
68 | "train = pd.read_csv('../input/titanic/train.csv')\n",
69 | "test = pd.read_csv('../input/titanic/test.csv')\n",
70 | "gender_submission = pd.read_csv('../input/titanic/gender_submission.csv')"
71 | ]
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": 4,
76 | "metadata": {},
77 | "outputs": [
78 | {
79 | "data": {
80 | "text/html": [
81 | "(HTML table rendering of gender_submission.head(); markup lost in extraction, the equivalent text/plain output follows)"
132 | ],
133 | "text/plain": [
134 | " PassengerId Survived\n",
135 | "0 892 0\n",
136 | "1 893 1\n",
137 | "2 894 0\n",
138 | "3 895 0\n",
139 | "4 896 1"
140 | ]
141 | },
142 | "execution_count": 4,
143 | "metadata": {},
144 | "output_type": "execute_result"
145 | }
146 | ],
147 | "source": [
148 | "gender_submission.head()"
149 | ]
150 | },
151 | {
152 | "cell_type": "code",
153 | "execution_count": 5,
154 | "metadata": {},
155 | "outputs": [
156 | {
157 | "data": {
158 | "text/html": [
159 | "(HTML table rendering of train.head(); markup lost in extraction, the equivalent text/plain output follows)"
270 | ],
271 | "text/plain": [
272 | " PassengerId Survived Pclass \\\n",
273 | "0 1 0 3 \n",
274 | "1 2 1 1 \n",
275 | "2 3 1 3 \n",
276 | "3 4 1 1 \n",
277 | "4 5 0 3 \n",
278 | "\n",
279 | " Name Sex Age SibSp \\\n",
280 | "0 Braund, Mr. Owen Harris male 22.0 1 \n",
281 | "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
282 | "2 Heikkinen, Miss. Laina female 26.0 0 \n",
283 | "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
284 | "4 Allen, Mr. William Henry male 35.0 0 \n",
285 | "\n",
286 | " Parch Ticket Fare Cabin Embarked \n",
287 | "0 0 A/5 21171 7.2500 NaN S \n",
288 | "1 0 PC 17599 71.2833 C85 C \n",
289 | "2 0 STON/O2. 3101282 7.9250 NaN S \n",
290 | "3 0 113803 53.1000 C123 S \n",
291 | "4 0 373450 8.0500 NaN S "
292 | ]
293 | },
294 | "execution_count": 5,
295 | "metadata": {},
296 | "output_type": "execute_result"
297 | }
298 | ],
299 | "source": [
300 | "train.head()"
301 | ]
302 | },
303 | {
304 | "cell_type": "code",
305 | "execution_count": 6,
306 | "metadata": {},
307 | "outputs": [
308 | {
309 | "data": {
310 | "text/html": [
311 | "(HTML table rendering of test.head(); markup lost in extraction, the equivalent text/plain output follows)"
416 | ],
417 | "text/plain": [
418 | " PassengerId Pclass Name Sex \\\n",
419 | "0 892 3 Kelly, Mr. James male \n",
420 | "1 893 3 Wilkes, Mrs. James (Ellen Needs) female \n",
421 | "2 894 2 Myles, Mr. Thomas Francis male \n",
422 | "3 895 3 Wirz, Mr. Albert male \n",
423 | "4 896 3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female \n",
424 | "\n",
425 | " Age SibSp Parch Ticket Fare Cabin Embarked \n",
426 | "0 34.5 0 0 330911 7.8292 NaN Q \n",
427 | "1 47.0 1 0 363272 7.0000 NaN S \n",
428 | "2 62.0 0 0 240276 9.6875 NaN Q \n",
429 | "3 27.0 0 0 315154 8.6625 NaN S \n",
430 | "4 22.0 1 1 3101298 12.2875 NaN S "
431 | ]
432 | },
433 | "execution_count": 6,
434 | "metadata": {},
435 | "output_type": "execute_result"
436 | }
437 | ],
438 | "source": [
439 | "test.head()"
440 | ]
441 | },
442 | {
443 | "cell_type": "code",
444 | "execution_count": 7,
445 | "metadata": {},
446 | "outputs": [],
447 | "source": [
448 | "data = pd.concat([train, test], sort=False)"
449 | ]
450 | },
451 | {
452 | "cell_type": "code",
453 | "execution_count": 8,
454 | "metadata": {},
455 | "outputs": [
456 | {
457 | "data": {
458 | "text/html": [
459 | "(HTML table rendering of data.head(); markup lost in extraction, the equivalent text/plain output follows)"
570 | ],
571 | "text/plain": [
572 | " PassengerId Survived Pclass \\\n",
573 | "0 1 0.0 3 \n",
574 | "1 2 1.0 1 \n",
575 | "2 3 1.0 3 \n",
576 | "3 4 1.0 1 \n",
577 | "4 5 0.0 3 \n",
578 | "\n",
579 | " Name Sex Age SibSp \\\n",
580 | "0 Braund, Mr. Owen Harris male 22.0 1 \n",
581 | "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
582 | "2 Heikkinen, Miss. Laina female 26.0 0 \n",
583 | "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
584 | "4 Allen, Mr. William Henry male 35.0 0 \n",
585 | "\n",
586 | " Parch Ticket Fare Cabin Embarked \n",
587 | "0 0 A/5 21171 7.2500 NaN S \n",
588 | "1 0 PC 17599 71.2833 C85 C \n",
589 | "2 0 STON/O2. 3101282 7.9250 NaN S \n",
590 | "3 0 113803 53.1000 C123 S \n",
591 | "4 0 373450 8.0500 NaN S "
592 | ]
593 | },
594 | "execution_count": 8,
595 | "metadata": {},
596 | "output_type": "execute_result"
597 | }
598 | ],
599 | "source": [
600 | "data.head()"
601 | ]
602 | },
603 | {
604 | "cell_type": "code",
605 | "execution_count": 9,
606 | "metadata": {},
607 | "outputs": [
608 | {
609 | "name": "stdout",
610 | "output_type": "stream",
611 | "text": [
612 | "891 418 1309\n"
613 | ]
614 | }
615 | ],
616 | "source": [
617 | "print(len(train), len(test), len(data))"
618 | ]
619 | },
620 | {
621 | "cell_type": "code",
622 | "execution_count": 10,
623 | "metadata": {},
624 | "outputs": [
625 | {
626 | "data": {
627 | "text/plain": [
628 | "PassengerId 0\n",
629 | "Survived 418\n",
630 | "Pclass 0\n",
631 | "Name 0\n",
632 | "Sex 0\n",
633 | "Age 263\n",
634 | "SibSp 0\n",
635 | "Parch 0\n",
636 | "Ticket 0\n",
637 | "Fare 1\n",
638 | "Cabin 1014\n",
639 | "Embarked 2\n",
640 | "dtype: int64"
641 | ]
642 | },
643 | "execution_count": 10,
644 | "metadata": {},
645 | "output_type": "execute_result"
646 | }
647 | ],
648 | "source": [
649 | "data.isnull().sum()"
650 | ]
651 | },
652 | {
653 | "cell_type": "markdown",
654 | "metadata": {
655 | "_cell_guid": "687a06ef-2686-4772-ac24-5e413adbda6d",
656 | "_uuid": "3846eff13d723fa6ff10117fc3ebc46f266b210f"
657 | },
658 | "source": [
659 | "## 特徴量エンジニアリング"
660 | ]
661 | },
662 | {
663 | "cell_type": "markdown",
664 | "metadata": {
665 | "_cell_guid": "09253274-14c1-4ca9-a078-85229acba814",
666 | "_uuid": "234454857fff5bd61026c51cefd5eaee4e6a1879"
667 | },
668 | "source": [
669 | "### 1. Pclass"
670 | ]
671 | },
672 | {
673 | "cell_type": "code",
674 | "execution_count": 11,
675 | "metadata": {},
676 | "outputs": [
677 | {
678 | "data": {
679 | "text/plain": [
680 | "3 709\n",
681 | "1 323\n",
682 | "2 277\n",
683 | "Name: Pclass, dtype: int64"
684 | ]
685 | },
686 | "execution_count": 11,
687 | "metadata": {},
688 | "output_type": "execute_result"
689 | }
690 | ],
691 | "source": [
692 | "data['Pclass'].value_counts()"
693 | ]
694 | },
695 | {
696 | "cell_type": "markdown",
697 | "metadata": {
698 | "_cell_guid": "b84a4c4b-9db2-4626-8a28-9817084eb554",
699 | "_uuid": "610bb44d64b400e7bafbf4d6a3295c7a43f1df23"
700 | },
701 | "source": [
702 | "### 2. Sex"
703 | ]
704 | },
705 | {
706 | "cell_type": "code",
707 | "execution_count": 12,
708 | "metadata": {
709 | "_cell_guid": "27c06d9c-61e1-4ba8-9cfc-81a2dd27390a",
710 | "_uuid": "07b661b256360d39ec561f465735042f37eee257"
711 | },
712 | "outputs": [],
713 | "source": [
714 | "data['Sex'].replace(['male', 'female'], [0, 1], inplace=True)"
715 | ]
716 | },
717 | {
718 | "cell_type": "markdown",
719 | "metadata": {
720 | "_cell_guid": "2ab02454-4dfa-4aa3-ae94-7029b86ef69e",
721 | "_uuid": "2cb8d46258c4b14ec678543f99f4f5789d60b22f"
722 | },
723 | "source": [
724 | "### 3. Embarked"
725 | ]
726 | },
727 | {
728 | "cell_type": "code",
729 | "execution_count": 13,
730 | "metadata": {},
731 | "outputs": [
732 | {
733 | "data": {
734 | "text/plain": [
735 | "S 914\n",
736 | "C 270\n",
737 | "Q 123\n",
738 | "Name: Embarked, dtype: int64"
739 | ]
740 | },
741 | "execution_count": 13,
742 | "metadata": {},
743 | "output_type": "execute_result"
744 | }
745 | ],
746 | "source": [
747 | "data['Embarked'].value_counts()"
748 | ]
749 | },
750 | {
751 | "cell_type": "code",
752 | "execution_count": 14,
753 | "metadata": {
754 | "_cell_guid": "1329072e-5fc0-4aea-bc7b-ec7c27aff260",
755 | "_uuid": "5268b97889ec90508f501697d9e8d497398e0c46"
756 | },
757 | "outputs": [],
758 | "source": [
759 | "data['Embarked'].fillna(('S'), inplace=True)\n",
760 | "data['Embarked'] = data['Embarked'].map({'S': 0, 'C': 1, 'Q': 2}).astype(int)"
761 | ]
762 | },
763 | {
764 | "cell_type": "markdown",
765 | "metadata": {
766 | "_cell_guid": "8c33619e-a180-4c12-9f63-e7b3023206cd",
767 | "_uuid": "382a7882e2ae3144513f8ffea8478b4e24e8df0f"
768 | },
769 | "source": [
770 | "### 4. Fare"
771 | ]
772 | },
773 | {
774 | "cell_type": "code",
775 | "execution_count": 15,
776 | "metadata": {
777 | "_cell_guid": "fd9b2edd-cf75-4ad8-a100-5ca06cb53f7b",
778 | "_uuid": "161a7a829ad6a45b7745655b1713888a6778818f"
779 | },
780 | "outputs": [],
781 | "source": [
782 | "data['Fare'].fillna(np.mean(data['Fare']), inplace=True)"
783 | ]
784 | },
785 | {
786 | "cell_type": "markdown",
787 | "metadata": {},
788 | "source": [
789 | "### 5. Age"
790 | ]
791 | },
792 | {
793 | "cell_type": "code",
794 | "execution_count": 16,
795 | "metadata": {
796 | "_cell_guid": "5717373d-91ce-4cfd-a579-ef7dab192771",
797 | "_uuid": "42f1ebda5705d5272ea350bfd00e66c2f946a66e"
798 | },
799 | "outputs": [],
800 | "source": [
801 | "age_avg = data['Age'].mean()\n",
802 | "age_std = data['Age'].std()\n",
803 | "\n",
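"# Note: fills every missing Age with a single random integer drawn from [age_avg - age_std, age_avg + age_std)\n",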
804 | "data['Age'].fillna(np.random.randint(age_avg - age_std, age_avg + age_std), inplace=True)"
805 | ]
806 | },
807 | {
808 | "cell_type": "code",
809 | "execution_count": 17,
810 | "metadata": {
811 | "_cell_guid": "d3f3527c-8758-41c2-bbe3-14b604b2d317",
812 | "_uuid": "f7341a6f089464180e94d5e09d1071e0350cff3d"
813 | },
814 | "outputs": [],
815 | "source": [
816 | "delete_columns = ['Name', 'PassengerId', 'SibSp', 'Parch', 'Ticket', 'Cabin']\n",
817 | "data.drop(delete_columns, axis=1, inplace=True)"
818 | ]
819 | },
820 | {
821 | "cell_type": "code",
822 | "execution_count": 18,
823 | "metadata": {},
824 | "outputs": [],
825 | "source": [
826 | "train = data[:len(train)]\n",
827 | "test = data[len(train):]"
828 | ]
829 | },
830 | {
831 | "cell_type": "code",
832 | "execution_count": 19,
833 | "metadata": {
834 | "_cell_guid": "03d91a2b-08da-4593-8c1e-840fb7bec469",
835 | "_uuid": "768050e7f210d95ba28226ada778e763d21c97f8",
836 | "scrolled": true
837 | },
838 | "outputs": [],
839 | "source": [
840 | "y_train = train['Survived']\n",
841 | "X_train = train.drop('Survived', axis=1)\n",
842 | "X_test = test.drop('Survived', axis=1)"
843 | ]
844 | },
845 | {
846 | "cell_type": "code",
847 | "execution_count": 20,
848 | "metadata": {},
849 | "outputs": [
850 | {
851 | "data": {
852 | "text/html": [
853 | "(HTML table rendering of X_train.head(); markup lost in extraction, the equivalent text/plain output follows)"
922 | ],
923 | "text/plain": [
924 | " Pclass Sex Age Fare Embarked\n",
925 | "0 3 0 22.0 7.2500 0\n",
926 | "1 1 1 38.0 71.2833 1\n",
927 | "2 3 1 26.0 7.9250 0\n",
928 | "3 1 1 35.0 53.1000 0\n",
929 | "4 3 0 35.0 8.0500 0"
930 | ]
931 | },
932 | "execution_count": 20,
933 | "metadata": {},
934 | "output_type": "execute_result"
935 | }
936 | ],
937 | "source": [
938 | "X_train.head()"
939 | ]
940 | },
941 | {
942 | "cell_type": "code",
943 | "execution_count": 21,
944 | "metadata": {},
945 | "outputs": [
946 | {
947 | "data": {
948 | "text/plain": [
949 | "0 0.0\n",
950 | "1 1.0\n",
951 | "2 1.0\n",
952 | "3 1.0\n",
953 | "4 0.0\n",
954 | "Name: Survived, dtype: float64"
955 | ]
956 | },
957 | "execution_count": 21,
958 | "metadata": {},
959 | "output_type": "execute_result"
960 | }
961 | ],
962 | "source": [
963 | "y_train.head()"
964 | ]
965 | },
966 | {
967 | "cell_type": "markdown",
968 | "metadata": {
969 | "_cell_guid": "19f52c93-701c-4ae1-ad7c-0c89004bc1a0",
970 | "_uuid": "d2f7f7fd519f1fcc160304783c8b440e5cb552da"
971 | },
972 | "source": [
973 | "## 機械学習アルゴリズム"
974 | ]
975 | },
976 | {
977 | "cell_type": "code",
978 | "execution_count": 22,
979 | "metadata": {},
980 | "outputs": [],
981 | "source": [
982 | "from sklearn.linear_model import LogisticRegression"
983 | ]
984 | },
985 | {
986 | "cell_type": "code",
987 | "execution_count": 23,
988 | "metadata": {},
989 | "outputs": [],
990 | "source": [
991 | "clf = LogisticRegression(penalty='l2', solver='sag', random_state=0)"
992 | ]
993 | },
994 | {
995 | "cell_type": "code",
996 | "execution_count": 24,
997 | "metadata": {},
998 | "outputs": [
999 | {
1000 | "name": "stderr",
1001 | "output_type": "stream",
1002 | "text": [
1003 | "/opt/conda/lib/python3.6/site-packages/sklearn/linear_model/sag.py:337: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge\n",
1004 | " \"the coef_ did not converge\", ConvergenceWarning)\n"
1005 | ]
1006 | },
1007 | {
1008 | "data": {
1009 | "text/plain": [
1010 | "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
1011 | " intercept_scaling=1, l1_ratio=None, max_iter=100,\n",
1012 | " multi_class='warn', n_jobs=None, penalty='l2',\n",
1013 | " random_state=0, solver='sag', tol=0.0001, verbose=0,\n",
1014 | " warm_start=False)"
1015 | ]
1016 | },
1017 | "execution_count": 24,
1018 | "metadata": {},
1019 | "output_type": "execute_result"
1020 | }
1021 | ],
1022 | "source": [
1023 | "clf.fit(X_train, y_train)"
1024 | ]
1025 | },
1026 | {
1027 | "cell_type": "code",
1028 | "execution_count": 25,
1029 | "metadata": {},
1030 | "outputs": [],
1031 | "source": [
1032 | "y_pred = clf.predict(X_test)"
1033 | ]
1034 | },
1035 | {
1036 | "cell_type": "code",
1037 | "execution_count": 26,
1038 | "metadata": {},
1039 | "outputs": [
1040 | {
1041 | "data": {
1042 | "text/plain": [
1043 | "array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,\n",
1044 | " 0., 0., 0.])"
1045 | ]
1046 | },
1047 | "execution_count": 26,
1048 | "metadata": {},
1049 | "output_type": "execute_result"
1050 | }
1051 | ],
1052 | "source": [
1053 | "y_pred[:20]"
1054 | ]
1055 | },
1056 | {
1057 | "cell_type": "markdown",
1058 | "metadata": {
1059 | "_cell_guid": "a37e176c-3b55-43ab-b358-324dc384ceef",
1060 | "_uuid": "d4d6df3e6c40063309ea72f4d4cea51cf616fd80"
1061 | },
1062 | "source": [
1063 | "## 提出"
1064 | ]
1065 | },
1066 | {
1067 | "cell_type": "code",
1068 | "execution_count": 27,
1069 | "metadata": {
1070 | "_cell_guid": "8111500e-330c-411e-a742-66b9d4c5cb2c",
1071 | "_uuid": "40858051e4f458835f937275be4dfe3dfa68b25f"
1072 | },
1073 | "outputs": [],
1074 | "source": [
1075 | "sub = pd.read_csv('../input/titanic/gender_submission.csv')\n",
1076 | "sub['Survived'] = list(map(int, y_pred))\n",
1077 | "sub.to_csv('submission.csv', index=False)"
1078 | ]
1079 | },
1080 | {
1081 | "cell_type": "code",
1082 | "execution_count": null,
1083 | "metadata": {},
1084 | "outputs": [],
1085 | "source": []
1086 | }
1087 | ],
1088 | "metadata": {
1089 | "file_extension": ".py",
1090 | "kernelspec": {
1091 | "display_name": "Python 3",
1092 | "language": "python",
1093 | "name": "python3"
1094 | },
1095 | "language_info": {
1096 | "codemirror_mode": {
1097 | "name": "ipython",
1098 | "version": 3
1099 | },
1100 | "file_extension": ".py",
1101 | "mimetype": "text/x-python",
1102 | "name": "python",
1103 | "nbconvert_exporter": "python",
1104 | "pygments_lexer": "ipython3",
1105 | "version": "3.6.6"
1106 | },
1107 | "mimetype": "text/x-python",
1108 | "name": "python",
1109 | "npconvert_exporter": "python",
1110 | "pygments_lexer": "ipython3",
1111 | "version": 3
1112 | },
1113 | "nbformat": 4,
1114 | "nbformat_minor": 2
1115 | }
1116 |
--------------------------------------------------------------------------------