├── .gitignore ├── Dockerfile ├── docker-compose.yml ├── input ├── submit-files │ └── README.md ├── titanic │ └── README.md └── home-credit-default-risk │ └── README.md ├── errata.md ├── README_EN.md ├── README.md ├── ch03 ├── ch03_03.ipynb ├── ch03_02.ipynb └── ch03_01.ipynb ├── footnote.md └── ch02 ├── ch02_08.ipynb ├── ch02_05.ipynb ├── ch02_01.ipynb └── ch02_02.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | *.csv 2 | .ipynb_checkpoints 3 | data 4 | -------------------------------------------------------------------------------- /Dockerfile: -------------------------------------------------------------------------------- 1 | FROM gcr.io/kaggle-images/python:v68 2 | -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: "3" 2 | services: 3 | jupyter: 4 | build: . 5 | volumes: 6 | - $PWD:/tmp/working 7 | working_dir: /tmp/working 8 | ports: 9 | - 8888:8888 10 | command: jupyter notebook --ip=0.0.0.0 --allow-root --no-browser 11 | -------------------------------------------------------------------------------- /input/submit-files/README.md: -------------------------------------------------------------------------------- 1 | # Input data 2 | 3 | Download the [data](https://www.kaggle.com/sishihara/submit-files) and place it in this directory. 4 | 5 | ``` 6 | input 7 | └── submit-files 8 | |── submission_lightgbm_holdout.csv 9 | |── submission_lightgbm_skfold.csv 10 | └── submission_randomforest.csv 11 | ``` 12 | -------------------------------------------------------------------------------- /input/titanic/README.md: -------------------------------------------------------------------------------- 1 | # Input data 2 | 3 | Download the [data](https://www.kaggle.com/c/titanic/data) from Kaggle's [Titanic: Machine Learning from Disaster](https://www.kaggle.com/c/titanic) competition and place it in this directory. 4 | 5 | ``` 6 | input 7
| └── titanic 8 | |── train.csv 9 | |── test.csv 10 | └── gender_submission.csv 11 | ``` 12 | -------------------------------------------------------------------------------- /input/home-credit-default-risk/README.md: -------------------------------------------------------------------------------- 1 | # Input data 2 | 3 | Download the [data](https://www.kaggle.com/c/home-credit-default-risk/data) from Kaggle's [Home Credit Default Risk](https://www.kaggle.com/c/home-credit-default-risk) competition and place it in this directory. 4 | The downloaded files have the extension ".csv.zip". 5 | Unzip them to obtain the csv files. 6 | 7 | ``` 8 | input 9 | └── home-credit-default-risk 10 | |── application_train.csv 11 | └── bureau.csv 12 | ``` 13 | -------------------------------------------------------------------------------- /errata.md: -------------------------------------------------------------------------------- 1 | # Errata 2 | 3 | This page lists the typos and other errors found so far. 4 | They are scheduled to be corrected in the next reprint. 5 | 6 | | Location | Incorrect | Correct | Status | 7 | | -- | -- | -- | -- | 8 | | p. 13 | Kaggleのコンペの概要は図1.1の通りで,次のような流れになっています. | Kaggleのコンペの概要は図1.1の通りで,次のような流れになっています.図1.1は参考文献[16]を参考に作成しました.| Fixed in print edition, 3rd printing | 9 | | p. 13 | 「R」[16] | 「R」 | Fixed in print edition, 3rd printing | 10 | | p. 13 | [16] R: The R Project for Statistical Computing, https://www.r-project.org/ (Accessed: 30 November 2019). | [16] Kaggleで描く成長戦略 〜個人編・組織編〜, https://www2.slideshare.net/HaradaKei/devsumi-2018summer (Accessed: 24 December 2020). | Fixed in print edition, 3rd printing | 11 | | p. 27 | つまづき | つまずき | Fixed in print edition, 3rd printing | 12 | | p. 48 | プログ | ブログ | Fixed in print edition, 3rd printing | 13 | | p. 58 | 習う | 倣う | Fixed in print edition, 3rd printing | 14 | | p. 71 | 深堀り | 深掘り | Fixed in print edition, 3rd printing | 15 | | p. 84 | Fale | Fare | Fixed in print edition, 3rd printing | 16 | | p. 138 | Footnote [138] is missing | 「日本語版Wikipediaで事前に学習済のモデル[103]を用いて」
[103] ja.text8 https://github.com/Hironsan/ja.text8 (Accessed: 30 November 2019).| Fixed in print edition, 2nd printing | 17 | | p. 149 | ベンチーマーク | ベンチマーク | Fixed in print edition, 3rd printing | 18 | | p. 149 | Disscussion | Discussion | Fixed in print edition, 3rd printing | 19 | | p. 154 | [107] Kaggleの画像コンペのためのGCPインスタンス作成手順(2019年10月版), https://currypurin.qrunch.io/entries/T9iGWHdsiI6o2wke (Accessed: 30 November 2019). | [107] Kaggleの画像コンペのためのGCPインスタンス作成手順(2019年10月版), https://www.currypurin.com/entry/2019/10/10/094133 (Accessed: 24 December 2020). | Fixed in print edition, 3rd printing | 20 | | p. 166, 168, 169 | Suvivied, Survivied, etc. | Survived | Fixed in print edition, 3rd printing | 21 | -------------------------------------------------------------------------------- /README_EN.md: -------------------------------------------------------------------------------- 1 | # Supplemental materials for *"Python Kaggle Start Book"* 2 | 3 | - This repository contains the sample code for the Japanese book "Python Kaggle Start Book (PythonではじめるKaggleスタートブック)". 4 | - The [footnotes](footnote.md) and [errata](errata.md) are also available. 5 | - If you have any questions or comments, please open an [issue](https://github.com/upura/python-kaggle-start-book/issues).
6 | 7 | ## Directories 8 | 9 | | Directory | Description | 10 | |:----|:-------| 11 | | input | input data | 12 | | ch02 | sample code for chapter 2 | 13 | | ch03 | sample code for chapter 3 | 14 | 15 | ## Kaggle Notebooks 16 | 17 | - [ch02_01.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-01) 18 | - [ch02_02.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-02) 19 | - [ch02_03.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-03) 20 | - [ch02_04.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-04) 21 | - [ch02_05.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-05) 22 | - [ch02_06.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-06) 23 | - [ch02_07.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-07) 24 | - [ch02_08.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-08) 25 | - [ch03_01.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch03-01) 26 | - [ch03_02.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch03-02) 27 | - [ch03_03.ipynb](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch03-03) 28 | 29 | ## FAQ 30 | 31 | All [past questions](https://github.com/upura/python-kaggle-start-book/issues?q=is%3Aissue) are public; the most frequent ones are answered below. 32 | 33 | ### Is an electronic version available? 34 | 35 | An electronic version has been [available](https://bookclub.kodansha.co.jp/buy?item=0000325172) in reflowable formats, including Kindle and Kobo, since 26 May 2020. 36 | 37 | ### p.41: There is no "Commit" in Kaggle Notebook 38 | 39 | The design of Kaggle Notebooks is updated frequently. A [tutorial video](https://youtu.be/lU_VY79vJfk) is available on YouTube. 40 | 41 | ### Are editions in other languages available?
42 | 43 | A [Traditional Chinese](http://books.gotop.com.tw/v_ACD021100) edition and a [Korean](https://jpub.tistory.com/1147) edition were published in April 2021. 44 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![readme_en](https://img.shields.io/static/v1?label=README&message=English&color=blue)](README_EN.md) 2 | 3 | # Sample code, footnotes, and errata 4 | 5 | - Sample code for the book 『PythonではじめるKaggleスタートブック』 (Python Kaggle Start Book). 6 | - The footnotes from the book are collected [here](footnote.md). 7 | - The errata are [here](errata.md). 8 | - Please send comments and questions via an [issue](https://github.com/upura/python-kaggle-start-book/issues). Frequently asked questions (FAQ) are collected [below](https://github.com/upura/python-kaggle-start-book#%E3%82%88%E3%81%8F%E3%81%82%E3%82%8B%E8%B3%AA%E5%95%8Ffaq). 9 | 10 | ## Directory contents 11 | 12 | - "Kaggle (py36)" refers to the files published for the first printing of the book. The Python version is 3.6. The files in the `ch02` and `ch03` directories were uploaded. 13 | - "Kaggle (py310)" refers to the files published for the fifth printing of the book. The Python version is 3.10. 14 | 15 | | Directory | Contents | File name | Kaggle (py36) | Kaggle (py310) | 16 | |:---|:---|:---|:---|:---| 17 | | input | Input files | - | - | - | 18 | | ch02 | Sample code for Chapter 2 | ch02_01.ipynb | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-01) | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-01) | 19 | | | | ch02_02.ipynb | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-02) | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-02) | 20 | | | | ch02_03.ipynb | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-03) |
[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-03) | 21 | | | | ch02_04.ipynb | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-04) | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-04) | 22 | | | | ch02_05.ipynb | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-05) | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-05) | 23 | | | | ch02_06.ipynb | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-06) | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-06) | 24 | | | | ch02_07.ipynb | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-07) | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-07) | 25 | | | | ch02_08.ipynb | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch02-08) | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch02-08) | 26 | | ch03 | Sample code for Chapter 3 | ch03_01.ipynb | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch03-01) | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch03-01) | 27 | | | | ch03_02.ipynb |
[![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch03-02) | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch03-02) | 28 | | | | ch03_03.ipynb | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/python-kaggle-start-book-ch03-03) | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://www.kaggle.com/sishihara/py310-python-kaggle-start-book-ch03-03) | 29 | 30 | ### Main changes in "Kaggle (py310)" 31 | 32 | - Note: Because of random seeds and changes in library behavior, execution results may not match those of the files published for the first edition. 33 | - 5th printing: Removed the `early_stopping_rounds` argument following a [change in LightGBM](https://github.com/microsoft/LightGBM/pull/4908). 34 | - 5th printing: In ch03_02.ipynb, changed `dataiter.next()` to `next(dataiter)`. 35 | - 5th printing: In ch03_03.ipynb, changed the `size` argument of `word2vec.Word2Vec` to `vector_size`. 36 | - 6th printing: In ch02_03.ipynb, changed the name of the imported library from `pandas_profiling` to `ydata_profiling`. 37 | 38 | ## Frequently asked questions (FAQ) 39 | 40 | All past questions and discussions are public [here](https://github.com/upura/python-kaggle-start-book/issues?q=is%3Aissue). 41 | 42 | ### Is an electronic version available? 43 | 44 | It has been distributed in reflowable formats such as Kindle and Kobo since 26 May 2020. Please check the [online bookstores](https://bookclub.kodansha.co.jp/buy?item=0000325172). 45 | 46 | ### p.41: There is no "Commit" in Kaggle Notebook 47 | 48 | The design of Kaggle Notebooks is updated from time to time. The way to commit as of May 2023 is explained in this [video](https://youtu.be/u6Bc0jiWu38). 49 | 50 | ### Are editions in other languages available?
51 | 52 | A Traditional Chinese edition, 『[Kaggle大師教您用Python玩資料科學,比賽拿獎金](http://books.gotop.com.tw/v_ACD021100)』, and a Korean edition, 『[파이썬으로 시작하는 캐글: 입문에서 컴피티션까지](https://jpub.tistory.com/1147)』, were published in April 2021. 53 | -------------------------------------------------------------------------------- /ch03/ch03_03.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "This notebook contains sample code with Japanese comments.\n", 8 | "\n", 9 | "# 3.3 Going beyond Titanic (3): Working with text data" 10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 1, 15 | "metadata": { 16 | "_cell_guid": "79c7e3d0-c299-4dcb-8224-4455121ee9b0", 17 | "_uuid": "d629ff2d2480ee46fbb7e2d37f6b5fab8052498a" 18 | }, 19 | "outputs": [], 20 | "source": [ 21 | "import pandas as pd" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 2, 27 | "metadata": {}, 28 | "outputs": [ 29 | { 30 | "data": { 31 | "text/html": [ 32 | "
\n", 68 | "
" 69 | ], 70 | "text/plain": [ 71 | " text\n", 72 | "0 I like kaggle very much\n", 73 | "1 I do not like kaggle\n", 74 | "2 I do really love machine learning" 75 | ] 76 | }, 77 | "execution_count": 2, 78 | "metadata": {}, 79 | "output_type": "execute_result" 80 | } 81 | ], 82 | "source": [ 83 | "df = pd.DataFrame({'text': ['I like kaggle very much',\n", 84 | " 'I do not like kaggle',\n", 85 | " 'I do really love machine learning']})\n", 86 | "df" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "# Bag of Words" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 3, 99 | "metadata": {}, 100 | "outputs": [ 101 | { 102 | "data": { 103 | "text/plain": [ 104 | "array([[0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1],\n", 105 | " [1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0],\n", 106 | " [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0]], dtype=int64)" 107 | ] 108 | }, 109 | "execution_count": 3, 110 | "metadata": {}, 111 | "output_type": "execute_result" 112 | } 113 | ], 114 | "source": [ 115 | "from sklearn.feature_extraction.text import CountVectorizer\n", 116 | "\n", 117 | "\n", 118 | "vectorizer = CountVectorizer(token_pattern=u'(?u)\\\\b\\\\w+\\\\b')\n", 119 | "bag = vectorizer.fit_transform(df['text'])\n", 120 | "bag.toarray()" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 4, 126 | "metadata": {}, 127 | "outputs": [ 128 | { 129 | "name": "stdout", 130 | "output_type": "stream", 131 | "text": [ 132 | "{'i': 1, 'like': 4, 'kaggle': 2, 'very': 10, 'much': 7, 'do': 0, 'not': 8, 'really': 9, 'love': 5, 'machine': 6, 'learning': 3}\n" 133 | ] 134 | } 135 | ], 136 | "source": [ 137 | "print(vectorizer.vocabulary_)" 138 | ] 139 | }, 140 | { 141 | "cell_type": "markdown", 142 | "metadata": {}, 143 | "source": [ 144 | "# TF-IDF" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 5, 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "name": "stdout", 154 | "output_type": "stream", 155 | 
"text": [ 156 | "[[0. 0.31544415 0.40619178 0. 0.40619178 0.\n", 157 | " 0. 0.53409337 0. 0. 0.53409337]\n", 158 | " [0.43306685 0.33631504 0.43306685 0. 0.43306685 0.\n", 159 | " 0. 0. 0.56943086 0. 0. ]\n", 160 | " [0.34261996 0.26607496 0. 0.45050407 0. 0.45050407\n", 161 | " 0.45050407 0. 0. 0.45050407 0. ]]\n" 162 | ] 163 | } 164 | ], 165 | "source": [ 166 | "from sklearn.feature_extraction.text import CountVectorizer\n", 167 | "from sklearn.feature_extraction.text import TfidfTransformer\n", 168 | "\n", 169 | "\n", 170 | "vectorizer = CountVectorizer(token_pattern=u'(?u)\\\\b\\\\w+\\\\b')\n", 171 | "transformer = TfidfTransformer()\n", 172 | "\n", 173 | "tf = vectorizer.fit_transform(df['text'])\n", 174 | "tfidf = transformer.fit_transform(tf)\n", 175 | "print(tfidf.toarray())" 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 6, 181 | "metadata": {}, 182 | "outputs": [ 183 | { 184 | "name": "stdout", 185 | "output_type": "stream", 186 | "text": [ 187 | "{'i': 1, 'like': 4, 'kaggle': 2, 'very': 10, 'much': 7, 'do': 0, 'not': 8, 'really': 9, 'love': 5, 'machine': 6, 'learning': 3}\n" 188 | ] 189 | } 190 | ], 191 | "source": [ 192 | "print(vectorizer.vocabulary_)" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "# Word2vec" 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": 7, 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "from gensim.models import word2vec\n", 209 | "\n", 210 | "\n", 211 | "sentences = [d.split() for d in df['text']]\n", 212 | "model = word2vec.Word2Vec(sentences, size=10, min_count=1, window=2, seed=7)" 213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 8, 218 | "metadata": {}, 219 | "outputs": [ 220 | { 221 | "data": { 222 | "text/plain": [ 223 | "array([-0.04932676, -0.01171829, 0.04239148, 0.01735417, -0.04764815,\n", 224 | " -0.03205363, -0.02873827, 0.04682567, 0.04185081, 
0.00795709],\n", 225 | " dtype=float32)" 226 | ] 227 | }, 228 | "execution_count": 8, 229 | "metadata": {}, 230 | "output_type": "execute_result" 231 | } 232 | ], 233 | "source": [ 234 | "model.wv['like']" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 9, 240 | "metadata": {}, 241 | "outputs": [ 242 | { 243 | "data": { 244 | "text/plain": [ 245 | "[('much', 0.31108221411705017),\n", 246 | " ('really', 0.11813490092754364),\n", 247 | " ('not', 0.07177764177322388),\n", 248 | " ('learning', -0.014833025634288788),\n", 249 | " ('very', -0.03584161400794983),\n", 250 | " ('do', -0.11829414963722229),\n", 251 | " ('machine', -0.12069450318813324),\n", 252 | " ('kaggle', -0.532151997089386),\n", 253 | " ('love', -0.5468614101409912),\n", 254 | " ('I', -0.7641928195953369)]" 255 | ] 256 | }, 257 | "execution_count": 9, 258 | "metadata": {}, 259 | "output_type": "execute_result" 260 | } 261 | ], 262 | "source": [ 263 | "model.wv.most_similar('like')" 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": 10, 269 | "metadata": {}, 270 | "outputs": [ 271 | { 272 | "data": { 273 | "text/plain": [ 274 | "['I', 'like', 'kaggle', 'very', 'much']" 275 | ] 276 | }, 277 | "execution_count": 10, 278 | "metadata": {}, 279 | "output_type": "execute_result" 280 | } 281 | ], 282 | "source": [ 283 | "df['text'][0].split()" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 11, 289 | "metadata": {}, 290 | "outputs": [ 291 | { 292 | "data": { 293 | "text/plain": [ 294 | "array([[-0.00070634, 0.04390315, -0.03669089, 0.02026465, 0.04046954,\n", 295 | " 0.02365695, 0.020924 , -0.03109757, -0.04436051, -0.00691835],\n", 296 | " [-0.04932676, -0.01171829, 0.04239148, 0.01735417, -0.04764815,\n", 297 | " -0.03205363, -0.02873827, 0.04682567, 0.04185081, 0.00795709],\n", 298 | " [ 0.03825201, -0.04983004, -0.03085005, -0.0421443 , 0.04703034,\n", 299 | " -0.01274201, 0.00586073, -0.02872854, 0.01241979, 
-0.03893603],\n", 300 | " [-0.01407814, -0.03944685, 0.01979917, -0.00788147, 0.03230685,\n", 301 | " 0.04465036, -0.01564248, 0.04261149, -0.04766037, 0.03080159],\n", 302 | " [ 0.0160588 , -0.04853946, 0.02299253, 0.00940678, -0.04020066,\n", 303 | " 0.00423941, 0.00689822, 0.02838706, -0.02563218, 0.02724046]],\n", 304 | " dtype=float32)" 305 | ] 306 | }, 307 | "execution_count": 11, 308 | "metadata": {}, 309 | "output_type": "execute_result" 310 | } 311 | ], 312 | "source": [ 313 | "import numpy as np\n", 314 | "\n", 315 | "\n", 316 | "wordvec = np.array([model.wv[word] for word in df['text'][0].split()])\n", 317 | "wordvec" 318 | ] 319 | }, 320 | { 321 | "cell_type": "code", 322 | "execution_count": 12, 323 | "metadata": {}, 324 | "outputs": [ 325 | { 326 | "data": { 327 | "text/plain": [ 328 | "array([-0.00196009, -0.0211263 , 0.00352845, -0.00060003, 0.00639158,\n", 329 | " 0.00555022, -0.00213956, 0.01159962, -0.01267649, 0.00402895],\n", 330 | " dtype=float32)" 331 | ] 332 | }, 333 | "execution_count": 12, 334 | "metadata": {}, 335 | "output_type": "execute_result" 336 | } 337 | ], 338 | "source": [ 339 | "np.mean(wordvec, axis=0)" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": 13, 345 | "metadata": {}, 346 | "outputs": [ 347 | { 348 | "data": { 349 | "text/plain": [ 350 | "array([0.03825201, 0.04390315, 0.04239148, 0.02026465, 0.04703034,\n", 351 | " 0.04465036, 0.020924 , 0.04682567, 0.04185081, 0.03080159],\n", 352 | " dtype=float32)" 353 | ] 354 | }, 355 | "execution_count": 13, 356 | "metadata": {}, 357 | "output_type": "execute_result" 358 | } 359 | ], 360 | "source": [ 361 | "np.max(wordvec, axis=0)" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": null, 367 | "metadata": {}, 368 | "outputs": [], 369 | "source": [] 370 | } 371 | ], 372 | "metadata": { 373 | "file_extension": ".py", 374 | "kernelspec": { 375 | "display_name": "Python 3", 376 | "language": "python", 377 | "name": "python3" 378 
| }, 379 | "language_info": { 380 | "codemirror_mode": { 381 | "name": "ipython", 382 | "version": 3 383 | }, 384 | "file_extension": ".py", 385 | "mimetype": "text/x-python", 386 | "name": "python", 387 | "nbconvert_exporter": "python", 388 | "pygments_lexer": "ipython3", 389 | "version": "3.6.6" 390 | }, 391 | "mimetype": "text/x-python", 392 | "name": "python", 393 | "npconvert_exporter": "python", 394 | "pygments_lexer": "ipython3", 395 | "version": 3 396 | }, 397 | "nbformat": 4, 398 | "nbformat_minor": 2 399 | } 400 | -------------------------------------------------------------------------------- /footnote.md: -------------------------------------------------------------------------------- 1 | # 書籍内の脚注 2 | 3 | ## はじめに 4 | 5 | - [1] Welcome to Python.org, https://www.python.org/ (Accessed: 30 November 2019). 6 | - [2] Kaggle: Your Home for Data Science, https://www.kaggle.com/ (Accessed: 30 November 2019). 7 | - [3] Titanic: Machine Learning from Disaster, https://www.kaggle.com/c/titanic (Accessed: 30 November 2019). 8 | - [4] Qiita, https://qiita.com/ (Accessed: 30 November 2019). 9 | - [5] Kaggleに登録したら次にやること ~ これだけやれば十分闘える!Titanicの先へ行く入門 10 Kernel ~, https://qiita.com/upura/items/3c10ff6fed4e7c3d70f0 (Accessed: 30 November 2019). 10 | - [6] Kaggle - Qiita, https://qiita.com/tags/kaggle (Accessed: 30 November 2019). 11 | - [7] 村田秀樹, 『Kaggleのチュートリアル』, https://note.mu/currypurin/n/nf390914c721e (Accessed: 30 November 2019). 12 | - [8] GitHub, http://github.com (Accessed: 30 November 2019). 13 | - [9] Docker: Enterprise Container Platform, https://www.docker.com/ (Accessed: 30 November 2019). 14 | - [10] Container Registry - Google Cloud Platform, https://console.cloud.google.com/gcr/images/kaggle-images/GLOBAL/python (Accessed: 30 November 2019). 15 | - [11] PetFinder.my Adoption Prediction, https://www.kaggle.com/c/petfinder-adoption-prediction (Accessed: 30 November 2019).
16 | - [12] Kaggle Days Tokyo, https://www.kaggle.com/c/kaggle-days-tokyo (Accessed: 10 March 2024). 17 | - [13] 機械学習を用いた日経電子版Proのユーザ分析 データドリブンチームの知られざる取り組み, https://logmi.jp/tech/articles/321077 (Accessed: 30 November 2019). 18 | - [14] Santander Value Prediction Challenge, https://www.kaggle.com/c/santander-value-prediction-challenge (Accessed: 30 November 2019). 19 | - [15] LANL Earthquake Prediction, https://www.kaggle.com/c/LANL-Earthquake-Prediction (Accessed: 30 November 2019). 20 | 21 | ## 第1章 22 | 23 | - [16] Kaggleで描く成長戦略 〜個人編・組織編〜, https://www2.slideshare.net/HaradaKei/devsumi-2018summer (Accessed: 24 December 2020). 24 | - [17] Kaggle Progression System, https://www.kaggle.com/progression (Accessed: 30 November 2019). 25 | - [18] KaggleのGrandmasterやmasterの条件や人数について調べたので、詳細に書きとめます。, http://www.currypurin.com/entry/2018/02/21/011316 (Accessed: 30 November 2019). 26 | - [19] SIGNATE, https://signate.jp/ (Accessed: 30 November 2019). 27 | - [20] 杉山将, 『イラストで学ぶ 機械学習』, 講談社, 2013 28 | - [21] AtCoder:競技プログラミングコンテストを開催する国内最大のサイト, https://atcoder.jp/ (Accessed: 30 November 2019). 29 | - [22] AtCoder に登録したら次にやること ~ これだけ解けば十分闘える!過去問精選 10 問 ~, https://qiita.com/drken/items/fd4e5e3630d0f5859067 (Accessed: 30 November 2019). 30 | - [23] TalkingData AdTracking Fraud Detection Challenge, https://www.kaggle.com/c/talkingdata-adtracking-fraud-detection (Accessed: 30 November 2019). 31 | 32 | ## 第2章 33 | 34 | - [24] Kaggle API, https://github.com/Kaggle/kaggle-api (Accessed: 30 November 2019). 35 | - [25] kaggle-apiというKaggle公式のapiの使い方をまとめます, http://www.currypurin.com/entry/2018/kaggle-api (Accessed: 30 November 2019). 36 | - [26] NumPy, https://numpy.org/ (Accessed: 30 November 2019). 37 | - [27] Pandas, https://pandas.pydata.org/ (Accessed: 30 November 2019). 38 | - [28] sklearn.preprocessing.StandardScaler, https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html (Accessed: 30 November 2019). 
39 | - [29] sklearn.ensemble.RandomForestClassifier, https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html (Accessed: 30 November 2019). 40 | - [30] LightGBM, https://lightgbm.readthedocs.io/en/latest/ (Accessed: 30 November 2019). 41 | - [31] Pandas Profiling, https://github.com/pandas-profiling/pandas-profiling (Accessed: 30 November 2019). 42 | - [32] Santander Customer Transaction Prediction, https://www.kaggle.com/c/santander-customer-transaction-prediction (Accessed: 30 November 2019). 43 | - [33] IEEE-CIS Fraud Detection, https://www.kaggle.com/c/ieee-fraud-detection (Accessed: 30 November 2019). 44 | - [34] Home Credit Default Risk, https://www.kaggle.com/c/home-credit-default-risk (Accessed: 30 November 2019). 45 | - [35] Deterministic neural networks using PyTorch, https://www.kaggle.com/bminixhofer/deterministic-neural-networks-using-pytorch (Accessed: 30 November 2019). 46 | - [36] 門脇大輔・阪田隆司・保坂桂佑・平松雄司,『Kaggleで勝つデータ分析の技術』, 技術評論社, 2019 47 | - [37] 著:Alice Zheng, Amanda Casari, 訳:株式会社ホクソエム, 『機械学習のための特徴量エンジニアリング』, オライリージャパン, 2019 48 | - [38] 最近のKaggleに学ぶテーブルデータの特徴量エンジニアリング, https://www.slideshare.net/mlm_kansai/kaggle-138546659 (Accessed: 30 November 2019). 49 | - [39] 【随時更新】Kaggleテーブルデータコンペできっと役立つTipsまとめ, https://naotaka1128.hatenadiary.jp/entry/kaggle-compe-tips (Accessed: 30 November 2019). 50 | - [40] nejumi/kaggle_memo, https://github.com/nejumi/kaggle_memo (Accessed: 30 November 2019). 51 | - [41] 本橋智光,『前処理大全』, 技術評論社, 2018 52 | - [42] Instacart Market Basket Analysis, https://www.kaggle.com/c/instacart-market-basket-analysis (Accessed: 30 November 2019). 53 | - [43] 第2回:「Kaggle」の面白さとは--食品宅配サービスの購買予測コンペで考える -, https://japan.zdnet.com/article/35124706/ (Accessed: 30 November 2019). 54 | - [44] PLAsTiCC Astronomical Classification, https://www.kaggle.com/c/PLAsTiCC-2018 (Accessed: 30 November 2019). 
55 | - [45] 半田利弘, 『基礎からわかる天文学』, 誠文堂新光社, 2011 56 | - [46] Python-package Introduction, https://lightgbm.readthedocs.io/en/latest/Python-Intro.html (Accessed: 30 November 2019). 57 | - [47] Supervised learning, https://scikit-learn.org/stable/supervised_learning.html (Accessed: 30 November 2019). 58 | - [48] lightgbm カテゴリカル変数と欠損値の扱いについて+α, https://tebasakisan.hatenadiary.com/entry/2019/01/27/222102 (Accessed: 30 November 2019). 59 | - [49] XGBoost, https://xgboost.readthedocs.io/en/latest/ (Accessed: 30 November 2019). 60 | - [50] CatBoost, https://catboost.ai/ (Accessed: 30 November 2019). 61 | - [51] PyTorch, https://pytorch.org/ (Accessed: 30 November 2019). 62 | - [52] TensorFlow, https://www.tensorflow.org/ (Accessed: 30 November 2019). 63 | - [53] LightGBM Parameters, https://lightgbm.readthedocs.io/en/latest/Parameters.html (Accessed: 30 November 2019). 64 | - [54] LightGBM Parameters-Tuning, https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html (Accessed: 30 November 2019). 65 | - [55] sklearn.model_selection.GridSearchCV, https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html (Accessed: 30 November 2019). 66 | - [56] Bayesian Optimization, https://github.com/fmfn/BayesianOptimization (Accessed: 30 November 2019). 67 | - [57] Hyperopt, https://github.com/hyperopt/hyperopt (Accessed: 30 November 2019). 68 | - [58] Optuna, https://optuna.org/ (Accessed: 30 November 2019). 69 | - [59] Optuna Trial, https://optuna.readthedocs.io/en/latest/reference/trial.html (Accessed: 30 November 2019). 70 | - [60] Optunaでrandomのseedを固定する方法, https://qiita.com/phorizon20/items/1b795beb202c2dc378ed (Accessed: 30 November 2019). 71 | - [61] 勾配ブースティングで大事なパラメータの気持ち, https://nykergoto.hatenablog.jp/entry/2019/03/29/勾配ブースティングで大事なパラメータの気持ち (Accessed: 30 November 2019). 72 | - [62] 有名ライブラリと比較したLightGBMの現在, https://alphaimpact.jp/downloads/pydata20190927.pdf (Accessed: 30 November 2019). 
73 | - [63] Recruit Restaurant Visitor Forecasting, https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting (Accessed: 30 November 2019). 74 | - [64] Neko kin, https://www.slideshare.net/ShotaOkubo/neko-kin-96769953 (Accessed: 30 November 2019). 75 | - [65] sklearn.model_selection.TimeSeriesSplit, https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html (Accessed: 30 November 2019). 76 | - [66] State Farm Distracted Driver Detection, https://www.kaggle.com/c/state-farm-distracted-driver-detection (Accessed: 30 November 2019). 77 | - [67] Kaggle State Farm Distracted Driver Detection, https://speakerdeck.com/iwiwi/kaggle-state-farm-distracted-driver-detection (Accessed: 30 November 2019). 78 | - [68] sklearn.model_selection.GroupKFold, https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GroupKFold.html (Accessed: 30 November 2019). 79 | - [69] Profiling Top Kagglers: Bestfitting, Currently #1 in the World, https://medium.com/kaggle-blog/profiling-top-kagglers-bestfitting-currently-1-in-the-world-58cc0e187b (Accessed: 30 November 2019). 80 | - [70] Kaggle Ensembling Guide, http://web.archive.org/web/20210727094233/https://mlwave.com/kaggle-ensembling-guide/ (Accessed: 14 May 2023). 81 | - [71] Avito Demand Prediction Challenge, https://www.kaggle.com/c/avito-demand-prediction (Accessed: 30 November 2019). 82 | - [72] Kaggle Avito Demand Prediction Challenge 9th Place Solution, https://www.slideshare.net/JinZhan/kaggle-avito-demand-prediction-challenge-9th-place-solution-124500050 (Accessed: 30 November 2019). 83 | - [73] The BigChaos Solution to the Netflix Grand Prize, https://www.asc.ohio-state.edu/statistics/statgen/joul_aut2009/BigChaos.pdf (Accessed: 14 May 2023). 84 | 85 | ## 第3章 86 | 87 | - [74] Introduction to Manual Feature Engineering, https://www.kaggle.com/willkoehrsen/introduction-to-manual-feature-engineering (Accessed: 30 November 2019). 
88 | - [75] 第9回:Kaggleの「画像コンペ」とは--取り組み方と面白さを読み解く, https://japan.zdnet.com/article/35140207/ (Accessed: 30 November 2019). 89 | - [76] Adversarial Example, https://arxiv.org/abs/1312.6199 (Accessed: 30 November 2019). 90 | - [77] Generative Adversarial Network(GAN), https://arxiv.org/abs/1406.2661 (Accessed: 30 November 2019). 91 | - [78] CS231n: Convolutional Neural Networks for Visual Recognition, http://cs231n.stanford.edu/ (Accessed: 30 November 2019). 92 | - [79] Lecture 11: Detection and Segmentation, http://cs231n.stanford.edu/slides/2018/cs231n_2018_lecture11.pdf (Accessed: 30 November 2019). 93 | - [80] Neural Information Processing Systems (NeurIPS), https://nips.cc/ (Accessed: 30 November 2019). 94 | - [81] NIPS 2017: Non-targeted Adversarial Attack, https://www.kaggle.com/c/nips-2017-non-targeted-adversarial-attack/ (Accessed: 30 November 2019). 95 | - [82] NIPS’17 Adversarial Learning Competition に参戦しました, https://research.preferred.jp/2018/04/nips17-adversarial-learning-competition/ (Accessed: 30 November 2019). 96 | - [83] Explaining and Harnessing Adversarial Examples, https://arxiv.org/abs/1412.6572 (Accessed: 30 November 2019). 97 | - [84] Generative Dog Images, https://www.kaggle.com/c/generative-dog-images (Accessed: 30 November 2019). 98 | - [85] An intuitive introduction to Generative Adversarial Networks (GANs), https://www.freecodecamp.org/news/an-intuitive-introduction-to-generative-adversarial-networks-gans-7a2264a81394/ (Accessed: 30 November 2019). 99 | - [86] Generative Dog Images, https://speakerdeck.com/hirune924/generative-dog-images (Accessed: 30 November 2019). 100 | - [87] TRAINING A CLASSIFIER, https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html (Accessed: 30 November 2019). 101 | - [88] CIFAR10, https://www.cs.toronto.edu/~kriz/cifar.html (Accessed: 30 November 2019). 
102 | - [89] 原田達也, 『画像認識』, 講談社, 2017 103 | - [90] Distinctive Image Features from Scale-Invariant Keypoints, https://www.robots.ox.ac.uk/~vgg/research/affine/det_eval_files/lowe_ijcv2004.pdf (Accessed: 30 November 2019). 104 | - [91] iMet 7th place solution & my approach to image data competition, https://speakerdeck.com/phalanx/imet-7th-place-solution-and-my-approach-to-image-data-competition?slide=30 (Accessed: 30 November 2019). 105 | - [92] Convolutional Neural Network (CNN), https://www.deeplearningbook.org/front_matter.pdf (Accessed: 30 November 2019). 106 | - [93] APTOS 2019 Blindness Detection, https://www.kaggle.com/c/aptos2019-blindness-detection (Accessed: 30 November 2019). 107 | - [94] TensorFlow 2.0 Question Answering, https://www.kaggle.com/c/tensorflow2-question-answering (Accessed: 30 November 2019). 108 | - [95] Quora Insincere Questions Classification, https://www.kaggle.com/c/quora-insincere-questions-classification/ (Accessed: 30 November 2019). 109 | - [96] Jigsaw Unintended Bias in Toxicity Classification, https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification (Accessed: 30 November 2019). 110 | - [97] 絵で理解するWord2vecの仕組み, https://qiita.com/Hironsan/items/11b388575a058dc8a46a (Accessed: 30 November 2019). 111 | - [98] word2vec(Skip-Gram Model)の仕組みを恐らく日本一簡潔にまとめてみたつもり, https://www.randpy.tokyo/entry/word2vec_skip_gram_model (Accessed: 30 November 2019). 112 | - [99] Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms, https://arxiv.org/abs/1805.09843 (Accessed: 30 November 2019). 113 | - [100] Approaching (Almost) Any NLP Problem on Kaggle, https://www.kaggle.com/abhishek/approaching-almost-any-nlp-problem-on-kaggle (Accessed: 30 November 2019). 114 | - [101] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, https://arxiv.org/abs/1810.04805 (Accessed: 30 November 2019). 
115 | - [102] XLNet: Generalized Autoregressive Pretraining for Language Understanding, https://arxiv.org/abs/1906.08237 (Accessed: 30 November 2019). 116 | - [103] ja.text8, https://github.com/Hironsan/ja.text8 (Accessed: 30 November 2019). 117 | - [104] 日本語版text8コーパスを作って分散表現を学習する, https://hironsan.hatenablog.com/entry/japanese-text8-corpus (Accessed: 30 November 2019). 118 | 119 | ## Chapter 4 120 | 121 | - [105] GCPとDockerでKaggle用計算環境構築, https://qiita.com/lain21/items/a33a39d465cd08b662f1 (Accessed: 30 November 2019). 122 | - [106] Kaggle用のGCP環境を手軽に構築, https://qiita.com/hiromu166/items/2a738f7be49d88d8b599 (Accessed: 30 November 2019). 123 | - [107] Kaggleの画像コンペのためのGCPインスタンス作成手順(2019年10月版), https://www.currypurin.com/entry/2019/10/10/094133 (Accessed: 24 December 2020). 124 | 125 | ### 4.4 Recommended materials, literature, and links 126 | 127 | - 4.4.1 kaggler-ja slack, https://yutori-datascience.hatenablog.com/entry/2017/08/23/143146 128 | - 4.4.2 kaggler-ja wiki, https://kaggler-ja.wiki/ 129 | - 4.4.3 門脇大輔ら,『Kaggleで勝つデータ分析の技術』, 技術評論社, 2019, https://gihyo.jp/book/2019/978-4-297-10843-4 130 | - 4.4.4 Kaggle Tokyo Meetup materials and videos 131 | 132 | | Meetup | URL | 133 | | -- | -- | 134 | | 1st | https://kaggler-ja.wiki/5e82184687ef5e0040104d40 | 135 | | 2nd | https://kaggler-ja.wiki/5e82190787ef5e0040104d45 | 136 | | 3rd | http://yutori-datascience.hatenablog.com/entry/2017/10/29/205433 | 137 | | 4th | https://connpass.com/event/82458/presentation/ | 138 | | 4th (video) | https://www.youtube.com/watch?v=VMjnhGW2MgU&list=PLkBjLQIG@{}EjJlciM9lEz1AsuZZ8lDgyxDu | 139 | | 5th | https://connpass.com/event/105298/presentation/ | 140 | | 6th | https://connpass.com/event/132935/ | 141 | -------------------------------------------------------------------------------- /ch02/ch02_08.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "_cell_guid": "e12020f7-4f94-4ecc-9007-9b7a6e7458a6", 7 | "_uuid": 
"1fecb0980d8d422ec0f005c4bfd6225385c2c60f" 8 | }, 9 | "source": [ 10 | "This notebook is a sample code with Japanese comments.\n", 11 | "\n", 12 | "# 2.8 Three heads are better than one! Try out ensembling" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 1, 18 | "metadata": {}, 19 | "outputs": [ 20 | { 21 | "name": "stdout", 22 | "output_type": "stream", 23 | "text": [ 24 | "README.md submission_lightgbm_skfold.csv\r\n", 25 | "submission_lightgbm_holdout.csv submission_randomforest.csv\r\n" 26 | ] 27 | } 28 | ], 29 | "source": [ 30 | "ls ../input/submit-files" 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 2, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "import pandas as pd\n", 40 | "\n", 41 | "\n", 42 | "sub_lgbm_sk = pd.read_csv('../input/submit-files/submission_lightgbm_skfold.csv')\n", 43 | "sub_lgbm_ho = pd.read_csv('../input/submit-files/submission_lightgbm_holdout.csv')\n", 44 | "sub_rf = pd.read_csv('../input/submit-files/submission_randomforest.csv')" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 3, 50 | "metadata": { 51 | "scrolled": true 52 | }, 53 | "outputs": [ 54 | { 55 | "data": { 56 | "text/html": [ 57 | "
\n", 58 | "\n", 71 | "\n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | "
PassengerIdSurvived
08920
18930
28940
38950
48960
\n", 107 | "
" 108 | ], 109 | "text/plain": [ 110 | " PassengerId Survived\n", 111 | "0 892 0\n", 112 | "1 893 0\n", 113 | "2 894 0\n", 114 | "3 895 0\n", 115 | "4 896 0" 116 | ] 117 | }, 118 | "execution_count": 3, 119 | "metadata": {}, 120 | "output_type": "execute_result" 121 | } 122 | ], 123 | "source": [ 124 | "sub_lgbm_sk.head()" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 4, 130 | "metadata": { 131 | "scrolled": true 132 | }, 133 | "outputs": [ 134 | { 135 | "data": { 136 | "text/html": [ 137 | "
\n", 138 | "\n", 151 | "\n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | "
sub_lgbm_sksub_lgbm_hosub_rf
0000
1001
2000
3000
4001
\n", 193 | "
" 194 | ], 195 | "text/plain": [ 196 | " sub_lgbm_sk sub_lgbm_ho sub_rf\n", 197 | "0 0 0 0\n", 198 | "1 0 0 1\n", 199 | "2 0 0 0\n", 200 | "3 0 0 0\n", 201 | "4 0 0 1" 202 | ] 203 | }, 204 | "execution_count": 4, 205 | "metadata": {}, 206 | "output_type": "execute_result" 207 | } 208 | ], 209 | "source": [ 210 | "df = pd.DataFrame({'sub_lgbm_sk': sub_lgbm_sk['Survived'].values,\n", 211 | " 'sub_lgbm_ho': sub_lgbm_ho['Survived'].values,\n", 212 | " 'sub_rf': sub_rf['Survived'].values})\n", 213 | "df.head()" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 5, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "data": { 223 | "text/html": [ 224 | "
\n", 225 | "\n", 238 | "\n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | "
sub_lgbm_sksub_lgbm_hosub_rf
sub_lgbm_sk1.0000000.8830770.796033
sub_lgbm_ho0.8830771.0000000.731329
sub_rf0.7960330.7313291.000000
\n", 268 | "
" 269 | ], 270 | "text/plain": [ 271 | " sub_lgbm_sk sub_lgbm_ho sub_rf\n", 272 | "sub_lgbm_sk 1.000000 0.883077 0.796033\n", 273 | "sub_lgbm_ho 0.883077 1.000000 0.731329\n", 274 | "sub_rf 0.796033 0.731329 1.000000" 275 | ] 276 | }, 277 | "execution_count": 5, 278 | "metadata": {}, 279 | "output_type": "execute_result" 280 | } 281 | ], 282 | "source": [ 283 | "df.corr()" 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": 6, 289 | "metadata": { 290 | "scrolled": true 291 | }, 292 | "outputs": [ 293 | { 294 | "data": { 295 | "text/html": [ 296 | "
\n", 297 | "\n", 310 | "\n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | " \n", 322 | " \n", 323 | " \n", 324 | " \n", 325 | " \n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | "
PassengerIdSurvived
08920
18931
28940
38950
48961
\n", 346 | "
" 347 | ], 348 | "text/plain": [ 349 | " PassengerId Survived\n", 350 | "0 892 0\n", 351 | "1 893 1\n", 352 | "2 894 0\n", 353 | "3 895 0\n", 354 | "4 896 1" 355 | ] 356 | }, 357 | "execution_count": 6, 358 | "metadata": {}, 359 | "output_type": "execute_result" 360 | } 361 | ], 362 | "source": [ 363 | "sub = pd.read_csv('../input/titanic/gender_submission.csv')\n", 364 | "sub['Survived'] = sub_lgbm_sk['Survived'] + sub_lgbm_ho['Survived'] + sub_rf['Survived']\n", 365 | "sub.head()" 366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": 7, 371 | "metadata": { 372 | "scrolled": true 373 | }, 374 | "outputs": [ 375 | { 376 | "data": { 377 | "text/html": [ 378 | "
\n", 379 | "\n", 392 | "\n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | " \n", 415 | " \n", 416 | " \n", 417 | " \n", 418 | " \n", 419 | " \n", 420 | " \n", 421 | " \n", 422 | " \n", 423 | " \n", 424 | " \n", 425 | " \n", 426 | " \n", 427 | "
PassengerIdSurvived
08920
18930
28940
38950
48960
\n", 428 | "
" 429 | ], 430 | "text/plain": [ 431 | " PassengerId Survived\n", 432 | "0 892 0\n", 433 | "1 893 0\n", 434 | "2 894 0\n", 435 | "3 895 0\n", 436 | "4 896 0" 437 | ] 438 | }, 439 | "execution_count": 7, 440 | "metadata": {}, 441 | "output_type": "execute_result" 442 | } 443 | ], 444 | "source": [ 445 | "sub['Survived'] = (sub['Survived'] >= 2).astype(int)\n", 446 | "sub.to_csv('submission_lightgbm_ensemble.csv', index=False)\n", 447 | "sub.head()" 448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": null, 453 | "metadata": {}, 454 | "outputs": [], 455 | "source": [] 456 | } 457 | ], 458 | "metadata": { 459 | "kernelspec": { 460 | "display_name": "Python 3", 461 | "language": "python", 462 | "name": "python3" 463 | }, 464 | "language_info": { 465 | "codemirror_mode": { 466 | "name": "ipython", 467 | "version": 3 468 | }, 469 | "file_extension": ".py", 470 | "mimetype": "text/x-python", 471 | "name": "python", 472 | "nbconvert_exporter": "python", 473 | "pygments_lexer": "ipython3", 474 | "version": "3.6.6" 475 | } 476 | }, 477 | "nbformat": 4, 478 | "nbformat_minor": 1 479 | } 480 | -------------------------------------------------------------------------------- /ch02/ch02_05.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "_cell_guid": "e12020f7-4f94-4ecc-9007-9b7a6e7458a6", 7 | "_uuid": "1fecb0980d8d422ec0f005c4bfd6225385c2c60f" 8 | }, 9 | "source": [ 10 | "This notebook is a sample code with Japanese comments.\n", 11 | "\n", 12 | "# 2.5 Is gradient boosting the strongest?! Try out various machine learning algorithms" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 1, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import numpy as np\n", 22 | "import pandas as pd\n", 23 | "\n", 24 | "\n", 25 | "train = pd.read_csv('../input/titanic/train.csv')\n", 26 | "test = pd.read_csv('../input/titanic/test.csv')\n", 27 | "gender_submission = pd.read_csv('../input/titanic/gender_submission.csv')\n", 28 | "\n", 29 | "data = pd.concat([train, test], sort=False)\n", 30 | "\n", 31 | "data['Sex'] = data['Sex'].replace(['male', 'female'], [0, 1])\n", 32 | "data['Embarked'] = data['Embarked'].fillna('S')\n", 33 | "data['Embarked'] = data['Embarked'].map({'S': 0, 'C': 1, 'Q': 2}).astype(int)\n", 34 | "data['Fare'] = data['Fare'].fillna(np.mean(data['Fare']))\n", 35 | "data['Age'] = data['Age'].fillna(data['Age'].median())\n", 36 | "data['FamilySize'] = data['Parch'] + data['SibSp'] + 1\n", 37 | "data['IsAlone'] = 0\n", 38 | "data.loc[data['FamilySize'] == 1, 'IsAlone'] = 1" 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 2, 44 | "metadata": {}, 45 | "outputs": [ 46 | { 47 | "data": { 48 | "text/html": [ 49 | "
\n", 50 | "\n", 63 | "\n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | " \n", 86 | " \n", 87 | " \n", 88 | " \n", 89 | " \n", 90 | " \n", 91 | " \n", 92 | " \n", 93 | " \n", 94 | " \n", 95 | " \n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarkedFamilySizeIsAlone
010.03Braund, Mr. Owen Harris022.010A/5 211717.2500NaN020
121.01Cumings, Mrs. John Bradley (Florence Briggs Th...138.010PC 1759971.2833C85120
231.03Heikkinen, Miss. Laina126.000STON/O2. 31012827.9250NaN011
341.01Futrelle, Mrs. Jacques Heath (Lily May Peel)135.01011380353.1000C123020
450.03Allen, Mr. William Henry035.0003734508.0500NaN011
\n", 171 | "
" 172 | ], 173 | "text/plain": [ 174 | " PassengerId Survived Pclass \\\n", 175 | "0 1 0.0 3 \n", 176 | "1 2 1.0 1 \n", 177 | "2 3 1.0 3 \n", 178 | "3 4 1.0 1 \n", 179 | "4 5 0.0 3 \n", 180 | "\n", 181 | " Name Sex Age SibSp Parch \\\n", 182 | "0 Braund, Mr. Owen Harris 0 22.0 1 0 \n", 183 | "1 Cumings, Mrs. John Bradley (Florence Briggs Th... 1 38.0 1 0 \n", 184 | "2 Heikkinen, Miss. Laina 1 26.0 0 0 \n", 185 | "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 1 35.0 1 0 \n", 186 | "4 Allen, Mr. William Henry 0 35.0 0 0 \n", 187 | "\n", 188 | " Ticket Fare Cabin Embarked FamilySize IsAlone \n", 189 | "0 A/5 21171 7.2500 NaN 0 2 0 \n", 190 | "1 PC 17599 71.2833 C85 1 2 0 \n", 191 | "2 STON/O2. 3101282 7.9250 NaN 0 1 1 \n", 192 | "3 113803 53.1000 C123 0 2 0 \n", 193 | "4 373450 8.0500 NaN 0 1 1 " 194 | ] 195 | }, 196 | "execution_count": 2, 197 | "metadata": {}, 198 | "output_type": "execute_result" 199 | } 200 | ], 201 | "source": [ 202 | "data.head()" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 3, 208 | "metadata": {}, 209 | "outputs": [], 210 | "source": [ 211 | "delete_columns = ['Name', 'PassengerId', 'Ticket', 'Cabin']\n", 212 | "data.drop(delete_columns, axis=1, inplace=True)\n", 213 | "\n", 214 | "train = data[:len(train)]\n", 215 | "test = data[len(train):]\n", 216 | "\n", 217 | "y_train = train['Survived']\n", 218 | "X_train = train.drop('Survived', axis=1)\n", 219 | "X_test = test.drop('Survived', axis=1)" 220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": 4, 225 | "metadata": {}, 226 | "outputs": [ 227 | { 228 | "data": { 229 | "text/html": [ 230 | "
\n", 231 | "\n", 244 | "\n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | " \n", 321 | "
PclassSexAgeSibSpParchFareEmbarkedFamilySizeIsAlone
03022.0107.2500020
11138.01071.2833120
23126.0007.9250011
31135.01053.1000020
43035.0008.0500011
\n", 322 | "
" 323 | ], 324 | "text/plain": [ 325 | " Pclass Sex Age SibSp Parch Fare Embarked FamilySize IsAlone\n", 326 | "0 3 0 22.0 1 0 7.2500 0 2 0\n", 327 | "1 1 1 38.0 1 0 71.2833 1 2 0\n", 328 | "2 3 1 26.0 0 0 7.9250 0 1 1\n", 329 | "3 1 1 35.0 1 0 53.1000 0 2 0\n", 330 | "4 3 0 35.0 0 0 8.0500 0 1 1" 331 | ] 332 | }, 333 | "execution_count": 4, 334 | "metadata": {}, 335 | "output_type": "execute_result" 336 | } 337 | ], 338 | "source": [ 339 | "X_train.head()" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "# sklearn" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": 5, 352 | "metadata": {}, 353 | "outputs": [], 354 | "source": [ 355 | "from sklearn.linear_model import LogisticRegression\n", 356 | "\n", 357 | "\n", 358 | "clf = LogisticRegression(penalty='l2', solver='sag', random_state=0)" 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": 6, 364 | "metadata": {}, 365 | "outputs": [], 366 | "source": [ 367 | "from sklearn.ensemble import RandomForestClassifier\n", 368 | "\n", 369 | "\n", 370 | "clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)" 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": 7, 376 | "metadata": {}, 377 | "outputs": [], 378 | "source": [ 379 | "clf.fit(X_train, y_train)\n", 380 | "y_pred = clf.predict(X_test)" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": 8, 386 | "metadata": {}, 387 | "outputs": [ 388 | { 389 | "data": { 390 | "text/plain": [ 391 | "array([0., 1., 0., 0., 1., 0., 1., 0., 1., 0.])" 392 | ] 393 | }, 394 | "execution_count": 8, 395 | "metadata": {}, 396 | "output_type": "execute_result" 397 | } 398 | ], 399 | "source": [ 400 | "y_pred[:10]" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": 9, 406 | "metadata": {}, 407 | "outputs": [], 408 | "source": [ 409 | "sub = pd.read_csv('../input/titanic/gender_submission.csv')" 
410 | ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": 10, 415 | "metadata": {}, 416 | "outputs": [ 417 | { 418 | "data": { 419 | "text/html": [ 420 | "
\n", 421 | "\n", 434 | "\n", 435 | " \n", 436 | " \n", 437 | " \n", 438 | " \n", 439 | " \n", 440 | " \n", 441 | " \n", 442 | " \n", 443 | " \n", 444 | " \n", 445 | " \n", 446 | " \n", 447 | " \n", 448 | " \n", 449 | " \n", 450 | " \n", 451 | " \n", 452 | " \n", 453 | " \n", 454 | " \n", 455 | " \n", 456 | " \n", 457 | " \n", 458 | " \n", 459 | " \n", 460 | " \n", 461 | " \n", 462 | " \n", 463 | " \n", 464 | " \n", 465 | " \n", 466 | " \n", 467 | " \n", 468 | " \n", 469 | "
PassengerIdSurvived
08920
18931
28940
38950
48961
\n", 470 | "
" 471 | ], 472 | "text/plain": [ 473 | " PassengerId Survived\n", 474 | "0 892 0\n", 475 | "1 893 1\n", 476 | "2 894 0\n", 477 | "3 895 0\n", 478 | "4 896 1" 479 | ] 480 | }, 481 | "execution_count": 10, 482 | "metadata": {}, 483 | "output_type": "execute_result" 484 | } 485 | ], 486 | "source": [ 487 | "sub['Survived'] = list(map(int, y_pred))\n", 488 | "sub.to_csv('submission_randomforest.csv', index=False)\n", 489 | "sub.head()" 490 | ] 491 | }, 492 | { 493 | "cell_type": "markdown", 494 | "metadata": {}, 495 | "source": [ 496 | "# LightGBM" 497 | ] 498 | }, 499 | { 500 | "cell_type": "code", 501 | "execution_count": 11, 502 | "metadata": {}, 503 | "outputs": [], 504 | "source": [ 505 | "from sklearn.model_selection import train_test_split\n", 506 | "\n", 507 | "\n", 508 | "X_train, X_valid, y_train, y_valid = \\\n", 509 | " train_test_split(X_train, y_train, test_size=0.3,\n", 510 | " random_state=0, stratify=y_train)" 511 | ] 512 | }, 513 | { 514 | "cell_type": "code", 515 | "execution_count": 12, 516 | "metadata": {}, 517 | "outputs": [], 518 | "source": [ 519 | "categorical_features = ['Embarked', 'Pclass', 'Sex']" 520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": 13, 525 | "metadata": {}, 526 | "outputs": [ 527 | { 528 | "name": "stdout", 529 | "output_type": "stream", 530 | "text": [ 531 | "Training until validation scores don't improve for 10 rounds\n", 532 | "[10]\ttraining's binary_logloss: 0.425241\tvalid_1's binary_logloss: 0.478975\n", 533 | "[20]\ttraining's binary_logloss: 0.344972\tvalid_1's binary_logloss: 0.444039\n", 534 | "[30]\ttraining's binary_logloss: 0.301357\tvalid_1's binary_logloss: 0.436304\n", 535 | "[40]\ttraining's binary_logloss: 0.265535\tvalid_1's binary_logloss: 0.438139\n", 536 | "Early stopping, best iteration is:\n", 537 | "[38]\ttraining's binary_logloss: 0.271328\tvalid_1's binary_logloss: 0.435633\n" 538 | ] 539 | }, 540 | { 541 | "name": "stderr", 542 | "output_type": "stream", 543 | "text": [ 
544 | "/opt/conda/lib/python3.6/site-packages/lightgbm/basic.py:1243: UserWarning: Using categorical_feature in Dataset.\n", 545 | " warnings.warn('Using categorical_feature in Dataset.')\n" 546 | ] 547 | } 548 | ], 549 | "source": [ 550 | "import lightgbm as lgb\n", 551 | "\n", 552 | "\n", 553 | "lgb_train = lgb.Dataset(X_train, y_train,\n", 554 | " categorical_feature=categorical_features)\n", 555 | "lgb_eval = lgb.Dataset(X_valid, y_valid, reference=lgb_train,\n", 556 | " categorical_feature=categorical_features)\n", 557 | "\n", 558 | "params = {\n", 559 | " 'objective': 'binary'\n", 560 | "}\n", 561 | "\n", 562 | "model = lgb.train(params, lgb_train,\n", 563 | " valid_sets=[lgb_train, lgb_eval],\n", 564 | " verbose_eval=10,\n", 565 | " num_boost_round=1000,\n", 566 | " early_stopping_rounds=10)\n", 567 | "\n", 568 | "y_pred = model.predict(X_test, num_iteration=model.best_iteration)" 569 | ] 570 | }, 571 | { 572 | "cell_type": "code", 573 | "execution_count": 14, 574 | "metadata": {}, 575 | "outputs": [ 576 | { 577 | "data": { 578 | "text/plain": [ 579 | "array([0.0320592 , 0.34308916, 0.09903007, 0.05723199, 0.39919906,\n", 580 | " 0.22299318, 0.55036246, 0.0908458 , 0.78109016, 0.01881392])" 581 | ] 582 | }, 583 | "execution_count": 14, 584 | "metadata": {}, 585 | "output_type": "execute_result" 586 | } 587 | ], 588 | "source": [ 589 | "y_pred[:10]" 590 | ] 591 | }, 592 | { 593 | "cell_type": "code", 594 | "execution_count": 15, 595 | "metadata": {}, 596 | "outputs": [ 597 | { 598 | "data": { 599 | "text/plain": [ 600 | "array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0])" 601 | ] 602 | }, 603 | "execution_count": 15, 604 | "metadata": {}, 605 | "output_type": "execute_result" 606 | } 607 | ], 608 | "source": [ 609 | "y_pred = (y_pred > 0.5).astype(int)\n", 610 | "y_pred[:10]" 611 | ] 612 | }, 613 | { 614 | "cell_type": "code", 615 | "execution_count": 16, 616 | "metadata": { 617 | "scrolled": true 618 | }, 619 | "outputs": [ 620 | { 621 | "data": { 622 | "text/html": [ 
623 | "
\n", 624 | "\n", 637 | "\n", 638 | " \n", 639 | " \n", 640 | " \n", 641 | " \n", 642 | " \n", 643 | " \n", 644 | " \n", 645 | " \n", 646 | " \n", 647 | " \n", 648 | " \n", 649 | " \n", 650 | " \n", 651 | " \n", 652 | " \n", 653 | " \n", 654 | " \n", 655 | " \n", 656 | " \n", 657 | " \n", 658 | " \n", 659 | " \n", 660 | " \n", 661 | " \n", 662 | " \n", 663 | " \n", 664 | " \n", 665 | " \n", 666 | " \n", 667 | " \n", 668 | " \n", 669 | " \n", 670 | " \n", 671 | " \n", 672 | "
PassengerIdSurvived
08920
18930
28940
38950
48960
\n", 673 | "
" 674 | ], 675 | "text/plain": [ 676 | " PassengerId Survived\n", 677 | "0 892 0\n", 678 | "1 893 0\n", 679 | "2 894 0\n", 680 | "3 895 0\n", 681 | "4 896 0" 682 | ] 683 | }, 684 | "execution_count": 16, 685 | "metadata": {}, 686 | "output_type": "execute_result" 687 | } 688 | ], 689 | "source": [ 690 | "sub['Survived'] = y_pred\n", 691 | "sub.to_csv('submission_lightgbm.csv', index=False)\n", 692 | "\n", 693 | "sub.head()" 694 | ] 695 | }, 696 | { 697 | "cell_type": "code", 698 | "execution_count": null, 699 | "metadata": {}, 700 | "outputs": [], 701 | "source": [] 702 | } 703 | ], 704 | "metadata": { 705 | "file_extension": ".py", 706 | "kernelspec": { 707 | "display_name": "Python 3", 708 | "language": "python", 709 | "name": "python3" 710 | }, 711 | "language_info": { 712 | "codemirror_mode": { 713 | "name": "ipython", 714 | "version": 3 715 | }, 716 | "file_extension": ".py", 717 | "mimetype": "text/x-python", 718 | "name": "python", 719 | "nbconvert_exporter": "python", 720 | "pygments_lexer": "ipython3", 721 | "version": "3.6.6" 722 | }, 723 | "mimetype": "text/x-python", 724 | "name": "python", 725 | "npconvert_exporter": "python", 726 | "pygments_lexer": "ipython3", 727 | "version": 3 728 | }, 729 | "nbformat": 4, 730 | "nbformat_minor": 2 731 | } 732 | -------------------------------------------------------------------------------- /ch03/ch03_02.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "This notebook is a sample code with Japanese comments.\n", 8 | "\n", 9 | "Ref: [TRAINING A CLASSIFIER](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html)\n", 10 | "\n", 11 | "# 3.2 Beyond Titanic ②! Get hands-on with image data" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 1, 17 | "metadata": { 18 | "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", 19 | "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5" 20 | }, 21 | "outputs": [], 22 | "source": [ 23 | "import torch\n", 24 | "import torchvision\n", 25 | "import torchvision.transforms as transforms" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 2, 31 | "metadata": { 32 | "_cell_guid": "79c7e3d0-c299-4dcb-8224-4455121ee9b0", 33 | "_uuid": "d629ff2d2480ee46fbb7e2d37f6b5fab8052498a" 34 | }, 35 | "outputs": [ 36 | { 37 | "name": "stderr", 38 | "output_type": "stream", 39 | "text": [ 40 | "\r", 41 | "0it [00:00, ?it/s]" 42 | ] 43 | }, 44 | { 45 | "name": "stdout", 46 | "output_type": "stream", 47 | "text": [ 48 | "Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz\n" 49 | ] 50 | }, 51 | { 52 | "name": "stderr", 53 | "output_type": "stream", 54 | "text": [ 55 | "170500096it [01:30, 4515126.28it/s] " 56 | ] 57 | }, 58 | { 59 | "name": "stdout", 60 | "output_type": "stream", 61 | "text": [ 62 | "Files already downloaded and verified\n" 63 | ] 64 | } 65 | ], 66 | "source": [ 67 | "transform = transforms.Compose(\n", 68 | " [transforms.ToTensor(),\n", 69 | " transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])\n", 70 | "\n", 71 | "trainset = torchvision.datasets.CIFAR10(root='./data', train=True,\n", 72 | " download=True, transform=transform)\n", 73 | "trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,\n", 74 | " shuffle=True, num_workers=2)\n", 75 | "\n", 76 | "testset = torchvision.datasets.CIFAR10(root='./data', train=False,\n", 77 | " download=True, transform=transform)\n", 78 | "testloader = torch.utils.data.DataLoader(testset, batch_size=4,\n", 79 | " shuffle=False, num_workers=2)\n", 80 | "\n", 81 | "classes = ('plane', 'car', 'bird', 'cat',\n", 82 | " 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')" 83 
| ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 3, 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "data": { 92 | "image/png": "\n", 93 | "text/plain": [ 94 | "
" 95 | ] 96 | }, 97 | "metadata": { 98 | "needs_background": "light" 99 | }, 100 | "output_type": "display_data" 101 | }, 102 | { 103 | "name": "stdout", 104 | "output_type": "stream", 105 | "text": [ 106 | "plane cat deer deer\n" 107 | ] 108 | } 109 | ], 110 | "source": [ 111 | "import matplotlib.pyplot as plt\n", 112 | "import numpy as np\n", 113 | "\n", 114 | "\n", 115 | "def imshow(img):\n", 116 | " img = img / 2 + 0.5\n", 117 | " npimg = img.numpy()\n", 118 | " plt.imshow(np.transpose(npimg, (1, 2, 0)))\n", 119 | " plt.show()\n", 120 | "\n", 121 | "\n", 122 | "dataiter = iter(trainloader)\n", 123 | "images, labels = next(dataiter)\n", 124 | "imshow(torchvision.utils.make_grid(images))\n", 125 | "print(' '.join('%5s' % classes[labels[j]] for j in range(4)))" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": 4, 131 | "metadata": {}, 132 | "outputs": [ 133 | { 134 | "data": { 135 | "text/plain": [ 136 | "torch.Size([4, 3, 32, 32])" 137 | ] 138 | }, 139 | "execution_count": 4, 140 | "metadata": {}, 141 | "output_type": "execute_result" 142 | } 143 | ], 144 | "source": [ 145 | "images.shape" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 5, 151 | "metadata": {}, 152 | "outputs": [ 153 | { 154 | "data": { 155 | "text/plain": [ 156 | "tensor([[[-0.9451, -0.9529, -0.9529, ..., -0.9765, -0.9765, -0.9765],\n", 157 | " [-0.9686, -0.9686, -0.9765, ..., -0.9922, -0.9922, -0.9922],\n", 158 | " [-0.9765, -0.9765, -0.9843, ..., -0.9922, -0.9922, -0.9922],\n", 159 | " ...,\n", 160 | " [ 0.6863, 0.6863, 0.7020, ..., 0.8196, 0.8118, 0.7961],\n", 161 | " [ 0.6863, 0.6941, 0.7098, ..., 0.8196, 0.8039, 0.7961],\n", 162 | " [ 0.6941, 0.7020, 0.7176, ..., 0.8118, 0.8039, 0.7961]],\n", 163 | "\n", 164 | " [[-0.3490, -0.3490, -0.3412, ..., -0.2941, -0.2941, -0.2941],\n", 165 | " [-0.3333, -0.3255, -0.3255, ..., -0.2784, -0.2784, -0.2784],\n", 166 | " [-0.3020, -0.3020, -0.3020, ..., -0.2549, -0.2549, -0.2549],\n", 167 | " 
...,\n", 168 | " [ 0.6392, 0.6392, 0.6549, ..., 0.8039, 0.7882, 0.7804],\n", 169 | " [ 0.6392, 0.6471, 0.6627, ..., 0.8118, 0.7961, 0.7804],\n", 170 | " [ 0.6471, 0.6549, 0.6706, ..., 0.8118, 0.7961, 0.7804]],\n", 171 | "\n", 172 | " [[ 0.0745, 0.0824, 0.0980, ..., 0.1765, 0.1765, 0.1608],\n", 173 | " [ 0.0745, 0.0902, 0.1059, ..., 0.1843, 0.1765, 0.1765],\n", 174 | " [ 0.1216, 0.1216, 0.1294, ..., 0.2078, 0.2078, 0.2078],\n", 175 | " ...,\n", 176 | " [ 0.6392, 0.6392, 0.6549, ..., 0.8118, 0.7961, 0.7882],\n", 177 | " [ 0.6392, 0.6471, 0.6627, ..., 0.8118, 0.7961, 0.7882],\n", 178 | " [ 0.6471, 0.6549, 0.6706, ..., 0.8118, 0.7961, 0.7882]]])" 179 | ] 180 | }, 181 | "execution_count": 5, 182 | "metadata": {}, 183 | "output_type": "execute_result" 184 | }, 185 | { 186 | "name": "stderr", 187 | "output_type": "stream", 188 | "text": [ 189 | "\r", 190 | "170500096it [01:50, 4515126.28it/s]" 191 | ] 192 | } 193 | ], 194 | "source": [ 195 | "images[0]" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": null, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [] 204 | } 205 | ], 206 | "metadata": { 207 | "kernelspec": { 208 | "display_name": "Python 3", 209 | "language": "python", 210 | "name": "python3" 211 | }, 212 | "language_info": { 213 | "codemirror_mode": { 214 | "name": "ipython", 215 | "version": 3 216 | }, 217 | "file_extension": ".py", 218 | "mimetype": "text/x-python", 219 | "name": "python", 220 | "nbconvert_exporter": "python", 221 | "pygments_lexer": "ipython3", 222 | "version": "3.6.6" 223 | } 224 | }, 225 | "nbformat": 4, 226 | "nbformat_minor": 1 227 | } 228 | -------------------------------------------------------------------------------- /ch03/ch03_01.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "This notebook is a sample code with Japanese comments.\n", 8 | "\n", 9 | "Ref: 
[Introduction to Manual Feature Engineering](https://www.kaggle.com/willkoehrsen/introduction-to-manual-feature-engineering)\n", 10 | "\n", 11 | "# 3.1 Going beyond Titanic ①! Let's join multiple tables" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 1, 17 | "metadata": { 18 | "_cell_guid": "b1076dfc-b9ad-4769-8c92-a6c4dae69d19", 19 | "_uuid": "8f2839f25d086af736a60e9eeb907d3b93b6e0e5" 20 | }, 21 | "outputs": [ 22 | { 23 | "data": { 24 | "text/html": [ 25 | "
(HTML table omitted: 5 rows × 122 columns, same rows as the text/plain output below)
" 191 | ], 192 | "text/plain": [ 193 | " SK_ID_CURR TARGET NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR \\\n", 194 | "0 100002 1 Cash loans M N \n", 195 | "1 100003 0 Cash loans F N \n", 196 | "2 100004 0 Revolving loans M Y \n", 197 | "3 100006 0 Cash loans F N \n", 198 | "4 100007 0 Cash loans M N \n", 199 | "\n", 200 | " FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT AMT_ANNUITY \\\n", 201 | "0 Y 0 202500.0 406597.5 24700.5 \n", 202 | "1 N 0 270000.0 1293502.5 35698.5 \n", 203 | "2 Y 0 67500.0 135000.0 6750.0 \n", 204 | "3 Y 0 135000.0 312682.5 29686.5 \n", 205 | "4 Y 0 121500.0 513000.0 21865.5 \n", 206 | "\n", 207 | " ... FLAG_DOCUMENT_18 FLAG_DOCUMENT_19 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21 \\\n", 208 | "0 ... 0 0 0 0 \n", 209 | "1 ... 0 0 0 0 \n", 210 | "2 ... 0 0 0 0 \n", 211 | "3 ... 0 0 0 0 \n", 212 | "4 ... 0 0 0 0 \n", 213 | "\n", 214 | " AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_DAY \\\n", 215 | "0 0.0 0.0 \n", 216 | "1 0.0 0.0 \n", 217 | "2 0.0 0.0 \n", 218 | "3 NaN NaN \n", 219 | "4 0.0 0.0 \n", 220 | "\n", 221 | " AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON \\\n", 222 | "0 0.0 0.0 \n", 223 | "1 0.0 0.0 \n", 224 | "2 0.0 0.0 \n", 225 | "3 NaN NaN \n", 226 | "4 0.0 0.0 \n", 227 | "\n", 228 | " AMT_REQ_CREDIT_BUREAU_QRT AMT_REQ_CREDIT_BUREAU_YEAR \n", 229 | "0 0.0 1.0 \n", 230 | "1 0.0 0.0 \n", 231 | "2 0.0 0.0 \n", 232 | "3 NaN NaN \n", 233 | "4 0.0 0.0 \n", 234 | "\n", 235 | "[5 rows x 122 columns]" 236 | ] 237 | }, 238 | "execution_count": 1, 239 | "metadata": {}, 240 | "output_type": "execute_result" 241 | } 242 | ], 243 | "source": [ 244 | "import pandas as pd\n", 245 | "\n", 246 | "\n", 247 | "application_train = \\\n", 248 | " pd.read_csv('../input/home-credit-default-risk/application_train.csv')\n", 249 | "application_train.head()" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": 2, 255 | "metadata": {}, 256 | "outputs": [ 257 | { 258 | "data": { 259 | "text/html": [ 260 | "
(HTML table omitted; same rows as the text/plain output below)
" 401 | ], 402 | "text/plain": [ 403 | " SK_ID_CURR SK_ID_BUREAU CREDIT_ACTIVE CREDIT_CURRENCY DAYS_CREDIT \\\n", 404 | "0 215354 5714462 Closed currency 1 -497 \n", 405 | "1 215354 5714463 Active currency 1 -208 \n", 406 | "2 215354 5714464 Active currency 1 -203 \n", 407 | "3 215354 5714465 Active currency 1 -203 \n", 408 | "4 215354 5714466 Active currency 1 -629 \n", 409 | "\n", 410 | " CREDIT_DAY_OVERDUE DAYS_CREDIT_ENDDATE DAYS_ENDDATE_FACT \\\n", 411 | "0 0 -153.0 -153.0 \n", 412 | "1 0 1075.0 NaN \n", 413 | "2 0 528.0 NaN \n", 414 | "3 0 NaN NaN \n", 415 | "4 0 1197.0 NaN \n", 416 | "\n", 417 | " AMT_CREDIT_MAX_OVERDUE CNT_CREDIT_PROLONG AMT_CREDIT_SUM \\\n", 418 | "0 NaN 0 91323.0 \n", 419 | "1 NaN 0 225000.0 \n", 420 | "2 NaN 0 464323.5 \n", 421 | "3 NaN 0 90000.0 \n", 422 | "4 77674.5 0 2700000.0 \n", 423 | "\n", 424 | " AMT_CREDIT_SUM_DEBT AMT_CREDIT_SUM_LIMIT AMT_CREDIT_SUM_OVERDUE \\\n", 425 | "0 0.0 NaN 0.0 \n", 426 | "1 171342.0 NaN 0.0 \n", 427 | "2 NaN NaN 0.0 \n", 428 | "3 NaN NaN 0.0 \n", 429 | "4 NaN NaN 0.0 \n", 430 | "\n", 431 | " CREDIT_TYPE DAYS_CREDIT_UPDATE AMT_ANNUITY \n", 432 | "0 Consumer credit -131 NaN \n", 433 | "1 Credit card -20 NaN \n", 434 | "2 Consumer credit -16 NaN \n", 435 | "3 Credit card -16 NaN \n", 436 | "4 Consumer credit -21 NaN " 437 | ] 438 | }, 439 | "execution_count": 2, 440 | "metadata": {}, 441 | "output_type": "execute_result" 442 | } 443 | ], 444 | "source": [ 445 | "bureau = pd.read_csv('../input/home-credit-default-risk/bureau.csv')\n", 446 | "bureau.head()" 447 | ] 448 | }, 449 | { 450 | "cell_type": "code", 451 | "execution_count": 3, 452 | "metadata": {}, 453 | "outputs": [ 454 | { 455 | "data": { 456 | "text/html": [ 457 | "
(HTML table omitted; same rows as the text/plain output below)
" 508 | ], 509 | "text/plain": [ 510 | " SK_ID_CURR previous_loan_counts\n", 511 | "0 100001 7\n", 512 | "1 100002 8\n", 513 | "2 100003 4\n", 514 | "3 100004 2\n", 515 | "4 100005 3" 516 | ] 517 | }, 518 | "execution_count": 3, 519 | "metadata": {}, 520 | "output_type": "execute_result" 521 | } 522 | ], 523 | "source": [ 524 | "previous_loan_counts = \\\n", 525 | " bureau.groupby('SK_ID_CURR', as_index=False)['SK_ID_BUREAU'].count().rename(\n", 526 | " columns={'SK_ID_BUREAU': 'previous_loan_counts'})\n", 527 | "previous_loan_counts.head()" 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 4, 533 | "metadata": { 534 | "scrolled": true 535 | }, 536 | "outputs": [ 537 | { 538 | "data": { 539 | "text/html": [ 540 | "
(HTML table omitted: 5 rows × 123 columns, same rows as the text/plain output below)
\n", 705 | "
" 706 | ], 707 | "text/plain": [ 708 | " SK_ID_CURR TARGET NAME_CONTRACT_TYPE CODE_GENDER FLAG_OWN_CAR \\\n", 709 | "0 100002 1 Cash loans M N \n", 710 | "1 100003 0 Cash loans F N \n", 711 | "2 100004 0 Revolving loans M Y \n", 712 | "3 100006 0 Cash loans F N \n", 713 | "4 100007 0 Cash loans M N \n", 714 | "\n", 715 | " FLAG_OWN_REALTY CNT_CHILDREN AMT_INCOME_TOTAL AMT_CREDIT AMT_ANNUITY \\\n", 716 | "0 Y 0 202500.0 406597.5 24700.5 \n", 717 | "1 N 0 270000.0 1293502.5 35698.5 \n", 718 | "2 Y 0 67500.0 135000.0 6750.0 \n", 719 | "3 Y 0 135000.0 312682.5 29686.5 \n", 720 | "4 Y 0 121500.0 513000.0 21865.5 \n", 721 | "\n", 722 | " ... FLAG_DOCUMENT_19 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21 \\\n", 723 | "0 ... 0 0 0 \n", 724 | "1 ... 0 0 0 \n", 725 | "2 ... 0 0 0 \n", 726 | "3 ... 0 0 0 \n", 727 | "4 ... 0 0 0 \n", 728 | "\n", 729 | " AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_DAY \\\n", 730 | "0 0.0 0.0 \n", 731 | "1 0.0 0.0 \n", 732 | "2 0.0 0.0 \n", 733 | "3 NaN NaN \n", 734 | "4 0.0 0.0 \n", 735 | "\n", 736 | " AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_MON \\\n", 737 | "0 0.0 0.0 \n", 738 | "1 0.0 0.0 \n", 739 | "2 0.0 0.0 \n", 740 | "3 NaN NaN \n", 741 | "4 0.0 0.0 \n", 742 | "\n", 743 | " AMT_REQ_CREDIT_BUREAU_QRT AMT_REQ_CREDIT_BUREAU_YEAR previous_loan_counts \n", 744 | "0 0.0 1.0 8.0 \n", 745 | "1 0.0 0.0 4.0 \n", 746 | "2 0.0 0.0 2.0 \n", 747 | "3 NaN NaN 0.0 \n", 748 | "4 0.0 0.0 1.0 \n", 749 | "\n", 750 | "[5 rows x 123 columns]" 751 | ] 752 | }, 753 | "execution_count": 4, 754 | "metadata": {}, 755 | "output_type": "execute_result" 756 | } 757 | ], 758 | "source": [ 759 | "application_train = \\\n", 760 | " pd.merge(application_train, previous_loan_counts, on='SK_ID_CURR', how='left')\n", 761 | "\n", 762 | "application_train['previous_loan_counts'].fillna(0, inplace=True)\n", 763 | "application_train.head()" 764 | ] 765 | }, 766 | { 767 | "cell_type": "code", 768 | "execution_count": null, 769 | "metadata": {}, 770 | "outputs": [], 771 | 
"source": [] 772 | } 773 | ], 774 | "metadata": { 775 | "kernelspec": { 776 | "display_name": "Python 3", 777 | "language": "python", 778 | "name": "python3" 779 | }, 780 | "language_info": { 781 | "codemirror_mode": { 782 | "name": "ipython", 783 | "version": 3 784 | }, 785 | "file_extension": ".py", 786 | "mimetype": "text/x-python", 787 | "name": "python", 788 | "nbconvert_exporter": "python", 789 | "pygments_lexer": "ipython3", 790 | "version": "3.6.6" 791 | } 792 | }, 793 | "nbformat": 4, 794 | "nbformat_minor": 1 795 | } 796 | -------------------------------------------------------------------------------- /ch02/ch02_01.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "_cell_guid": "e12020f7-4f94-4ecc-9007-9b7a6e7458a6", 7 | "_uuid": "1fecb0980d8d422ec0f005c4bfd6225385c2c60f" 8 | }, 9 | "source": [ 10 | "This notebook is a sample code with Japanese comments.\n", 11 | "\n", 12 | "# 2.1 Submit first! Let's get on the leaderboard" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 1, 18 | "metadata": { 19 | "_uuid": "6413a13d8a260043bda237e211bd962582eb7ff2" 20 | }, 21 | "outputs": [], 22 | "source": [ 23 | "import numpy as np\n", 24 | "import pandas as pd" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": { 30 | "_cell_guid": "d49e43e8-0dc0-41b7-afd0-60acc96e9f07", 31 | "_uuid": "4ecd55c5bd48390d026eeb6ae8de0a7ace0d4ada" 32 | }, 33 | "source": [ 34 | "## Loading the data" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 2, 40 | "metadata": {}, 41 | "outputs": [ 42 | { 43 | "name": "stdout", 44 | "output_type": "stream", 45 | "text": [ 46 | "README.md gender_submission.csv test.csv train.csv\r\n" 47 | ] 48 | } 49 | ], 50 | "source": [ 51 | "!ls ../input/titanic" 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": 3, 57 | "metadata": { 58 | "_cell_guid": 
"9c963eb3-04ac-422c-bc0c-4373bda6880e", 59 | "_uuid": "95f406c4d2f1dab6744ea248b80e3a535c652450" 60 | }, 61 | "outputs": [], 62 | "source": [ 63 | "train = pd.read_csv('../input/titanic/train.csv')\n", 64 | "test = pd.read_csv('../input/titanic/test.csv')\n", 65 | "gender_submission = pd.read_csv('../input/titanic/gender_submission.csv')" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 4, 71 | "metadata": {}, 72 | "outputs": [ 73 | { 74 | "data": { 75 | "text/html": [ 76 | "
(HTML table omitted; same rows as the text/plain output below)
" 127 | ], 128 | "text/plain": [ 129 | " PassengerId Survived\n", 130 | "0 892 0\n", 131 | "1 893 1\n", 132 | "2 894 0\n", 133 | "3 895 0\n", 134 | "4 896 1" 135 | ] 136 | }, 137 | "execution_count": 4, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "gender_submission.head()" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 5, 149 | "metadata": {}, 150 | "outputs": [ 151 | { 152 | "data": { 153 | "text/html": [ 154 | "
(HTML table omitted; same rows as the text/plain output below)
" 265 | ], 266 | "text/plain": [ 267 | " PassengerId Survived Pclass \\\n", 268 | "0 1 0 3 \n", 269 | "1 2 1 1 \n", 270 | "2 3 1 3 \n", 271 | "3 4 1 1 \n", 272 | "4 5 0 3 \n", 273 | "\n", 274 | " Name Sex Age SibSp \\\n", 275 | "0 Braund, Mr. Owen Harris male 22.0 1 \n", 276 | "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", 277 | "2 Heikkinen, Miss. Laina female 26.0 0 \n", 278 | "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", 279 | "4 Allen, Mr. William Henry male 35.0 0 \n", 280 | "\n", 281 | " Parch Ticket Fare Cabin Embarked \n", 282 | "0 0 A/5 21171 7.2500 NaN S \n", 283 | "1 0 PC 17599 71.2833 C85 C \n", 284 | "2 0 STON/O2. 3101282 7.9250 NaN S \n", 285 | "3 0 113803 53.1000 C123 S \n", 286 | "4 0 373450 8.0500 NaN S " 287 | ] 288 | }, 289 | "execution_count": 5, 290 | "metadata": {}, 291 | "output_type": "execute_result" 292 | } 293 | ], 294 | "source": [ 295 | "train.head()" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": 6, 301 | "metadata": {}, 302 | "outputs": [ 303 | { 304 | "data": { 305 | "text/html": [ 306 | "
(HTML table omitted; same rows as the text/plain output below)
" 411 | ], 412 | "text/plain": [ 413 | " PassengerId Pclass Name Sex \\\n", 414 | "0 892 3 Kelly, Mr. James male \n", 415 | "1 893 3 Wilkes, Mrs. James (Ellen Needs) female \n", 416 | "2 894 2 Myles, Mr. Thomas Francis male \n", 417 | "3 895 3 Wirz, Mr. Albert male \n", 418 | "4 896 3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female \n", 419 | "\n", 420 | " Age SibSp Parch Ticket Fare Cabin Embarked \n", 421 | "0 34.5 0 0 330911 7.8292 NaN Q \n", 422 | "1 47.0 1 0 363272 7.0000 NaN S \n", 423 | "2 62.0 0 0 240276 9.6875 NaN Q \n", 424 | "3 27.0 0 0 315154 8.6625 NaN S \n", 425 | "4 22.0 1 1 3101298 12.2875 NaN S " 426 | ] 427 | }, 428 | "execution_count": 6, 429 | "metadata": {}, 430 | "output_type": "execute_result" 431 | } 432 | ], 433 | "source": [ 434 | "test.head()" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": 7, 440 | "metadata": {}, 441 | "outputs": [], 442 | "source": [ 443 | "data = pd.concat([train, test], sort=False)" 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": 8, 449 | "metadata": {}, 450 | "outputs": [ 451 | { 452 | "data": { 453 | "text/html": [ 454 | "
(HTML table omitted; same rows as the text/plain output below)
" 565 | ], 566 | "text/plain": [ 567 | " PassengerId Survived Pclass \\\n", 568 | "0 1 0.0 3 \n", 569 | "1 2 1.0 1 \n", 570 | "2 3 1.0 3 \n", 571 | "3 4 1.0 1 \n", 572 | "4 5 0.0 3 \n", 573 | "\n", 574 | " Name Sex Age SibSp \\\n", 575 | "0 Braund, Mr. Owen Harris male 22.0 1 \n", 576 | "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", 577 | "2 Heikkinen, Miss. Laina female 26.0 0 \n", 578 | "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", 579 | "4 Allen, Mr. William Henry male 35.0 0 \n", 580 | "\n", 581 | " Parch Ticket Fare Cabin Embarked \n", 582 | "0 0 A/5 21171 7.2500 NaN S \n", 583 | "1 0 PC 17599 71.2833 C85 C \n", 584 | "2 0 STON/O2. 3101282 7.9250 NaN S \n", 585 | "3 0 113803 53.1000 C123 S \n", 586 | "4 0 373450 8.0500 NaN S " 587 | ] 588 | }, 589 | "execution_count": 8, 590 | "metadata": {}, 591 | "output_type": "execute_result" 592 | } 593 | ], 594 | "source": [ 595 | "data.head()" 596 | ] 597 | }, 598 | { 599 | "cell_type": "code", 600 | "execution_count": 9, 601 | "metadata": {}, 602 | "outputs": [ 603 | { 604 | "name": "stdout", 605 | "output_type": "stream", 606 | "text": [ 607 | "891 418 1309\n" 608 | ] 609 | } 610 | ], 611 | "source": [ 612 | "print(len(train), len(test), len(data))" 613 | ] 614 | }, 615 | { 616 | "cell_type": "code", 617 | "execution_count": 10, 618 | "metadata": {}, 619 | "outputs": [ 620 | { 621 | "data": { 622 | "text/plain": [ 623 | "PassengerId 0\n", 624 | "Survived 418\n", 625 | "Pclass 0\n", 626 | "Name 0\n", 627 | "Sex 0\n", 628 | "Age 263\n", 629 | "SibSp 0\n", 630 | "Parch 0\n", 631 | "Ticket 0\n", 632 | "Fare 1\n", 633 | "Cabin 1014\n", 634 | "Embarked 2\n", 635 | "dtype: int64" 636 | ] 637 | }, 638 | "execution_count": 10, 639 | "metadata": {}, 640 | "output_type": "execute_result" 641 | } 642 | ], 643 | "source": [ 644 | "data.isnull().sum()" 645 | ] 646 | }, 647 | { 648 | "cell_type": "markdown", 649 | "metadata": { 650 | "_cell_guid": "687a06ef-2686-4772-ac24-5e413adbda6d", 
651 | "_uuid": "3846eff13d723fa6ff10117fc3ebc46f266b210f" 652 | }, 653 | "source": [ 654 | "## Feature engineering" 655 | ] 656 | }, 657 | { 658 | "cell_type": "markdown", 659 | "metadata": { 660 | "_cell_guid": "09253274-14c1-4ca9-a078-85229acba814", 661 | "_uuid": "234454857fff5bd61026c51cefd5eaee4e6a1879" 662 | }, 663 | "source": [ 664 | "### 1. Pclass" 665 | ] 666 | }, 667 | { 668 | "cell_type": "markdown", 669 | "metadata": { 670 | "_cell_guid": "b84a4c4b-9db2-4626-8a28-9817084eb554", 671 | "_uuid": "610bb44d64b400e7bafbf4d6a3295c7a43f1df23" 672 | }, 673 | "source": [ 674 | "### 2. Sex" 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": 11, 680 | "metadata": { 681 | "_cell_guid": "27c06d9c-61e1-4ba8-9cfc-81a2dd27390a", 682 | "_uuid": "07b661b256360d39ec561f465735042f37eee257" 683 | }, 684 | "outputs": [], 685 | "source": [ 686 | "data['Sex'].replace(['male', 'female'], [0, 1], inplace=True)" 687 | ] 688 | }, 689 | { 690 | "cell_type": "markdown", 691 | "metadata": { 692 | "_cell_guid": "2ab02454-4dfa-4aa3-ae94-7029b86ef69e", 693 | "_uuid": "2cb8d46258c4b14ec678543f99f4f5789d60b22f" 694 | }, 695 | "source": [ 696 | "### 3. Embarked" 697 | ] 698 | }, 699 | { 700 | "cell_type": "code", 701 | "execution_count": 12, 702 | "metadata": { 703 | "_cell_guid": "1329072e-5fc0-4aea-bc7b-ec7c27aff260", 704 | "_uuid": "5268b97889ec90508f501697d9e8d497398e0c46" 705 | }, 706 | "outputs": [], 707 | "source": [ 708 | "data['Embarked'].fillna(('S'), inplace=True)\n", 709 | "data['Embarked'] = data['Embarked'].map({'S': 0, 'C': 1, 'Q': 2}).astype(int)" 710 | ] 711 | }, 712 | { 713 | "cell_type": "markdown", 714 | "metadata": { 715 | "_cell_guid": "8c33619e-a180-4c12-9f63-e7b3023206cd", 716 | "_uuid": "382a7882e2ae3144513f8ffea8478b4e24e8df0f" 717 | }, 718 | "source": [ 719 | "### 4. 
Fare" 720 | ] 721 | }, 722 | { 723 | "cell_type": "code", 724 | "execution_count": 13, 725 | "metadata": { 726 | "_cell_guid": "fd9b2edd-cf75-4ad8-a100-5ca06cb53f7b", 727 | "_uuid": "161a7a829ad6a45b7745655b1713888a6778818f" 728 | }, 729 | "outputs": [], 730 | "source": [ 731 | "data['Fare'].fillna(np.mean(data['Fare']), inplace=True)" 732 | ] 733 | }, 734 | { 735 | "cell_type": "markdown", 736 | "metadata": { 737 | "_cell_guid": "1ea2fef1-ec32-4688-9030-63bafed9692c", 738 | "_uuid": "0c6cb694c63862e8f1d805fdcd54769312aae246" 739 | }, 740 | "source": [ 741 | "### 5. Age" 742 | ] 743 | }, 744 | { 745 | "cell_type": "code", 746 | "execution_count": 14, 747 | "metadata": { 748 | "_cell_guid": "5717373d-91ce-4cfd-a579-ef7dab192771", 749 | "_uuid": "42f1ebda5705d5272ea350bfd00e66c2f946a66e" 750 | }, 751 | "outputs": [], 752 | "source": [ 753 | "age_avg = data['Age'].mean()\n", 754 | "age_std = data['Age'].std()\n", 755 | "\n", 756 | "data['Age'].fillna(np.random.randint(age_avg - age_std, age_avg + age_std), inplace=True)" 757 | ] 758 | }, 759 | { 760 | "cell_type": "code", 761 | "execution_count": 15, 762 | "metadata": { 763 | "_cell_guid": "d3f3527c-8758-41c2-bbe3-14b604b2d317", 764 | "_uuid": "f7341a6f089464180e94d5e09d1071e0350cff3d" 765 | }, 766 | "outputs": [], 767 | "source": [ 768 | "delete_columns = ['Name', 'PassengerId', 'SibSp', 'Parch', 'Ticket', 'Cabin']\n", 769 | "data.drop(delete_columns, axis=1, inplace=True)" 770 | ] 771 | }, 772 | { 773 | "cell_type": "code", 774 | "execution_count": 16, 775 | "metadata": {}, 776 | "outputs": [], 777 | "source": [ 778 | "train = data[:len(train)]\n", 779 | "test = data[len(train):]" 780 | ] 781 | }, 782 | { 783 | "cell_type": "code", 784 | "execution_count": 17, 785 | "metadata": { 786 | "_cell_guid": "03d91a2b-08da-4593-8c1e-840fb7bec469", 787 | "_uuid": "768050e7f210d95ba28226ada778e763d21c97f8", 788 | "scrolled": true 789 | }, 790 | "outputs": [], 791 | "source": [ 792 | "y_train = train['Survived']\n", 793 | 
"X_train = train.drop('Survived', axis=1)\n", 794 | "X_test = test.drop('Survived', axis=1)" 795 | ] 796 | }, 797 | { 798 | "cell_type": "code", 799 | "execution_count": 18, 800 | "metadata": {}, 801 | "outputs": [ 802 | { 803 | "data": { 804 | "text/html": [ 805 | "
\n", 806 | "\n", 819 | "\n", 820 | " \n", 821 | " \n", 822 | " \n", 823 | " \n", 824 | " \n", 825 | " \n", 826 | " \n", 827 | " \n", 828 | " \n", 829 | " \n", 830 | " \n", 831 | " \n", 832 | " \n", 833 | " \n", 834 | " \n", 835 | " \n", 836 | " \n", 837 | " \n", 838 | " \n", 839 | " \n", 840 | " \n", 841 | " \n", 842 | " \n", 843 | " \n", 844 | " \n", 845 | " \n", 846 | " \n", 847 | " \n", 848 | " \n", 849 | " \n", 850 | " \n", 851 | " \n", 852 | " \n", 853 | " \n", 854 | " \n", 855 | " \n", 856 | " \n", 857 | " \n", 858 | " \n", 859 | " \n", 860 | " \n", 861 | " \n", 862 | " \n", 863 | " \n", 864 | " \n", 865 | " \n", 866 | " \n", 867 | " \n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | "
\n", 873 | "
" 874 | ], 875 | "text/plain": [ 876 | " Pclass Sex Age Fare Embarked\n", 877 | "0 3 0 22.0 7.2500 0\n", 878 | "1 1 1 38.0 71.2833 1\n", 879 | "2 3 1 26.0 7.9250 0\n", 880 | "3 1 1 35.0 53.1000 0\n", 881 | "4 3 0 35.0 8.0500 0" 882 | ] 883 | }, 884 | "execution_count": 18, 885 | "metadata": {}, 886 | "output_type": "execute_result" 887 | } 888 | ], 889 | "source": [ 890 | "X_train.head()" 891 | ] 892 | }, 893 | { 894 | "cell_type": "code", 895 | "execution_count": 19, 896 | "metadata": {}, 897 | "outputs": [ 898 | { 899 | "data": { 900 | "text/plain": [ 901 | "0 0.0\n", 902 | "1 1.0\n", 903 | "2 1.0\n", 904 | "3 1.0\n", 905 | "4 0.0\n", 906 | "Name: Survived, dtype: float64" 907 | ] 908 | }, 909 | "execution_count": 19, 910 | "metadata": {}, 911 | "output_type": "execute_result" 912 | } 913 | ], 914 | "source": [ 915 | "y_train.head()" 916 | ] 917 | }, 918 | { 919 | "cell_type": "markdown", 920 | "metadata": {}, 921 | "source": [ 922 | "## 機械学習アルゴリズム" 923 | ] 924 | }, 925 | { 926 | "cell_type": "code", 927 | "execution_count": 20, 928 | "metadata": {}, 929 | "outputs": [], 930 | "source": [ 931 | "from sklearn.linear_model import LogisticRegression" 932 | ] 933 | }, 934 | { 935 | "cell_type": "code", 936 | "execution_count": 21, 937 | "metadata": {}, 938 | "outputs": [], 939 | "source": [ 940 | "clf = LogisticRegression(penalty='l2', solver='sag', random_state=0)" 941 | ] 942 | }, 943 | { 944 | "cell_type": "code", 945 | "execution_count": 22, 946 | "metadata": {}, 947 | "outputs": [ 948 | { 949 | "name": "stderr", 950 | "output_type": "stream", 951 | "text": [ 952 | "/opt/conda/lib/python3.6/site-packages/sklearn/linear_model/sag.py:337: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge\n", 953 | " \"the coef_ did not converge\", ConvergenceWarning)\n" 954 | ] 955 | }, 956 | { 957 | "data": { 958 | "text/plain": [ 959 | "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", 960 | " intercept_scaling=1, 
l1_ratio=None, max_iter=100,\n", 961 | " multi_class='warn', n_jobs=None, penalty='l2',\n", 962 | " random_state=0, solver='sag', tol=0.0001, verbose=0,\n", 963 | " warm_start=False)" 964 | ] 965 | }, 966 | "execution_count": 22, 967 | "metadata": {}, 968 | "output_type": "execute_result" 969 | } 970 | ], 971 | "source": [ 972 | "clf.fit(X_train, y_train)" 973 | ] 974 | }, 975 | { 976 | "cell_type": "code", 977 | "execution_count": 23, 978 | "metadata": {}, 979 | "outputs": [], 980 | "source": [ 981 | "y_pred = clf.predict(X_test)" 982 | ] 983 | }, 984 | { 985 | "cell_type": "code", 986 | "execution_count": 24, 987 | "metadata": {}, 988 | "outputs": [ 989 | { 990 | "data": { 991 | "text/plain": [ 992 | "array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,\n", 993 | " 0., 0., 0.])" 994 | ] 995 | }, 996 | "execution_count": 24, 997 | "metadata": {}, 998 | "output_type": "execute_result" 999 | } 1000 | ], 1001 | "source": [ 1002 | "y_pred[:20]" 1003 | ] 1004 | }, 1005 | { 1006 | "cell_type": "markdown", 1007 | "metadata": { 1008 | "_cell_guid": "a37e176c-3b55-43ab-b358-324dc384ceef", 1009 | "_uuid": "d4d6df3e6c40063309ea72f4d4cea51cf616fd80" 1010 | }, 1011 | "source": [ 1012 | "## Submission" 1013 | ] 1014 | }, 1015 | { 1016 | "cell_type": "code", 1017 | "execution_count": 25, 1018 | "metadata": { 1019 | "_cell_guid": "8111500e-330c-411e-a742-66b9d4c5cb2c", 1020 | "_uuid": "40858051e4f458835f937275be4dfe3dfa68b25f" 1021 | }, 1022 | "outputs": [], 1023 | "source": [ 1024 | "sub = pd.read_csv('../input/titanic/gender_submission.csv')\n", 1025 | "sub['Survived'] = list(map(int, y_pred))\n", 1026 | "sub.to_csv('submission.csv', index=False)" 1027 | ] 1028 | }, 1029 | { 1030 | "cell_type": "code", 1031 | "execution_count": null, 1032 | "metadata": { 1033 | "_uuid": "d51cef4a043bbab7560dc972a948d96a0b369760" 1034 | }, 1035 | "outputs": [], 1036 | "source": [] 1037 | } 1038 | ], 1039 | "metadata": { 1040 | "kernelspec": { 1041 | "display_name": "Python 3", 1042 
| "language": "python", 1043 | "name": "python3" 1044 | }, 1045 | "language_info": { 1046 | "codemirror_mode": { 1047 | "name": "ipython", 1048 | "version": 3 1049 | }, 1050 | "file_extension": ".py", 1051 | "mimetype": "text/x-python", 1052 | "name": "python", 1053 | "nbconvert_exporter": "python", 1054 | "pygments_lexer": "ipython3", 1055 | "version": "3.6.6" 1056 | } 1057 | }, 1058 | "nbformat": 4, 1059 | "nbformat_minor": 1 1060 | } 1061 | -------------------------------------------------------------------------------- /ch02/ch02_02.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "_cell_guid": "e12020f7-4f94-4ecc-9007-9b7a6e7458a6", 7 | "_uuid": "1fecb0980d8d422ec0f005c4bfd6225385c2c60f" 8 | }, 9 | "source": [ 10 | "This notebook is sample code; its comments were originally written in Japanese.\n", 11 | "\n", 12 | "# 2.2 Get the big picture! Let's walk through the workflow up to submission" 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "metadata": {}, 18 | "source": [ 19 | "## Importing packages" 20 | ] 21 | }, 22 | { 23 | "cell_type": "code", 24 | "execution_count": 1, 25 | "metadata": {}, 26 | "outputs": [], 27 | "source": [ 28 | "import numpy as np\n", 29 | "import pandas as pd" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": { 35 | "_cell_guid": "d49e43e8-0dc0-41b7-afd0-60acc96e9f07", 36 | "_uuid": "4ecd55c5bd48390d026eeb6ae8de0a7ace0d4ada" 37 | }, 38 | "source": [ 39 | "## Loading the data" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 2, 45 | "metadata": {}, 46 | "outputs": [ 47 | { 48 | "name": "stdout", 49 | "output_type": "stream", 50 | "text": [ 51 | "README.md gender_submission.csv test.csv train.csv\r\n" 52 | ] 53 | } 54 | ], 55 | "source": [ 56 | "!ls ../input/titanic" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 3, 62 | "metadata": { 63 | "_cell_guid": "9c963eb3-04ac-422c-bc0c-4373bda6880e", 64 | "_uuid": 
"95f406c4d2f1dab6744ea248b80e3a535c652450" 65 | }, 66 | "outputs": [], 67 | "source": [ 68 | "train = pd.read_csv('../input/titanic/train.csv')\n", 69 | "test = pd.read_csv('../input/titanic/test.csv')\n", 70 | "gender_submission = pd.read_csv('../input/titanic/gender_submission.csv')" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 4, 76 | "metadata": {}, 77 | "outputs": [ 78 | { 79 | "data": { 80 | "text/html": [ 81 | "
\n", 82 | "\n", 95 | "\n", 96 | " \n", 97 | " \n", 98 | " \n", 99 | " \n", 100 | " \n", 101 | " \n", 102 | " \n", 103 | " \n", 104 | " \n", 105 | " \n", 106 | " \n", 107 | " \n", 108 | " \n", 109 | " \n", 110 | " \n", 111 | " \n", 112 | " \n", 113 | " \n", 114 | " \n", 115 | " \n", 116 | " \n", 117 | " \n", 118 | " \n", 119 | " \n", 120 | " \n", 121 | " \n", 122 | " \n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | "
\n", 131 | "
" 132 | ], 133 | "text/plain": [ 134 | " PassengerId Survived\n", 135 | "0 892 0\n", 136 | "1 893 1\n", 137 | "2 894 0\n", 138 | "3 895 0\n", 139 | "4 896 1" 140 | ] 141 | }, 142 | "execution_count": 4, 143 | "metadata": {}, 144 | "output_type": "execute_result" 145 | } 146 | ], 147 | "source": [ 148 | "gender_submission.head()" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": 5, 154 | "metadata": {}, 155 | "outputs": [ 156 | { 157 | "data": { 158 | "text/html": [ 159 | "
\n", 160 | "\n", 173 | "\n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | " \n", 182 | " \n", 183 | " \n", 184 | " \n", 185 | " \n", 186 | " \n", 187 | " \n", 188 | " \n", 189 | " \n", 190 | " \n", 191 | " \n", 192 | " \n", 193 | " \n", 194 | " \n", 195 | " \n", 196 | " \n", 197 | " \n", 198 | " \n", 199 | " \n", 200 | " \n", 201 | " \n", 202 | " \n", 203 | " \n", 204 | " \n", 205 | " \n", 206 | " \n", 207 | " \n", 208 | " \n", 209 | " \n", 210 | " \n", 211 | " \n", 212 | " \n", 213 | " \n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | "
\n", 269 | "
" 270 | ], 271 | "text/plain": [ 272 | " PassengerId Survived Pclass \\\n", 273 | "0 1 0 3 \n", 274 | "1 2 1 1 \n", 275 | "2 3 1 3 \n", 276 | "3 4 1 1 \n", 277 | "4 5 0 3 \n", 278 | "\n", 279 | " Name Sex Age SibSp \\\n", 280 | "0 Braund, Mr. Owen Harris male 22.0 1 \n", 281 | "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", 282 | "2 Heikkinen, Miss. Laina female 26.0 0 \n", 283 | "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", 284 | "4 Allen, Mr. William Henry male 35.0 0 \n", 285 | "\n", 286 | " Parch Ticket Fare Cabin Embarked \n", 287 | "0 0 A/5 21171 7.2500 NaN S \n", 288 | "1 0 PC 17599 71.2833 C85 C \n", 289 | "2 0 STON/O2. 3101282 7.9250 NaN S \n", 290 | "3 0 113803 53.1000 C123 S \n", 291 | "4 0 373450 8.0500 NaN S " 292 | ] 293 | }, 294 | "execution_count": 5, 295 | "metadata": {}, 296 | "output_type": "execute_result" 297 | } 298 | ], 299 | "source": [ 300 | "train.head()" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 6, 306 | "metadata": {}, 307 | "outputs": [ 308 | { 309 | "data": { 310 | "text/html": [ 311 | "
\n", 312 | "\n", 325 | "\n", 326 | " \n", 327 | " \n", 328 | " \n", 329 | " \n", 330 | " \n", 331 | " \n", 332 | " \n", 333 | " \n", 334 | " \n", 335 | " \n", 336 | " \n", 337 | " \n", 338 | " \n", 339 | " \n", 340 | " \n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | " \n", 376 | " \n", 377 | " \n", 378 | " \n", 379 | " \n", 380 | " \n", 381 | " \n", 382 | " \n", 383 | " \n", 384 | " \n", 385 | " \n", 386 | " \n", 387 | " \n", 388 | " \n", 389 | " \n", 390 | " \n", 391 | " \n", 392 | " \n", 393 | " \n", 394 | " \n", 395 | " \n", 396 | " \n", 397 | " \n", 398 | " \n", 399 | " \n", 400 | " \n", 401 | " \n", 402 | " \n", 403 | " \n", 404 | " \n", 405 | " \n", 406 | " \n", 407 | " \n", 408 | " \n", 409 | " \n", 410 | " \n", 411 | " \n", 412 | " \n", 413 | " \n", 414 | "
\n", 415 | "
" 416 | ], 417 | "text/plain": [ 418 | " PassengerId Pclass Name Sex \\\n", 419 | "0 892 3 Kelly, Mr. James male \n", 420 | "1 893 3 Wilkes, Mrs. James (Ellen Needs) female \n", 421 | "2 894 2 Myles, Mr. Thomas Francis male \n", 422 | "3 895 3 Wirz, Mr. Albert male \n", 423 | "4 896 3 Hirvonen, Mrs. Alexander (Helga E Lindqvist) female \n", 424 | "\n", 425 | " Age SibSp Parch Ticket Fare Cabin Embarked \n", 426 | "0 34.5 0 0 330911 7.8292 NaN Q \n", 427 | "1 47.0 1 0 363272 7.0000 NaN S \n", 428 | "2 62.0 0 0 240276 9.6875 NaN Q \n", 429 | "3 27.0 0 0 315154 8.6625 NaN S \n", 430 | "4 22.0 1 1 3101298 12.2875 NaN S " 431 | ] 432 | }, 433 | "execution_count": 6, 434 | "metadata": {}, 435 | "output_type": "execute_result" 436 | } 437 | ], 438 | "source": [ 439 | "test.head()" 440 | ] 441 | }, 442 | { 443 | "cell_type": "code", 444 | "execution_count": 7, 445 | "metadata": {}, 446 | "outputs": [], 447 | "source": [ 448 | "data = pd.concat([train, test], sort=False)" 449 | ] 450 | }, 451 | { 452 | "cell_type": "code", 453 | "execution_count": 8, 454 | "metadata": {}, 455 | "outputs": [ 456 | { 457 | "data": { 458 | "text/html": [ 459 | "
\n", 460 | "\n", 473 | "\n", 474 | " \n", 475 | " \n", 476 | " \n", 477 | " \n", 478 | " \n", 479 | " \n", 480 | " \n", 481 | " \n", 482 | " \n", 483 | " \n", 484 | " \n", 485 | " \n", 486 | " \n", 487 | " \n", 488 | " \n", 489 | " \n", 490 | " \n", 491 | " \n", 492 | " \n", 493 | " \n", 494 | " \n", 495 | " \n", 496 | " \n", 497 | " \n", 498 | " \n", 499 | " \n", 500 | " \n", 501 | " \n", 502 | " \n", 503 | " \n", 504 | " \n", 505 | " \n", 506 | " \n", 507 | " \n", 508 | " \n", 509 | " \n", 510 | " \n", 511 | " \n", 512 | " \n", 513 | " \n", 514 | " \n", 515 | " \n", 516 | " \n", 517 | " \n", 518 | " \n", 519 | " \n", 520 | " \n", 521 | " \n", 522 | " \n", 523 | " \n", 524 | " \n", 525 | " \n", 526 | " \n", 527 | " \n", 528 | " \n", 529 | " \n", 530 | " \n", 531 | " \n", 532 | " \n", 533 | " \n", 534 | " \n", 535 | " \n", 536 | " \n", 537 | " \n", 538 | " \n", 539 | " \n", 540 | " \n", 541 | " \n", 542 | " \n", 543 | " \n", 544 | " \n", 545 | " \n", 546 | " \n", 547 | " \n", 548 | " \n", 549 | " \n", 550 | " \n", 551 | " \n", 552 | " \n", 553 | " \n", 554 | " \n", 555 | " \n", 556 | " \n", 557 | " \n", 558 | " \n", 559 | " \n", 560 | " \n", 561 | " \n", 562 | " \n", 563 | " \n", 564 | " \n", 565 | " \n", 566 | " \n", 567 | " \n", 568 | "
\n", 569 | "
" 570 | ], 571 | "text/plain": [ 572 | " PassengerId Survived Pclass \\\n", 573 | "0 1 0.0 3 \n", 574 | "1 2 1.0 1 \n", 575 | "2 3 1.0 3 \n", 576 | "3 4 1.0 1 \n", 577 | "4 5 0.0 3 \n", 578 | "\n", 579 | " Name Sex Age SibSp \\\n", 580 | "0 Braund, Mr. Owen Harris male 22.0 1 \n", 581 | "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", 582 | "2 Heikkinen, Miss. Laina female 26.0 0 \n", 583 | "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", 584 | "4 Allen, Mr. William Henry male 35.0 0 \n", 585 | "\n", 586 | " Parch Ticket Fare Cabin Embarked \n", 587 | "0 0 A/5 21171 7.2500 NaN S \n", 588 | "1 0 PC 17599 71.2833 C85 C \n", 589 | "2 0 STON/O2. 3101282 7.9250 NaN S \n", 590 | "3 0 113803 53.1000 C123 S \n", 591 | "4 0 373450 8.0500 NaN S " 592 | ] 593 | }, 594 | "execution_count": 8, 595 | "metadata": {}, 596 | "output_type": "execute_result" 597 | } 598 | ], 599 | "source": [ 600 | "data.head()" 601 | ] 602 | }, 603 | { 604 | "cell_type": "code", 605 | "execution_count": 9, 606 | "metadata": {}, 607 | "outputs": [ 608 | { 609 | "name": "stdout", 610 | "output_type": "stream", 611 | "text": [ 612 | "891 418 1309\n" 613 | ] 614 | } 615 | ], 616 | "source": [ 617 | "print(len(train), len(test), len(data))" 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": 10, 623 | "metadata": {}, 624 | "outputs": [ 625 | { 626 | "data": { 627 | "text/plain": [ 628 | "PassengerId 0\n", 629 | "Survived 418\n", 630 | "Pclass 0\n", 631 | "Name 0\n", 632 | "Sex 0\n", 633 | "Age 263\n", 634 | "SibSp 0\n", 635 | "Parch 0\n", 636 | "Ticket 0\n", 637 | "Fare 1\n", 638 | "Cabin 1014\n", 639 | "Embarked 2\n", 640 | "dtype: int64" 641 | ] 642 | }, 643 | "execution_count": 10, 644 | "metadata": {}, 645 | "output_type": "execute_result" 646 | } 647 | ], 648 | "source": [ 649 | "data.isnull().sum()" 650 | ] 651 | }, 652 | { 653 | "cell_type": "markdown", 654 | "metadata": { 655 | "_cell_guid": "687a06ef-2686-4772-ac24-5e413adbda6d", 
656 | "_uuid": "3846eff13d723fa6ff10117fc3ebc46f266b210f" 657 | }, 658 | "source": [ 659 | "## Feature engineering" 660 | ] 661 | }, 662 | { 663 | "cell_type": "markdown", 664 | "metadata": { 665 | "_cell_guid": "09253274-14c1-4ca9-a078-85229acba814", 666 | "_uuid": "234454857fff5bd61026c51cefd5eaee4e6a1879" 667 | }, 668 | "source": [ 669 | "### 1. Pclass" 670 | ] 671 | }, 672 | { 673 | "cell_type": "code", 674 | "execution_count": 11, 675 | "metadata": {}, 676 | "outputs": [ 677 | { 678 | "data": { 679 | "text/plain": [ 680 | "3 709\n", 681 | "1 323\n", 682 | "2 277\n", 683 | "Name: Pclass, dtype: int64" 684 | ] 685 | }, 686 | "execution_count": 11, 687 | "metadata": {}, 688 | "output_type": "execute_result" 689 | } 690 | ], 691 | "source": [ 692 | "data['Pclass'].value_counts()" 693 | ] 694 | }, 695 | { 696 | "cell_type": "markdown", 697 | "metadata": { 698 | "_cell_guid": "b84a4c4b-9db2-4626-8a28-9817084eb554", 699 | "_uuid": "610bb44d64b400e7bafbf4d6a3295c7a43f1df23" 700 | }, 701 | "source": [ 702 | "### 2. Sex" 703 | ] 704 | }, 705 | { 706 | "cell_type": "code", 707 | "execution_count": 12, 708 | "metadata": { 709 | "_cell_guid": "27c06d9c-61e1-4ba8-9cfc-81a2dd27390a", 710 | "_uuid": "07b661b256360d39ec561f465735042f37eee257" 711 | }, 712 | "outputs": [], 713 | "source": [ 714 | "data['Sex'].replace(['male', 'female'], [0, 1], inplace=True)" 715 | ] 716 | }, 717 | { 718 | "cell_type": "markdown", 719 | "metadata": { 720 | "_cell_guid": "2ab02454-4dfa-4aa3-ae94-7029b86ef69e", 721 | "_uuid": "2cb8d46258c4b14ec678543f99f4f5789d60b22f" 722 | }, 723 | "source": [ 724 | "### 3. 
Embarked" 725 | ] 726 | }, 727 | { 728 | "cell_type": "code", 729 | "execution_count": 13, 730 | "metadata": {}, 731 | "outputs": [ 732 | { 733 | "data": { 734 | "text/plain": [ 735 | "S 914\n", 736 | "C 270\n", 737 | "Q 123\n", 738 | "Name: Embarked, dtype: int64" 739 | ] 740 | }, 741 | "execution_count": 13, 742 | "metadata": {}, 743 | "output_type": "execute_result" 744 | } 745 | ], 746 | "source": [ 747 | "data['Embarked'].value_counts()" 748 | ] 749 | }, 750 | { 751 | "cell_type": "code", 752 | "execution_count": 14, 753 | "metadata": { 754 | "_cell_guid": "1329072e-5fc0-4aea-bc7b-ec7c27aff260", 755 | "_uuid": "5268b97889ec90508f501697d9e8d497398e0c46" 756 | }, 757 | "outputs": [], 758 | "source": [ 759 | "data['Embarked'].fillna(('S'), inplace=True)\n", 760 | "data['Embarked'] = data['Embarked'].map({'S': 0, 'C': 1, 'Q': 2}).astype(int)" 761 | ] 762 | }, 763 | { 764 | "cell_type": "markdown", 765 | "metadata": { 766 | "_cell_guid": "8c33619e-a180-4c12-9f63-e7b3023206cd", 767 | "_uuid": "382a7882e2ae3144513f8ffea8478b4e24e8df0f" 768 | }, 769 | "source": [ 770 | "### 4. Fare" 771 | ] 772 | }, 773 | { 774 | "cell_type": "code", 775 | "execution_count": 15, 776 | "metadata": { 777 | "_cell_guid": "fd9b2edd-cf75-4ad8-a100-5ca06cb53f7b", 778 | "_uuid": "161a7a829ad6a45b7745655b1713888a6778818f" 779 | }, 780 | "outputs": [], 781 | "source": [ 782 | "data['Fare'].fillna(np.mean(data['Fare']), inplace=True)" 783 | ] 784 | }, 785 | { 786 | "cell_type": "markdown", 787 | "metadata": {}, 788 | "source": [ 789 | "### 5. 
Age" 790 | ] 791 | }, 792 | { 793 | "cell_type": "code", 794 | "execution_count": 16, 795 | "metadata": { 796 | "_cell_guid": "5717373d-91ce-4cfd-a579-ef7dab192771", 797 | "_uuid": "42f1ebda5705d5272ea350bfd00e66c2f946a66e" 798 | }, 799 | "outputs": [], 800 | "source": [ 801 | "age_avg = data['Age'].mean()\n", 802 | "age_std = data['Age'].std()\n", 803 | "\n", 804 | "data['Age'].fillna(np.random.randint(age_avg - age_std, age_avg + age_std), inplace=True)" 805 | ] 806 | }, 807 | { 808 | "cell_type": "code", 809 | "execution_count": 17, 810 | "metadata": { 811 | "_cell_guid": "d3f3527c-8758-41c2-bbe3-14b604b2d317", 812 | "_uuid": "f7341a6f089464180e94d5e09d1071e0350cff3d" 813 | }, 814 | "outputs": [], 815 | "source": [ 816 | "delete_columns = ['Name', 'PassengerId', 'SibSp', 'Parch', 'Ticket', 'Cabin']\n", 817 | "data.drop(delete_columns, axis=1, inplace=True)" 818 | ] 819 | }, 820 | { 821 | "cell_type": "code", 822 | "execution_count": 18, 823 | "metadata": {}, 824 | "outputs": [], 825 | "source": [ 826 | "train = data[:len(train)]\n", 827 | "test = data[len(train):]" 828 | ] 829 | }, 830 | { 831 | "cell_type": "code", 832 | "execution_count": 19, 833 | "metadata": { 834 | "_cell_guid": "03d91a2b-08da-4593-8c1e-840fb7bec469", 835 | "_uuid": "768050e7f210d95ba28226ada778e763d21c97f8", 836 | "scrolled": true 837 | }, 838 | "outputs": [], 839 | "source": [ 840 | "y_train = train['Survived']\n", 841 | "X_train = train.drop('Survived', axis=1)\n", 842 | "X_test = test.drop('Survived', axis=1)" 843 | ] 844 | }, 845 | { 846 | "cell_type": "code", 847 | "execution_count": 20, 848 | "metadata": {}, 849 | "outputs": [ 850 | { 851 | "data": { 852 | "text/html": [ 853 | "
\n", 854 | "\n", 867 | "\n", 868 | " \n", 869 | " \n", 870 | " \n", 871 | " \n", 872 | " \n", 873 | " \n", 874 | " \n", 875 | " \n", 876 | " \n", 877 | " \n", 878 | " \n", 879 | " \n", 880 | " \n", 881 | " \n", 882 | " \n", 883 | " \n", 884 | " \n", 885 | " \n", 886 | " \n", 887 | " \n", 888 | " \n", 889 | " \n", 890 | " \n", 891 | " \n", 892 | " \n", 893 | " \n", 894 | " \n", 895 | " \n", 896 | " \n", 897 | " \n", 898 | " \n", 899 | " \n", 900 | " \n", 901 | " \n", 902 | " \n", 903 | " \n", 904 | " \n", 905 | " \n", 906 | " \n", 907 | " \n", 908 | " \n", 909 | " \n", 910 | " \n", 911 | " \n", 912 | " \n", 913 | " \n", 914 | " \n", 915 | " \n", 916 | " \n", 917 | " \n", 918 | " \n", 919 | " \n", 920 | "
\n", 921 | "
" 922 | ], 923 | "text/plain": [ 924 | " Pclass Sex Age Fare Embarked\n", 925 | "0 3 0 22.0 7.2500 0\n", 926 | "1 1 1 38.0 71.2833 1\n", 927 | "2 3 1 26.0 7.9250 0\n", 928 | "3 1 1 35.0 53.1000 0\n", 929 | "4 3 0 35.0 8.0500 0" 930 | ] 931 | }, 932 | "execution_count": 20, 933 | "metadata": {}, 934 | "output_type": "execute_result" 935 | } 936 | ], 937 | "source": [ 938 | "X_train.head()" 939 | ] 940 | }, 941 | { 942 | "cell_type": "code", 943 | "execution_count": 21, 944 | "metadata": {}, 945 | "outputs": [ 946 | { 947 | "data": { 948 | "text/plain": [ 949 | "0 0.0\n", 950 | "1 1.0\n", 951 | "2 1.0\n", 952 | "3 1.0\n", 953 | "4 0.0\n", 954 | "Name: Survived, dtype: float64" 955 | ] 956 | }, 957 | "execution_count": 21, 958 | "metadata": {}, 959 | "output_type": "execute_result" 960 | } 961 | ], 962 | "source": [ 963 | "y_train.head()" 964 | ] 965 | }, 966 | { 967 | "cell_type": "markdown", 968 | "metadata": { 969 | "_cell_guid": "19f52c93-701c-4ae1-ad7c-0c89004bc1a0", 970 | "_uuid": "d2f7f7fd519f1fcc160304783c8b440e5cb552da" 971 | }, 972 | "source": [ 973 | "## 機械学習アルゴリズム" 974 | ] 975 | }, 976 | { 977 | "cell_type": "code", 978 | "execution_count": 22, 979 | "metadata": {}, 980 | "outputs": [], 981 | "source": [ 982 | "from sklearn.linear_model import LogisticRegression" 983 | ] 984 | }, 985 | { 986 | "cell_type": "code", 987 | "execution_count": 23, 988 | "metadata": {}, 989 | "outputs": [], 990 | "source": [ 991 | "clf = LogisticRegression(penalty='l2', solver='sag', random_state=0)" 992 | ] 993 | }, 994 | { 995 | "cell_type": "code", 996 | "execution_count": 24, 997 | "metadata": {}, 998 | "outputs": [ 999 | { 1000 | "name": "stderr", 1001 | "output_type": "stream", 1002 | "text": [ 1003 | "/opt/conda/lib/python3.6/site-packages/sklearn/linear_model/sag.py:337: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge\n", 1004 | " \"the coef_ did not converge\", ConvergenceWarning)\n" 1005 | ] 1006 | }, 1007 | { 1008 | "data": { 1009 | 
"text/plain": [ 1010 | "LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n", 1011 | " intercept_scaling=1, l1_ratio=None, max_iter=100,\n", 1012 | " multi_class='warn', n_jobs=None, penalty='l2',\n", 1013 | " random_state=0, solver='sag', tol=0.0001, verbose=0,\n", 1014 | " warm_start=False)" 1015 | ] 1016 | }, 1017 | "execution_count": 24, 1018 | "metadata": {}, 1019 | "output_type": "execute_result" 1020 | } 1021 | ], 1022 | "source": [ 1023 | "clf.fit(X_train, y_train)" 1024 | ] 1025 | }, 1026 | { 1027 | "cell_type": "code", 1028 | "execution_count": 25, 1029 | "metadata": {}, 1030 | "outputs": [], 1031 | "source": [ 1032 | "y_pred = clf.predict(X_test)" 1033 | ] 1034 | }, 1035 | { 1036 | "cell_type": "code", 1037 | "execution_count": 26, 1038 | "metadata": {}, 1039 | "outputs": [ 1040 | { 1041 | "data": { 1042 | "text/plain": [ 1043 | "array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0.,\n", 1044 | " 0., 0., 0.])" 1045 | ] 1046 | }, 1047 | "execution_count": 26, 1048 | "metadata": {}, 1049 | "output_type": "execute_result" 1050 | } 1051 | ], 1052 | "source": [ 1053 | "y_pred[:20]" 1054 | ] 1055 | }, 1056 | { 1057 | "cell_type": "markdown", 1058 | "metadata": { 1059 | "_cell_guid": "a37e176c-3b55-43ab-b358-324dc384ceef", 1060 | "_uuid": "d4d6df3e6c40063309ea72f4d4cea51cf616fd80" 1061 | }, 1062 | "source": [ 1063 | "## 提出" 1064 | ] 1065 | }, 1066 | { 1067 | "cell_type": "code", 1068 | "execution_count": 27, 1069 | "metadata": { 1070 | "_cell_guid": "8111500e-330c-411e-a742-66b9d4c5cb2c", 1071 | "_uuid": "40858051e4f458835f937275be4dfe3dfa68b25f" 1072 | }, 1073 | "outputs": [], 1074 | "source": [ 1075 | "sub = pd.read_csv('../input/titanic/gender_submission.csv')\n", 1076 | "sub['Survived'] = list(map(int, y_pred))\n", 1077 | "sub.to_csv('submission.csv', index=False)" 1078 | ] 1079 | }, 1080 | { 1081 | "cell_type": "code", 1082 | "execution_count": null, 1083 | "metadata": {}, 1084 | "outputs": [], 1085 | 
"source": [] 1086 | } 1087 | ], 1088 | "metadata": { 1089 | "file_extension": ".py", 1090 | "kernelspec": { 1091 | "display_name": "Python 3", 1092 | "language": "python", 1093 | "name": "python3" 1094 | }, 1095 | "language_info": { 1096 | "codemirror_mode": { 1097 | "name": "ipython", 1098 | "version": 3 1099 | }, 1100 | "file_extension": ".py", 1101 | "mimetype": "text/x-python", 1102 | "name": "python", 1103 | "nbconvert_exporter": "python", 1104 | "pygments_lexer": "ipython3", 1105 | "version": "3.6.6" 1106 | }, 1107 | "mimetype": "text/x-python", 1108 | "name": "python", 1109 | "npconvert_exporter": "python", 1110 | "pygments_lexer": "ipython3", 1111 | "version": 3 1112 | }, 1113 | "nbformat": 4, 1114 | "nbformat_minor": 2 1115 | } 1116 | --------------------------------------------------------------------------------