├── .gitignore ├── 01_the_machine_learning_landscape.ipynb ├── 02_end_to_end_machine_learning_project.ipynb ├── 03_classification.ipynb ├── 04_training_linear_models.ipynb ├── 05_support_vector_machines.ipynb ├── 06_decision_trees.ipynb ├── 07_ensemble_learning_and_random_forests.ipynb ├── 08_dimensionality_reduction.ipynb ├── 09_unsupervised_learning.ipynb ├── 10_neural_nets_with_keras.ipynb ├── 11_training_deep_neural_networks.ipynb ├── 12_custom_models_and_training_with_tensorflow.ipynb ├── 13_loading_and_preprocessing_data.ipynb ├── 14_deep_computer_vision_with_cnns.ipynb ├── 15_processing_sequences_using_rnns_and_cnns.ipynb ├── 16_nlp_with_rnns_and_attention.ipynb ├── 17_autoencoders_and_gans.ipynb ├── 18_reinforcement_learning.ipynb ├── 19_training_and_deploying_at_scale.ipynb ├── INSTALL.md ├── LICENSE ├── README.md ├── apt.txt ├── book_equations.ipynb ├── book_equations.pdf ├── changes_in_2nd_edition.md ├── cover.png ├── custom_model_in_keras.ipynb ├── datasets ├── housing │ ├── README.md │ ├── housing.csv │ └── housing.tgz ├── inception │ └── imagenet_class_names.txt ├── jsb_chorales │ ├── README.md │ └── jsb_chorales.tgz ├── lifesat │ ├── README.md │ ├── gdp_per_capita.csv │ └── oecd_bli_2015.csv └── titanic │ ├── test.csv │ └── train.csv ├── docker ├── .env ├── Dockerfile ├── Makefile ├── README.md ├── bashrc.bash ├── bin │ ├── nbclean_checkpoints │ ├── nbdiff_checkpoint │ ├── rm_empty_subdirs │ └── tensorboard ├── docker-compose.yml └── jupyter_notebook_config.py ├── environment.yml ├── extra_autodiff.ipynb ├── extra_gradient_descent_comparison.ipynb ├── images ├── ann │ └── README ├── autoencoders │ └── README ├── classification │ └── README ├── cnn │ ├── README │ └── test_image.png ├── decision_trees │ └── README ├── deep │ └── README ├── distributed │ └── README ├── end_to_end_project │ ├── README │ └── california.png ├── ensembles │ └── README ├── fundamentals │ └── README ├── nlp │ └── README ├── rl │ ├── README │ └── breakout.gif ├── rnn │ └── 
README ├── svm │ └── README ├── tensorflow │ └── README ├── training_linear_models │ └── README └── unsupervised_learning │ ├── README │ └── ladybug.png ├── index.ipynb ├── math_differential_calculus.ipynb ├── math_linear_algebra.ipynb ├── ml-project-checklist.md ├── requirements.txt ├── tools_matplotlib.ipynb ├── tools_numpy.ipynb └── tools_pandas.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | *.bak 2 | *.bak.* 3 | *.ckpt 4 | *.old 5 | *.pyc 6 | .DS_Store 7 | .ipynb_checkpoints 8 | checkpoint 9 | logs/* 10 | tf_logs/* 11 | images/**/*.png 12 | images/**/*.dot 13 | my_* 14 | person.proto 15 | person.desc 16 | person_pb2.py 17 | datasets/flowers 18 | datasets/lifesat/lifesat.csv 19 | datasets/spam 20 | datasets/titanic 21 | datasets/words 22 | datasets/jsb_chorales 23 | 24 | -------------------------------------------------------------------------------- /INSTALL.md: -------------------------------------------------------------------------------- 1 | # Installation 2 | 3 | ## Downloading this repository 4 | To download this repository and run the Jupyter notebooks on your computer, you first need git. It may already be installed: open a terminal and type `git` to check. If you don't have it, you can download it from [git-scm.com](https://git-scm.com/). 5 | 6 | Next, open a terminal and type the following commands to clone this repository (don't type the leading `$` signs: they are not Python code, just the conventional way of indicating a terminal prompt): 7 | 8 | $ cd $HOME # or any other development directory you prefer 9 | $ git clone https://github.com/rickiepark/handson-ml2.git 10 | $ cd handson-ml2 11 | 12 | If you would rather not install git, you can download [master.zip](https://github.com/rickiepark/handson-ml2/archive/master.zip), unzip it, rename the directory to `handson-ml2`, and move it to your preferred development directory. 13 | 14 | ## Installing Anaconda 15 | Next you need Python 3 and a number of Python libraries. The easiest way to get them is to [download and install Anaconda](https://www.anaconda.com/distribution/), a great Python distribution for scientific computing that is available on multiple platforms. It is large because it bundles many scientific libraries such as NumPy, Pandas, Matplotlib, and Scikit-Learn. If you need a lighter-weight distribution, install [Miniconda](https://docs.conda.io/en/latest/miniconda.html), which contains only the bare minimum needed to run the `conda` packaging tool. Make sure you install the latest version of Anaconda (or Miniconda). 16 | 17 | When installing on macOS or Linux, you will be asked whether to initialize Anaconda by running `conda init`. If you accept, your shell scripts are updated so that the `conda` command is available every time you open a terminal. Once installation is complete, close the terminal and open a new one to check that everything works. 18 | 19 | When installing on Windows, you will be asked whether to update the `PATH` environment variable. This is not recommended, as it can conflict with other software. Instead, once installation is complete, open the Start menu and launch the Anaconda Shell whenever you want to use Anaconda. 20 | 21 | Once Anaconda (or Miniconda) is installed, run the following command to update the `conda` packaging tool to the latest version: 22 | 23 | $ conda update -n base -c defaults conda 24 | 25 | > **Note**: If for some reason you don't like Anaconda, you can install Python 3 manually and use pip to install the required libraries (this is not recommended unless you really know what you are doing). Python 3.7 is recommended, since some libraries do not yet support Python 3.8 or 3.9. 26 | 27 | 28 | ## Installing GPU drivers and libraries 29 | If you have a TensorFlow-compatible GPU card (an NVidia card with Compute Capability ≥ 3.5) and you want TensorFlow to use it, you must download the latest driver for your card from [nvidia.com](https://www.nvidia.com/Download/index.aspx?lang=en-us) and install it. You also need the NVidia CUDA and cuDNN libraries. Fortunately, they are installed automatically when you install the tensorflow-gpu package through Anaconda. If you don't use Anaconda, however, you must install them manually. If you run into trouble, see TensorFlow's [GPU installation guide](https://tensorflow.org/install/gpu). 30 | 31 | ## Creating the `tf2` environment 32 | Next, run the following command inside the `handson-ml2` directory. It creates a new `conda` environment containing every library needed to run the notebooks (by default the environment is named `tf2`, but you can change that with the `-n` option): 33 | 34 | $ conda env create -f environment.yml 35 | 36 | Then activate the new environment: 37 | 38 | $ conda activate tf2 39 | 40 | 41 | ## Starting Jupyter 42 | Almost there! You need to register the `tf2` conda environment with Jupyter. The notebooks in this project use an environment named `python3` by default, so it is best to register this environment under the name `python3` (if you prefer another name, you will have to select the "Kernel > Change kernel..." menu in Jupyter every time you open a notebook): 43 | 44 | $ python3 -m ipykernel install --user --name=python3 45 | 46 | That's it! You can now start Jupyter with the following command: 47 | 48 | $ jupyter notebook 49 | 50 | A browser should open, showing Jupyter's listing of the current directory. If your browser does not open automatically, visit [localhost:8888](http://localhost:8888/tree). Click `index.ipynb` to get started.
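If you installed the GPU drivers and libraries described earlier, it may help to verify that TensorFlow can actually see the GPU before moving on to the deep learning chapters. Here is a minimal check, assuming the `tf2` environment (or any environment with TensorFlow 2 installed) is active; an empty list simply means TensorFlow will fall back to the CPU:

```python
import tensorflow as tf

# List the GPU devices TensorFlow can see.
# An empty list means computations will run on the CPU.
gpus = tf.config.list_physical_devices('GPU')
print('TensorFlow', tf.__version__, '- GPUs detected:', gpus)
```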
51 | 52 | Congratulations! You are ready to learn and practice Machine Learning! 53 | 54 | When you are done with Jupyter, you can shut it down by typing Ctrl-C in the terminal window where you started it. Every time you want to work on this project, open a terminal and run: 55 | 56 | $ cd $HOME # or wherever you installed this project 57 | $ cd handson-ml2 58 | $ conda activate tf2 59 | $ jupyter notebook 60 | 61 | ## Updating this project and its libraries 62 | The notebooks are regularly updated to fix issues and support new libraries, so you should pull the latest version of the project from time to time. 63 | 64 | To do so, open a terminal and run: 65 | 66 | $ cd $HOME # or wherever you installed this project 67 | $ cd handson-ml2 # go to the project directory 68 | $ git pull 69 | 70 | If you get an error, it is probably because you modified one of the notebooks. In that case, you must commit your changes before running `git pull`. It is best to do this in a separate branch, or you may run into conflicts: 71 | 72 | $ git checkout -b my_branch # any branch name you like 73 | $ git add -u 74 | $ git commit -m "describe your changes here" 75 | $ git checkout master 76 | $ git pull 77 | 78 | Next, let's update the libraries. First, update `conda` itself: 79 | 80 | $ conda update -c defaults -n base conda 81 | 82 | Then remove the `tf2` environment: 83 | 84 | $ conda activate base 85 | $ conda env remove -n tf2 86 | 87 | And recreate it: 88 | 89 | $ conda env create -f environment.yml 90 | 91 | Finally, reactivate the environment and start Jupyter: 92 | 93 | $ conda activate tf2 94 | $ jupyter notebook 95 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity.
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | 179 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Hands-On Machine Learning Jupyter Notebooks 2 | ========================== 3 | 4 | This repository was created to help you study Machine Learning with Python. It also contains the example code and the solutions to the exercises from the 2nd edition of Hands-On Machine Learning, published in Korean by Hanbit Media. 5 | 6 | 7 | 8 | The book is available in bookstores:
[Yes24](http://www.yes24.com/Product/Goods/89959711), [교보문고](http://www.kyobobook.co.kr/product/detailViewKor.laf?ejkGb=KOR&mallGb=KOR&barcode=9791162242964), [알라딘](https://www.aladin.co.kr/shop/wproduct.aspx?ItemId=237677114), [한빛미디어](http://www.hanbit.co.kr/store/books/look.php?p_code=B7033438574) 9 | 10 | **Note**: If you are looking for the notebooks for the 1st edition, see [rickiepark/handson-ml](https://github.com/rickiepark/handson-ml). 11 | 12 | ## Video lectures 13 | 14 | Video lectures for this book are available on [YouTube](http://bit.ly/homl2-youtube) and [Inflearn](https://www.inflearn.com/course/%ED%95%B8%EC%A6%88%EC%98%A8-%EB%A8%B8%EC%8B%A0%EB%9F%AC%EB%8B%9D). I hope they help readers studying on their own. :-) 15 | 16 | [![핸즈온 머신러닝 2](https://img.youtube.com/vi/kpuRasV_Q9k/0.jpg)](http://bit.ly/homl2-youtube) 17 | 18 | ## Getting started 19 | 20 | ### Want to run the notebooks without installing anything? 21 | 22 | Use one of the following services. 23 | 24 | **Warning**: These services provide temporary environments: everything you did disappears some time after you finish, so make sure you download any data you care about. 25 | 26 | * **Recommended option**: use Google [Colab](https://colab.research.google.com/github/rickiepark/handson-ml2/blob/master/): 27 | 28 | 29 | * Or use [Binder](https://mybinder.org/v2/gh/rickiepark/handson-ml2/master): 30 | 31 | 32 | * _Note_: Most of the time Binder starts up quickly and works fine, but when this GitHub repository is updated, Binder has to create a new environment from scratch, which can take quite some time. 33 | 34 | * Or use [Deepnote](https://beta.deepnote.com/launch?template=data-science&url=https%3A//github.com/rickiepark/handson-ml2/blob/master/index.ipynb): 35 | 36 | 37 | ### Just want to browse the notebooks quickly, without running any code? 38 | 39 | You can browse this repository's notebooks with the [Jupyter notebook viewer](https://nbviewer.jupyter.org/github/rickiepark/handson-ml2/blob/master/index.ipynb): 40 | 41 | 42 | _Note_: You can also use [GitHub's notebook viewer](index.ipynb), but it is slower and does not render the math equations fully. 43 | 44 | ### Want to run the notebooks using a Docker image? 45 | 46 | See the [Docker guide](https://github.com/rickiepark/handson-ml2/tree/master/docker). 47 | 48 | ### Want to run the notebooks on your own machine?
49 | 50 | First, install [Anaconda](https://www.anaconda.com/distribution/) (or [Miniconda](https://docs.conda.io/en/latest/miniconda.html)) and [git](https://git-scm.com/downloads). If you have a TensorFlow-compatible GPU, install the [GPU driver](https://www.nvidia.com/Download/index.aspx) as well as the appropriate versions of CUDA and cuDNN (see TensorFlow's documentation for details). 51 | 52 | Next, clone this repository by running the following commands in a terminal (don't type the `$` signs, they just indicate that these are terminal commands): 53 | 54 | $ git clone https://github.com/rickiepark/handson-ml2.git 55 | $ cd handson-ml2 56 | 57 | Then run the following commands: 58 | 59 | $ conda env create -f environment.yml # environment-windows.yml on Windows 60 | $ conda activate homl2 # conda activate tf2 on Windows 61 | $ python -m ipykernel install --user --name=python3 62 | 63 | Now start Jupyter: 64 | 65 | $ jupyter notebook 66 | 67 | For more details, see the [installation guide](INSTALL.md). 68 | 69 | ## FAQ 70 | 71 | **Which Python version should I use?** 72 | 73 | Python 3.7 is recommended. If you followed the installation instructions above, that is the version you have. Most other Python 3 versions should work as well, but some libraries do not yet support Python 3.8 or 3.9, which is why Python 3.7 is recommended. 74 | 75 | **I'm getting an error when calling `load_housing_data()`** 76 | 77 | Make sure you call `fetch_housing_data()` before calling `load_housing_data()`. If you are getting an HTTP error, check that the code you wrote is identical to the code in the notebook (copy-paste it if needed). If the problem persists, check your network configuration. 78 | 79 | **I'm getting an SSL error on macOS** 80 | 81 | You probably need to install the SSL certificates (see this [StackOverflow question](https://stackoverflow.com/questions/27835619/urllib-and-ssl-certificate-verify-failed-error)). If you downloaded Python from the official website, run `/Applications/Python\ 3.7/Install\ Certificates.command` in a terminal (change `3.7` to whatever version you installed). If you installed Python with MacPorts, run `sudo port install curl-ca-bundle` in a terminal. 82 | 83 | **I installed this project locally. How do I update it to the latest version?** 84 | 85 | See [INSTALL.md](INSTALL.md). 86 | 87 | **How do I update my Python libraries to the latest versions when using Anaconda?** 88 | 89 | See [INSTALL.md](INSTALL.md). 90 | 91 | ## Contributors 92 | 93 | Thanks to everyone who provided helpful feedback, filed issues, and submitted PRs. Special thanks to [Haesun Park(박해선)](https://tensorflow.blog/about/), who helped with some of the exercise solutions.
Thanks as well to Steven Bunkley and Ziembla, who created the `docker` directory, and to GitHub user SuperYorio, who helped with some of the exercise solutions. 94 | -------------------------------------------------------------------------------- /apt.txt: -------------------------------------------------------------------------------- 1 | build-essential 2 | cmake 3 | ffmpeg 4 | git 5 | libboost-all-dev 6 | libjpeg-dev 7 | libpq-dev 8 | libsdl2-dev 9 | sudo 10 | swig 11 | unzip 12 | xorg-dev 13 | xvfb 14 | zip 15 | zlib1g-dev 16 | -------------------------------------------------------------------------------- /book_equations.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rickiepark/handson-ml2/22f78e5e97141164f1ab7933dae54b09d6e47276/book_equations.pdf -------------------------------------------------------------------------------- /changes_in_2nd_edition.md: -------------------------------------------------------------------------------- 1 | # Changes in the 2nd edition 2 | 3 | The six main goals of the 2nd edition are: 4 | 5 | 1. Cover additional ML topics: unsupervised learning techniques (clustering, anomaly detection, density estimation, mixture models), techniques for training deep nets (self-normalizing networks), computer vision techniques (Xception, SENet, object detection with YOLO, semantic segmentation with R-CNN), handling sequences with CNNs (WaveNet), natural language processing with RNNs/CNNs/Transformers, and generative adversarial networks. 6 | 2. Cover additional libraries and APIs: Keras, the Data API, TF-Agents for reinforcement learning, training and deploying TF models at scale with the Distribution Strategies API, TF Serving, and Google Cloud AI Platform. 7 | 3. Include some of the latest important results from deep learning research. 8 | 4. Upgrade all the TensorFlow chapters to TensorFlow 2, and implement the models with the Keras API (tf.keras) whenever possible to simplify the code. 9 | 5. Update the example code for the latest versions of libraries such as Scikit-Learn, NumPy, Pandas, and Matplotlib. 10 | 6. Clarify some sections and fix some errors, thanks to the many readers who sent feedback. 11 | 12 | Some chapters were added, and others were rewritten or reorganized.
The following table maps the chapters of the 1st edition to those of the 2nd edition: 13 | 14 | |1st ed. chapter|2nd ed. chapter|Amount changed (%)|2nd ed. title 15 | |--|--|--|--| 16 | |1|1|<10%|The Machine Learning Landscape 17 | |2|2|<10%|End-to-End Machine Learning Project 18 | |3|3|<10%|Classification 19 | |4|4|<10%|Training Models 20 | |5|5|<10%|Support Vector Machines 21 | |6|6|<10%|Decision Trees 22 | |7|7|<10%|Ensemble Learning and Random Forests 23 | |8|8|<10%|Dimensionality Reduction 24 | |N/A|9|100% new|Unsupervised Learning Techniques 25 | |10|10|~75%|Introduction to Artificial Neural Networks with Keras 26 | |11|11|~50%|Training Deep Neural Networks 27 | |9|12|100% rewritten|Custom Models and Training with TensorFlow 28 | |Part of 12|13|100% rewritten|Loading and Preprocessing Data with TensorFlow 29 | |13|14|~50%|Deep Computer Vision Using Convolutional Neural Networks 30 | |Part of 14|15|~75%|Processing Sequences Using RNNs and CNNs 31 | |Part of 14|16|~90%|Natural Language Processing with RNNs and Attention 32 | |15|17|~75%|Representation Learning and Generative Learning Using Autoencoders and GANs 33 | |16|18|~75%|Reinforcement Learning 34 | |Part of 12|19|~75% new|Training and Deploying TensorFlow Models at Scale 35 | 36 | More specifically, here are the main changes in the 2nd edition (other than clarifications, error fixes, and code updates): 37 | 38 | * Chapter 1 – The Machine Learning Landscape 39 | * Added more examples of ML applications and algorithms 40 | * Added how to handle data mismatch between the training set and the validation/test sets 41 | * Chapter 2 – End-to-End Machine Learning Project 42 | * Added how to compute a confidence interval 43 | * Improved the installation instructions (e.g., for Windows) 44 | * Introduced the upgraded `OneHotEncoder` and the new `ColumnTransformer` 45 | * Added more details on deployment, monitoring, and maintenance 46 | * Chapter 4 – Training Models 47 | * Explained the need for training instances to be IID 48 | * Chapter 7 – Ensemble Learning and Random Forests 49 | * Added a section on XGBoost 50 | * Chapter 9 – Unsupervised Learning Techniques (new chapter) 51 | * Clustering with K-Means, how to choose the number of clusters, how to use clustering for dimensionality reduction, semi-supervised learning, image segmentation, and more 52 | * The DBSCAN clustering algorithm and an overview of the other clustering algorithms available in Scikit-Learn 53 | * Gaussian mixture models, the EM algorithm, Bayesian variational inference, and how to use mixture models for clustering, density estimation, anomaly detection, and novelty detection 54 | * Overview of other anomaly detection and novelty detection algorithms 55 | * Chapter 10 – Introduction to Artificial Neural Networks with Keras (mostly rewritten) 56 | * Introduced the Keras APIs (Sequential, Functional, Subclassing), saving models, and callbacks (including the `TensorBoard` callback) 57 | * Chapter 11 – Training Deep Neural Networks (many changes) 58 | * Introduced self-normalizing nets, the SELU activation function, and Alpha Dropout 59 | * Introduced self-supervised learning 60 | * Added Nadam optimization 61 | * Added Monte Carlo Dropout 62 | * Added a note about the risks of adaptive optimization methods 63 | * Updated the practical guidelines 64 | * Chapter 12 – Custom Models and Training with TensorFlow (completely rewritten) 65 | * Introduction to TensorFlow 2 66 | * TensorFlow's low-level Python API 67 | * Writing custom loss functions, metrics, layers, and models 68 | * Building custom training algorithms using autodiff 69 | * TensorFlow functions and graphs (including tracing and autograph) 70 | * Chapter 13 – Loading and Preprocessing Data with TensorFlow (new chapter) 71 | * The Data API 72 | * Loading and storing data efficiently using TFRecords 73 | * Writing custom preprocessing layers, using the Keras preprocessing layers, and encoding categorical features and text using one-hot vectors, bag-of-words, TF-IDF, and embeddings 74 | * Overview of TF Transform and TF Datasets 75 | * Moved the low-level implementation of the neural network to the exercises 76 | * Removed the coverage of queues and readers, which are now superseded by the Data API 77 | * Chapter 14 – Deep Computer Vision Using Convolutional Neural Networks 78 | * Added the Xception and SENet architectures 79 | * Added a Keras implementation of ResNet-34 80 | * How to use pretrained models with Keras 81 | * Added an end-to-end transfer learning example 82 | * Added classification and localization 83 | * Introduced fully convolutional networks (FCNs) 84 | * Introduced object detection using the YOLO architecture 85 | * Introduced semantic segmentation using R-CNN 86 | * Chapter 15 – Processing Sequences Using RNNs and CNNs 87 | * Added an introduction to WaveNet 88 | * Moved the encoder–decoder architecture and bidirectional RNNs to Chapter 16 89 | * Chapter 16 – Natural Language Processing with RNNs and Attention (new chapter) 90 | * Explained how to use the Data API to handle sequential data 91 | * End-to-end text generation example using both stateless and stateful Char-RNNs 92 | * End-to-end sentiment analysis example using an LSTM 93 | * Explained masking in Keras 94 | * How to reuse pretrained embeddings using TF Hub 95 | * How to build an encoder–decoder for neural machine translation using the seq2seq API from TensorFlow Addons 96 | * Introduced beam search 97 | * Introduced attention mechanisms 98 | * Added a short overview of visual attention and a note on explainability 99 | * Introduced the full attention-based Transformer architecture, including positional embeddings and multi-head attention 100 | * Added an overview of recent language models (2018) 101 | * Chapter 17 – Representation Learning and Generative Learning Using Autoencoders and GANs 102 | * Added convolutional autoencoders and recurrent autoencoders 103 | * Added generative adversarial networks (GANs), including basic GANs, deep convolutional GANs (DCGANs), ProGANs, and StyleGANs 104 | * Chapter 18 – Reinforcement Learning 105 | * Double DQN, Dueling DQN, and prioritized experience replay 106 | * Introduction to TF-Agents 107 | * Chapter 19 – Training and Deploying TensorFlow Models at Scale (mostly new chapter) 108 | * Serving TensorFlow models using TF Serving and Google Cloud AI Platform 109 | * Deploying models to mobile and embedded devices using TFLite 110 | * Using GPUs to speed up computations 111 | * Training models across multiple devices using the Distribution Strategies API 112 | 113 | ## Migrating from TF 1 to TF 2 114 | 115 | Migrating from TensorFlow 1.x to 2.0 is a bit like migrating from Python 2 to 3. The first thing to do is... take a deep breath. There is no rush: TensorFlow 1.x will be supported for a while, so you have time. 116 | 117 | * First, upgrade to the latest TensorFlow 1.x version (probably 1.15 by the time you read this). 118 | * Convert as much of your code as possible to use high-level APIs such as tf.keras or the Estimators API. The Estimators API still works in TF 2.0, but you should prefer Keras from now on: the TF team has announced that Keras is the way forward, and it will most likely put more effort into improving the Keras API. Also use the Keras preprocessing layers (see Chapter 13) instead of `tf.feature_columns`.
119 | * Code that uses only high-level APIs is easy to migrate, since it works the same way in the latest TF 1.x release and in TF 2.0. 120 | * Make sure you stop using `tf.contrib`, as it is gone from TF 2.0. Some parts of it were moved into the core API, others were moved to separate projects, and some were no longer maintained and were deleted. If needed, install the appropriate library, or (as a last resort) copy the legacy `tf.contrib` code into your own project. 121 | * Write as many test cases as possible; it makes migration easier and safer. 122 | * You can run TF 1.x code in TF 2.0 by starting your program with `import tensorflow.compat.v1 as tf` and `tf.disable_v2_behavior()`. 123 | * When you are ready to migrate, run the `tf_upgrade_v2` [upgrade script](https://www.tensorflow.org/beta/guide/upgrade). 124 | 125 | For more details on migration, check out TensorFlow's [migration guide](https://www.tensorflow.org/guide/migrate). 126 | -------------------------------------------------------------------------------- /cover.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rickiepark/handson-ml2/22f78e5e97141164f1ab7933dae54b09d6e47276/cover.png -------------------------------------------------------------------------------- /custom_model_in_keras.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "kernelspec": { 6 | "display_name": "TensorFlow 2.3 on Python 3.6 (CUDA 10.1)", 7 | "language": "python", 8 | "name": "python3" 9 | }, 10 | "language_info": { 11 | "codemirror_mode": { 12 | "name": "ipython", 13 | "version": 3 14 | }, 15 | "file_extension": ".py", 16 | "mimetype": "text/x-python", 17 | "name": "python", 18 | "nbconvert_exporter": "python", 19 | "pygments_lexer": "ipython3", 20 | "version": "3.6.9" 21 | }, 22 | "colab": { 23 | "name": "custom_model_in_keras.ipynb", 24 | "provenance": [] 25 | } 26 | }, 27 | "cells": [ 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "id": "TuVJZkooU5x3" 32 | }, 33 | "source": [ 34 | "# 케라스 API를 사용한 사용자 정의 모델 만들기 with 텐서플로 2.3+2.4\n", 35 | "\n", 36 | "DLD(Daejeon Learning Day) 2020을 위해 작성된 노트북입니다.\n", 37 | "\n", 38 | "* 깃허브 
주소: https://github.com/rickiepark/handson-ml2/blob/master/custom_model_in_keras.ipynb\n", 39 | "* 코랩 주소: https://colab.research.google.com/github/rickiepark/handson-ml2/blob/master/custom_model_in_keras.ipynb" 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "metadata": { 45 | "id": "i01nJ_jCU5x5", 46 | "outputId": "73f2a283-88e4-48b9-9432-b799e8829c15", 47 | "colab": { 48 | "base_uri": "https://localhost:8080/", 49 | "height": 35 50 | } 51 | }, 52 | "source": [ 53 | "import tensorflow as tf\n", 54 | "\n", 55 | "tf.__version__" 56 | ], 57 | "execution_count": 1, 58 | "outputs": [ 59 | { 60 | "output_type": "execute_result", 61 | "data": { 62 | "application/vnd.google.colaboratory.intrinsic+json": { 63 | "type": "string" 64 | }, 65 | "text/plain": [ 66 | "'2.6.0'" 67 | ] 68 | }, 69 | "metadata": {}, 70 | "execution_count": 1 71 | } 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": { 77 | "id": "T14-oCyRU5x6" 78 | }, 79 | "source": [ 80 | "### MNIST 손글씨 숫자 데이터 적재" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "metadata": { 86 | "id": "JOiyYV5vU5x7", 87 | "outputId": "de65a0cb-3c0a-409b-e5d5-fcfdda280549", 88 | "colab": { 89 | "base_uri": "https://localhost:8080/" 90 | } 91 | }, 92 | "source": [ 93 | "(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()\n", 94 | "\n", 95 | "X_train = X_train.reshape(-1, 784) / 255." 
96 | ], 97 | "execution_count": 2, 98 | "outputs": [ 99 | { 100 | "output_type": "stream", 101 | "text": [ 102 | "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz\n", 103 | "11493376/11490434 [==============================] - 0s 0us/step\n", 104 | "11501568/11490434 [==============================] - 0s 0us/step\n" 105 | ], 106 | "name": "stdout" 107 | } 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "metadata": { 113 | "id": "XlHCNho_U5x7", 114 | "outputId": "772f3056-75aa-4424-9f23-8416ccfb7ec9", 115 | "colab": { 116 | "base_uri": "https://localhost:8080/" 117 | } 118 | }, 119 | "source": [ 120 | "X_train.shape" 121 | ], 122 | "execution_count": 3, 123 | "outputs": [ 124 | { 125 | "output_type": "execute_result", 126 | "data": { 127 | "text/plain": [ 128 | "(60000, 784)" 129 | ] 130 | }, 131 | "metadata": {}, 132 | "execution_count": 3 133 | } 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": { 139 | "id": "skabK6kOU5x8" 140 | }, 141 | "source": [ 142 | "### `Sequential()` 클래스와 함수형 API의 관계" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": { 148 | "id": "thdjt3qKU5x8" 149 | }, 150 | "source": [ 151 | "`Sequential()`:" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": { 157 | "id": "odseYIpgU5x9" 158 | }, 159 | "source": [ 160 | "시퀀셜 모델에 10개의 유닛을 가진 완전 연결 층을 추가합니다." 
161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "metadata": { 166 | "id": "Dde_dt0VU5x9", 167 | "outputId": "4872c912-a03a-4453-96dc-ad37cf6033a8", 168 | "colab": { 169 | "base_uri": "https://localhost:8080/" 170 | } 171 | }, 172 | "source": [ 173 | "seq_model = tf.keras.Sequential()\n", 174 | "\n", 175 | "seq_model.add(tf.keras.layers.Dense(units=10, \n", 176 | " activation='softmax',\n", 177 | " input_shape=(784,)))\n", 178 | "\n", 179 | "seq_model.summary()" 180 | ], 181 | "execution_count": 4, 182 | "outputs": [ 183 | { 184 | "output_type": "stream", 185 | "text": [ 186 | "Model: \"sequential\"\n", 187 | "_________________________________________________________________\n", 188 | "Layer (type) Output Shape Param # \n", 189 | "=================================================================\n", 190 | "dense (Dense) (None, 10) 7850 \n", 191 | "=================================================================\n", 192 | "Total params: 7,850\n", 193 | "Trainable params: 7,850\n", 194 | "Non-trainable params: 0\n", 195 | "_________________________________________________________________\n" 196 | ], 197 | "name": "stdout" 198 | } 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "metadata": { 204 | "id": "UEO2GA4lU5x-", 205 | "outputId": "61a45639-1294-476a-b375-c87746ce366d", 206 | "colab": { 207 | "base_uri": "https://localhost:8080/" 208 | } 209 | }, 210 | "source": [ 211 | "seq_model.compile(loss='sparse_categorical_crossentropy',\n", 212 | " metrics=['accuracy'])\n", 213 | "seq_model.fit(X_train, y_train, batch_size=32, epochs=2)" 214 | ], 215 | "execution_count": 5, 216 | "outputs": [ 217 | { 218 | "output_type": "stream", 219 | "text": [ 220 | "Epoch 1/2\n", 221 | "1875/1875 [==============================] - 4s 2ms/step - loss: 0.4404 - accuracy: 0.8830\n", 222 | "Epoch 2/2\n", 223 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.3028 - accuracy: 0.9154\n" 224 | ], 225 | "name": "stdout" 226 | }, 227 | { 228 | 
"output_type": "execute_result", 229 | "data": { 230 | "text/plain": [ 231 | "" 232 | ] 233 | }, 234 | "metadata": {}, 235 | "execution_count": 5 236 | } 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": { 242 | "id": "yQtlaqq7U5x-" 243 | }, 244 | "source": [ 245 | "### 함수형 API:\n", 246 | "\n", 247 | "함수형 API를 사용할 때는 `Input()`을 사용해 입력의 크기를 정의해야 합니다. 하지만 `InputLayer` 층이 추가되어 있습니다." 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "metadata": { 253 | "id": "sK9N1D7gU5x-", 254 | "outputId": "45fde0e4-4d8a-456e-8037-716f16c22553", 255 | "colab": { 256 | "base_uri": "https://localhost:8080/" 257 | } 258 | }, 259 | "source": [ 260 | "inputs = tf.keras.layers.Input(784)\n", 261 | "\n", 262 | "outputs = tf.keras.layers.Dense(units=10,\n", 263 | " activation='softmax')(inputs) # __call()__ 메서드 호출\n", 264 | "# dense = tf.keras.layers.Dense(units=10, activation='softmax')\n", 265 | "# outputs = dense(inputs)\n", 266 | "\n", 267 | "func_model = tf.keras.Model(inputs, outputs)\n", 268 | "\n", 269 | "func_model.summary()" 270 | ], 271 | "execution_count": 6, 272 | "outputs": [ 273 | { 274 | "output_type": "stream", 275 | "text": [ 276 | "Model: \"model\"\n", 277 | "_________________________________________________________________\n", 278 | "Layer (type) Output Shape Param # \n", 279 | "=================================================================\n", 280 | "input_1 (InputLayer) [(None, 784)] 0 \n", 281 | "_________________________________________________________________\n", 282 | "dense_1 (Dense) (None, 10) 7850 \n", 283 | "=================================================================\n", 284 | "Total params: 7,850\n", 285 | "Trainable params: 7,850\n", 286 | "Non-trainable params: 0\n", 287 | "_________________________________________________________________\n" 288 | ], 289 | "name": "stdout" 290 | } 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "metadata": { 296 | "id": "-eSCyp0eU5x_", 297 | "outputId": 
"47b25432-29c6-4992-df41-3a4ee28c0310", 298 | "colab": { 299 | "base_uri": "https://localhost:8080/" 300 | } 301 | }, 302 | "source": [ 303 | "func_model.compile(loss='sparse_categorical_crossentropy',\n", 304 | " metrics=['accuracy'])\n", 305 | "func_model.fit(X_train, y_train, batch_size=32, epochs=2)" 306 | ], 307 | "execution_count": 7, 308 | "outputs": [ 309 | { 310 | "output_type": "stream", 311 | "text": [ 312 | "Epoch 1/2\n", 313 | "1875/1875 [==============================] - 3s 1ms/step - loss: 0.4453 - accuracy: 0.8799\n", 314 | "Epoch 2/2\n", 315 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.3022 - accuracy: 0.9156\n" 316 | ], 317 | "name": "stdout" 318 | }, 319 | { 320 | "output_type": "execute_result", 321 | "data": { 322 | "text/plain": [ 323 | "" 324 | ] 325 | }, 326 | "metadata": {}, 327 | "execution_count": 7 328 | } 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": { 334 | "id": "DxaYANyKU5x_" 335 | }, 336 | "source": [ 337 | "`Input`의 정체는 무엇일까요? 이 함수는 `InputLayer` 클래스의 객체를 만들어 그 결과를 반환합니다." 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "metadata": { 343 | "id": "LwRzr7ryU5x_", 344 | "outputId": "62eef073-1f85-402a-ca98-9b421f976e0d", 345 | "colab": { 346 | "base_uri": "https://localhost:8080/" 347 | } 348 | }, 349 | "source": [ 350 | "type(tf.keras.layers.Input)" 351 | ], 352 | "execution_count": 8, 353 | "outputs": [ 354 | { 355 | "output_type": "execute_result", 356 | "data": { 357 | "text/plain": [ 358 | "function" 359 | ] 360 | }, 361 | "metadata": {}, 362 | "execution_count": 8 363 | } 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": { 369 | "id": "IJJtKIhLU5yA" 370 | }, 371 | "source": [ 372 | "사실 신경망의 입력층은 입력 그 자체입니다. `InputLayer` 객체의 입력 노드 출력을 그대로 `Dense` 층에 주입할 수 있습니다. 모든 층은 입력과 출력 노드를 정의합니다." 
373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "metadata": { 378 | "scrolled": true, 379 | "id": "5GDNPCE-U5yA", 380 | "outputId": "14dce074-e4ca-4f2b-d8df-f815343452b6", 381 | "colab": { 382 | "base_uri": "https://localhost:8080/" 383 | } 384 | }, 385 | "source": [ 386 | "# inputs = tf.keras.layers.Input(784)\n", 387 | "\n", 388 | "input_layer = tf.keras.layers.InputLayer(784)\n", 389 | "inputs = input_layer._inbound_nodes[0].outputs\n", 390 | "\n", 391 | "outputs = tf.keras.layers.Dense(units=10,\n", 392 | " activation='softmax')(inputs)\n", 393 | "\n", 394 | "input_layer_model = tf.keras.Model(inputs, outputs)\n", 395 | "\n", 396 | "input_layer_model.summary()" 397 | ], 398 | "execution_count": 9, 399 | "outputs": [ 400 | { 401 | "output_type": "stream", 402 | "text": [ 403 | "Model: \"model_1\"\n", 404 | "_________________________________________________________________\n", 405 | "Layer (type) Output Shape Param # \n", 406 | "=================================================================\n", 407 | "input_2 (InputLayer) [(None, 784)] 0 \n", 408 | "_________________________________________________________________\n", 409 | "dense_2 (Dense) (None, 10) 7850 \n", 410 | "=================================================================\n", 411 | "Total params: 7,850\n", 412 | "Trainable params: 7,850\n", 413 | "Non-trainable params: 0\n", 414 | "_________________________________________________________________\n" 415 | ], 416 | "name": "stdout" 417 | } 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "metadata": { 423 | "scrolled": true, 424 | "id": "MsuY6-nHU5yA", 425 | "outputId": "78f71d4a-a340-44e7-f3ab-c4fcaf4302ef", 426 | "colab": { 427 | "base_uri": "https://localhost:8080/" 428 | } 429 | }, 430 | "source": [ 431 | "input_layer_model.compile(loss='sparse_categorical_crossentropy', \n", 432 | " metrics=['accuracy'])\n", 433 | "input_layer_model.fit(X_train, y_train, batch_size=32, epochs=2)" 434 | ], 435 | "execution_count": 10, 436 | 
"outputs": [ 437 | { 438 | "output_type": "stream", 439 | "text": [ 440 | "Epoch 1/2\n", 441 | "1875/1875 [==============================] - 3s 1ms/step - loss: 0.4364 - accuracy: 0.8832\n", 442 | "Epoch 2/2\n", 443 | "1875/1875 [==============================] - 3s 1ms/step - loss: 0.3020 - accuracy: 0.9161\n" 444 | ], 445 | "name": "stdout" 446 | }, 447 | { 448 | "output_type": "execute_result", 449 | "data": { 450 | "text/plain": [ 451 | "" 452 | ] 453 | }, 454 | "metadata": {}, 455 | "execution_count": 10 456 | } 457 | ] 458 | }, 459 | { 460 | "cell_type": "markdown", 461 | "metadata": { 462 | "id": "tTux1i7bU5yA" 463 | }, 464 | "source": [ 465 | "함수형 API를 사용한 모델은 `layers` 속성에 `InputLayer` 클래스를 포함합니다." 466 | ] 467 | }, 468 | { 469 | "cell_type": "code", 470 | "metadata": { 471 | "id": "6gHU1vi-U5yB", 472 | "outputId": "77a4bfd7-ffea-4a66-d053-bd685654691d", 473 | "colab": { 474 | "base_uri": "https://localhost:8080/" 475 | } 476 | }, 477 | "source": [ 478 | "func_model.layers" 479 | ], 480 | "execution_count": 11, 481 | "outputs": [ 482 | { 483 | "output_type": "execute_result", 484 | "data": { 485 | "text/plain": [ 486 | "[,\n", 487 | " ]" 488 | ] 489 | }, 490 | "metadata": {}, 491 | "execution_count": 11 492 | } 493 | ] 494 | }, 495 | { 496 | "cell_type": "markdown", 497 | "metadata": { 498 | "id": "cItMzPQ2U5yB" 499 | }, 500 | "source": [ 501 | "하지만 시퀀셜 모델은 `layers` 속성에 `InputLayer` 클래스가 보이지 않습니다." 
502 | ] 503 | }, 504 | { 505 | "cell_type": "code", 506 | "metadata": { 507 | "id": "LbJOms9GU5yB", 508 | "outputId": "74a61016-37b4-43d6-9a69-aba423ad7682", 509 | "colab": { 510 | "base_uri": "https://localhost:8080/" 511 | } 512 | }, 513 | "source": [ 514 | "seq_model.layers" 515 | ], 516 | "execution_count": 12, 517 | "outputs": [ 518 | { 519 | "output_type": "execute_result", 520 | "data": { 521 | "text/plain": [ 522 | "[]" 523 | ] 524 | }, 525 | "metadata": {}, 526 | "execution_count": 12 527 | } 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "metadata": { 533 | "id": "w9M8YtyOU5yB" 534 | }, 535 | "source": [ 536 | "The model also has a hidden `_self_tracked_trackables` attribute, where you can find the `InputLayer` class (TensorFlow versions before 2.5 use the `_layers` attribute instead)." 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "metadata": { 542 | "id": "-1YPNEUyVNMB", 543 | "outputId": "1f88f3bf-d4f4-405a-fcbb-ccbb539939cb", 544 | "colab": { 545 | "base_uri": "https://localhost:8080/" 546 | } 547 | }, 548 | "source": [ 549 | "seq_model._self_tracked_trackables" 550 | ], 551 | "execution_count": 15, 552 | "outputs": [ 553 | { 554 | "output_type": "execute_result", 555 | "data": { 556 | "text/plain": [ 557 | "[,\n", 558 | " ]" 559 | ] 560 | }, 561 | "metadata": {}, 562 | "execution_count": 15 563 | } 564 | ] 565 | }, 566 | { 567 | "cell_type": "markdown", 568 | "metadata": { 569 | "id": "aicQp3fkU5yB" 570 | }, 571 | "source": [ 572 | "You can also find it in the `_input_layers` attribute."
573 | ] 574 | }, 575 | { 576 | "cell_type": "code", 577 | "metadata": { 578 | "id": "-W8jyeoMU5yB", 579 | "outputId": "9dc3b4e5-fe48-45d8-a9dc-186c5cd9f1c6", 580 | "colab": { 581 | "base_uri": "https://localhost:8080/" 582 | } 583 | }, 584 | "source": [ 585 | "seq_model._input_layers, func_model._input_layers" 586 | ], 587 | "execution_count": 16, 588 | "outputs": [ 589 | { 590 | "output_type": "execute_result", 591 | "data": { 592 | "text/plain": [ 593 | "([],\n", 594 | " [])" 595 | ] 596 | }, 597 | "metadata": {}, 598 | "execution_count": 16 599 | } 600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "metadata": { 605 | "id": "PqRMSkfsU5yC", 606 | "outputId": "8b56322b-25a9-4619-8c70-0fac2cbc2b1e", 607 | "colab": { 608 | "base_uri": "https://localhost:8080/" 609 | } 610 | }, 611 | "source": [ 612 | "seq_model._output_layers, func_model._output_layers" 613 | ], 614 | "execution_count": 17, 615 | "outputs": [ 616 | { 617 | "output_type": "execute_result", 618 | "data": { 619 | "text/plain": [ 620 | "([],\n", 621 | " [])" 622 | ] 623 | }, 624 | "metadata": {}, 625 | "execution_count": 17 626 | } 627 | ] 628 | }, 629 | { 630 | "cell_type": "markdown", 631 | "metadata": { 632 | "id": "6-I50i3xU5yC" 633 | }, 634 | "source": [ 635 | "`func_model`, built with the `Model` class, is actually an instance of the `Functional` class. The `Model` class itself is used for subclassing."
636 | ] 637 | }, 638 | { 639 | "cell_type": "code", 640 | "metadata": { 641 | "id": "Eia9vw5RU5yC", 642 | "outputId": "37c38096-3b68-456b-c1f1-e3cc1e0299d3", 643 | "colab": { 644 | "base_uri": "https://localhost:8080/" 645 | } 646 | }, 647 | "source": [ 648 | "func_model.__class__" 649 | ], 650 | "execution_count": 18, 651 | "outputs": [ 652 | { 653 | "output_type": "execute_result", 654 | "data": { 655 | "text/plain": [ 656 | "keras.engine.functional.Functional" 657 | ] 658 | }, 659 | "metadata": {}, 660 | "execution_count": 18 661 | } 662 | ] 663 | }, 664 | { 665 | "cell_type": "markdown", 666 | "metadata": { 667 | "id": "iV1mrKLwU5yC" 668 | }, 669 | "source": [ 670 | "A sequential model is a special case of a functional model. (`Model` --> `Functional` --> `Sequential`)" 671 | ] 672 | }, 673 | { 674 | "cell_type": "markdown", 675 | "metadata": { 676 | "id": "lI4sh1V1U5yC" 677 | }, 678 | "source": [ 679 | "### Creating a custom layer" 680 | ] 681 | }, 682 | { 683 | "cell_type": "markdown", 684 | "metadata": { 685 | "id": "Gmjr-bOhU5yC" 686 | }, 687 | "source": [ 688 | "Subclass the `tf.keras.layers.Layer` class, create the weights in the `build()` method, and implement the computation in the `call()` method." 689 | ] 690 | }, 691 | { 692 | "cell_type": "code", 693 | "metadata": { 694 | "id": "XD-Fqp-3U5yC" 695 | }, 696 | "source": [ 697 | "class MyDense(tf.keras.layers.Layer):\n", 698 | " \n", 699 | " def __init__(self, units, activation=None, **kwargs):\n", 700 | " # pass all arguments except units and activation on to the parent class constructor\n", 701 | " super(MyDense, self).__init__(**kwargs)\n", 702 | " self.units = units\n", 703 | " # look up a predefined activation function by name, e.g., 'softmax', 'relu'\n", 704 | " self.activation = tf.keras.activations.get(activation)\n", 705 | " \n", 706 | " def build(self, input_shape):\n", 707 | " # called when the __call__() method is invoked; this defers weight creation\n", 708 | " # create the weights and bias\n", 709 | " self.kernel = self.add_weight(name='kernel', \n", 710 | " shape=[input_shape[-1], self.units],\n", 711 | " initializer='glorot_uniform' # Keras' default initializer\n", 712 | " )\n", 713 | " self.bias = self.add_weight(name='bias',\n", 714 | " shape=[self.units],\n", 715 | " initializer='zeros')\n", 716 | " \n", 717 | " def call(self, inputs): # a training=None argument is used by layers like batch normalization or dropout\n", 718 | " # called when the __call__() method is invoked\n", 719 | " # performs the actual computation; output shape is [batch_size, units]\n", 720 | " z = tf.matmul(inputs, self.kernel) + self.bias\n", 721 | " if self.activation:\n", 722 | " return self.activation(z)\n", 723 | " return z" 724 | ], 725 | "execution_count": 19, 726 | "outputs": [] 727 | }, 728 | { 729 | "cell_type": "code", 730 | "metadata": { 731 | "scrolled": true, 732 | "id": "Ba4d43vKU5yD", 733 | "outputId": "3514811d-2a0c-411d-a0bb-35d40a905b71", 734 | "colab": { 735 | "base_uri": "https://localhost:8080/" 736 | } 737 | }, 738 | "source": [ 739 | "inputs = tf.keras.layers.Input(784)\n", 740 | "# Layer.__call__() --> MyDense().build() --> Layer.build() --> MyDense().call()\n", 741 | "outputs = MyDense(units=10, activation='softmax')(inputs)\n", 742 | "\n", 743 | "my_dense_model = tf.keras.Model(inputs, outputs)\n", 744 | "\n", 745 | "my_dense_model.summary()" 746 | ], 747 | "execution_count": 20, 748 | "outputs": [ 749 | { 750 | "output_type": "stream", 751 | "text": [ 752 | "Model: \"model_2\"\n", 753 | "_________________________________________________________________\n", 754 | "Layer (type) Output Shape Param # \n", 755 | "=================================================================\n", 756 | "input_3 (InputLayer) [(None, 784)] 0 \n", 757 | "_________________________________________________________________\n", 758 | "my_dense (MyDense) (None, 10) 7850 \n", 759 | "=================================================================\n", 760 | "Total params: 7,850\n", 761 | "Trainable params: 7,850\n", 762 |
"Non-trainable params: 0\n", 763 | "_________________________________________________________________\n" 764 | ], 765 | "name": "stdout" 766 | } 767 | ] 768 | }, 769 | { 770 | "cell_type": "code", 771 | "metadata": { 772 | "id": "_DYUeNqeU5yD", 773 | "outputId": "53f266f1-82c9-4857-8836-321113026955", 774 | "colab": { 775 | "base_uri": "https://localhost:8080/" 776 | } 777 | }, 778 | "source": [ 779 | "my_dense_model.compile(loss='sparse_categorical_crossentropy', \n", 780 | " metrics=['accuracy'])\n", 781 | "my_dense_model.fit(X_train, y_train, batch_size=32, epochs=2)" 782 | ], 783 | "execution_count": 21, 784 | "outputs": [ 785 | { 786 | "output_type": "stream", 787 | "text": [ 788 | "Epoch 1/2\n", 789 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.4409 - accuracy: 0.8838\n", 790 | "Epoch 2/2\n", 791 | "1875/1875 [==============================] - 3s 1ms/step - loss: 0.3026 - accuracy: 0.9166\n" 792 | ], 793 | "name": "stdout" 794 | }, 795 | { 796 | "output_type": "execute_result", 797 | "data": { 798 | "text/plain": [ 799 | "" 800 | ] 801 | }, 802 | "metadata": {}, 803 | "execution_count": 21 804 | } 805 | ] 806 | }, 807 | { 808 | "cell_type": "markdown", 809 | "metadata": { 810 | "id": "xwYkLuDPU5yD" 811 | }, 812 | "source": [ 813 | "### 사용자 정의 모델 만들기" 814 | ] 815 | }, 816 | { 817 | "cell_type": "code", 818 | "metadata": { 819 | "id": "Byaow6qdU5yD" 820 | }, 821 | "source": [ 822 | "# fit(), compile(), predict(), evaluate() 등의 메서드 제공\n", 823 | "class MyModel(tf.keras.Model):\n", 824 | " \n", 825 | " def __init__(self):\n", 826 | " super(MyModel, self).__init__()\n", 827 | " self.output_layer = MyDense(units=10, activation='softmax')\n", 828 | " \n", 829 | " def call(self, inputs):\n", 830 | " return self.output_layer(inputs)" 831 | ], 832 | "execution_count": 22, 833 | "outputs": [] 834 | }, 835 | { 836 | "cell_type": "code", 837 | "metadata": { 838 | "scrolled": true, 839 | "id": "7dvJVGZvU5yD", 840 | "outputId": 
"155141a0-a66e-4b08-bfd5-0b71ec34e692", 841 | "colab": { 842 | "base_uri": "https://localhost:8080/" 843 | } 844 | }, 845 | "source": [ 846 | "my_model = MyModel()\n", 847 | "\n", 848 | "my_model.compile(loss='sparse_categorical_crossentropy', \n", 849 | " metrics=['accuracy'])\n", 850 | "my_model.fit(X_train, y_train, batch_size=32, epochs=2)" 851 | ], 852 | "execution_count": 23, 853 | "outputs": [ 854 | { 855 | "output_type": "stream", 856 | "text": [ 857 | "Epoch 1/2\n", 858 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.4376 - accuracy: 0.8830\n", 859 | "Epoch 2/2\n", 860 | "1875/1875 [==============================] - 3s 1ms/step - loss: 0.3024 - accuracy: 0.9157\n" 861 | ], 862 | "name": "stdout" 863 | }, 864 | { 865 | "output_type": "execute_result", 866 | "data": { 867 | "text/plain": [ 868 | "" 869 | ] 870 | }, 871 | "metadata": {}, 872 | "execution_count": 23 873 | } 874 | ] 875 | }, 876 | { 877 | "cell_type": "markdown", 878 | "metadata": { 879 | "id": "8o-Wrt9WU5yD" 880 | }, 881 | "source": [ 882 | "### 사용자 정의 훈련" 883 | ] 884 | }, 885 | { 886 | "cell_type": "code", 887 | "metadata": { 888 | "id": "VacWTtsnU5yE" 889 | }, 890 | "source": [ 891 | "class MyCustomStep(MyModel):\n", 892 | " \n", 893 | " def train_step(self, data):\n", 894 | " # fit()에서 전달된 데이터\n", 895 | " x, y = data\n", 896 | "\n", 897 | " # 그레이디언트 기록 시작\n", 898 | " with tf.GradientTape() as tape:\n", 899 | " # 정방향 계산\n", 900 | " y_pred = self(x)\n", 901 | " # compile() 메서드에서 지정한 손실 계산\n", 902 | " loss = self.compiled_loss(y, y_pred)\n", 903 | "\n", 904 | " # 훈련가능한 파라미터에 대한 그레이디언트 계산\n", 905 | " gradients = tape.gradient(loss, self.trainable_variables)\n", 906 | " # 파라미터 업데이트\n", 907 | " self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))\n", 908 | " \n", 909 | " # TF 2.4에서는\n", 910 | " # self.optimizer.minimize(loss, self.trainable_variables, tape=tape)\n", 911 | " \n", 912 | " # compile() 메서드에서 지정한 지표 계산\n", 913 | " 
self.compiled_metrics.update_state(y, y_pred)\n", 914 | " \n", 915 | " # return the metrics and their current results as a dictionary\n", 916 | " return {m.name: m.result() for m in self.metrics}" 917 | ], 918 | "execution_count": 24, 919 | "outputs": [] 920 | }, 921 | { 922 | "cell_type": "code", 923 | "metadata": { 924 | "id": "xIfS8tJVU5yE", 925 | "outputId": "fab050e9-0867-4a9f-a7fb-dd3a53ece80a", 926 | "colab": { 927 | "base_uri": "https://localhost:8080/" 928 | } 929 | }, 930 | "source": [ 931 | "my_custom_step = MyCustomStep()\n", 932 | "\n", 933 | "my_custom_step.compile(loss='sparse_categorical_crossentropy', \n", 934 | " metrics=['accuracy'])\n", 935 | "my_custom_step.fit(X_train, y_train, batch_size=32, epochs=2)" 936 | ], 937 | "execution_count": 25, 938 | "outputs": [ 939 | { 940 | "output_type": "stream", 941 | "text": [ 942 | "Epoch 1/2\n", 943 | "1875/1875 [==============================] - 4s 2ms/step - loss: 0.4370 - accuracy: 0.8837\n", 944 | "Epoch 2/2\n", 945 | "1875/1875 [==============================] - 3s 2ms/step - loss: 0.3023 - accuracy: 0.9158\n" 946 | ], 947 | "name": "stdout" 948 | }, 949 | { 950 | "output_type": "execute_result", 951 | "data": { 952 | "text/plain": [ 953 | "" 954 | ] 955 | }, 956 | "metadata": {}, 957 | "execution_count": 25 958 | } 959 | ] 960 | } 961 | ] 962 | } -------------------------------------------------------------------------------- /datasets/housing/README.md: -------------------------------------------------------------------------------- 1 | # California Housing 2 | 3 | ## Source 4 | This dataset is a modified version of the California Housing dataset available from [Luís Torgo's page](http://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html) (University of Porto). Luís Torgo obtained it from the StatLib repository (which is closed now). The dataset may also be downloaded from StatLib mirrors. 5 | 6 | This dataset appeared in a 1997 paper titled *Sparse Spatial Autoregressions* by Pace, R.
Kelley and Ronald Barry, published in the *Statistics and Probability Letters* journal. They built it using the 1990 California census data. It contains one row per census block group. A block group is the smallest geographical unit for which the U.S. Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people). 7 | 8 | ## Tweaks 9 | The dataset in this directory is almost identical to the original, with two differences: 10 | 11 | * 207 values were randomly removed from the `total_bedrooms` column, so we can discuss what to do with missing data. 12 | * An additional categorical attribute called `ocean_proximity` was added, indicating (very roughly) whether each block group is near the ocean, near the Bay area, inland or on an island. This allows discussing what to do with categorical data. 13 | 14 | Note that the block groups are called "districts" in the Jupyter notebooks, simply because in some contexts the name "block group" was confusing. 15 | 16 | ## Data description 17 | 18 | >>> housing.info() 19 | 20 | RangeIndex: 20640 entries, 0 to 20639 21 | Data columns (total 10 columns): 22 | longitude 20640 non-null float64 23 | latitude 20640 non-null float64 24 | housing_median_age 20640 non-null float64 25 | total_rooms 20640 non-null float64 26 | total_bedrooms 20433 non-null float64 27 | population 20640 non-null float64 28 | households 20640 non-null float64 29 | median_income 20640 non-null float64 30 | median_house_value 20640 non-null float64 31 | ocean_proximity 20640 non-null object 32 | dtypes: float64(9), object(1) 33 | memory usage: 1.6+ MB 34 | 35 | >>> housing["ocean_proximity"].value_counts() 36 | <1H OCEAN 9136 37 | INLAND 6551 38 | NEAR OCEAN 2658 39 | NEAR BAY 2290 40 | ISLAND 5 41 | Name: ocean_proximity, dtype: int64 42 | 43 | >>> housing.describe() 44 | longitude latitude housing_median_age total_rooms \ 45 | count 16513.000000 16513.000000 16513.000000 16513.000000 46 | mean -119.575972 35.639693 
28.652335 2622.347605 47 | std 2.002048 2.138279 12.576306 2138.559393 48 | min -124.350000 32.540000 1.000000 6.000000 49 | 25% -121.800000 33.940000 18.000000 1442.000000 50 | 50% -118.510000 34.260000 29.000000 2119.000000 51 | 75% -118.010000 37.720000 37.000000 3141.000000 52 | max -114.310000 41.950000 52.000000 39320.000000 53 | 54 | total_bedrooms population households median_income 55 | count 16355.000000 16513.000000 16513.000000 16513.000000 56 | mean 534.885112 1419.525465 496.975050 3.875651 57 | std 412.716467 1115.715084 375.737945 1.905088 58 | min 2.000000 3.000000 2.000000 0.499900 59 | 25% 295.000000 784.000000 278.000000 2.566800 60 | 50% 433.000000 1164.000000 408.000000 3.541400 61 | 75% 644.000000 1718.000000 602.000000 4.745000 62 | max 6210.000000 35682.000000 5358.000000 15.000100 63 | 64 | -------------------------------------------------------------------------------- /datasets/housing/housing.tgz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rickiepark/handson-ml2/22f78e5e97141164f1ab7933dae54b09d6e47276/datasets/housing/housing.tgz -------------------------------------------------------------------------------- /datasets/inception/imagenet_class_names.txt: -------------------------------------------------------------------------------- 1 | n01440764 tench, Tinca tinca 2 | n01443537 goldfish, Carassius auratus 3 | n01484850 great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias 4 | n01491361 tiger shark, Galeocerdo cuvieri 5 | n01494475 hammerhead, hammerhead shark 6 | n01496331 electric ray, crampfish, numbfish, torpedo 7 | n01498041 stingray 8 | n01514668 cock 9 | n01514859 hen 10 | n01518878 ostrich, Struthio camelus 11 | n01530575 brambling, Fringilla montifringilla 12 | n01531178 goldfinch, Carduelis carduelis 13 | n01532829 house finch, linnet, Carpodacus mexicanus 14 | n01534433 junco, snowbird 15 | n01537544 indigo bunting, indigo 
finch, indigo bird, Passerina cyanea 16 | n01558993 robin, American robin, Turdus migratorius 17 | n01560419 bulbul 18 | n01580077 jay 19 | n01582220 magpie 20 | n01592084 chickadee 21 | n01601694 water ouzel, dipper 22 | n01608432 kite 23 | n01614925 bald eagle, American eagle, Haliaeetus leucocephalus 24 | n01616318 vulture 25 | n01622779 great grey owl, great gray owl, Strix nebulosa 26 | n01629819 European fire salamander, Salamandra salamandra 27 | n01630670 common newt, Triturus vulgaris 28 | n01631663 eft 29 | n01632458 spotted salamander, Ambystoma maculatum 30 | n01632777 axolotl, mud puppy, Ambystoma mexicanum 31 | n01641577 bullfrog, Rana catesbeiana 32 | n01644373 tree frog, tree-frog 33 | n01644900 tailed frog, bell toad, ribbed toad, tailed toad, Ascaphus trui 34 | n01664065 loggerhead, loggerhead turtle, Caretta caretta 35 | n01665541 leatherback turtle, leatherback, leathery turtle, Dermochelys coriacea 36 | n01667114 mud turtle 37 | n01667778 terrapin 38 | n01669191 box turtle, box tortoise 39 | n01675722 banded gecko 40 | n01677366 common iguana, iguana, Iguana iguana 41 | n01682714 American chameleon, anole, Anolis carolinensis 42 | n01685808 whiptail, whiptail lizard 43 | n01687978 agama 44 | n01688243 frilled lizard, Chlamydosaurus kingi 45 | n01689811 alligator lizard 46 | n01692333 Gila monster, Heloderma suspectum 47 | n01693334 green lizard, Lacerta viridis 48 | n01694178 African chameleon, Chamaeleo chamaeleon 49 | n01695060 Komodo dragon, Komodo lizard, dragon lizard, giant lizard, Varanus komodoensis 50 | n01697457 African crocodile, Nile crocodile, Crocodylus niloticus 51 | n01698640 American alligator, Alligator mississipiensis 52 | n01704323 triceratops 53 | n01728572 thunder snake, worm snake, Carphophis amoenus 54 | n01728920 ringneck snake, ring-necked snake, ring snake 55 | n01729322 hognose snake, puff adder, sand viper 56 | n01729977 green snake, grass snake 57 | n01734418 king snake, kingsnake 58 | n01735189 garter snake, grass 
snake 59 | n01737021 water snake 60 | n01739381 vine snake 61 | n01740131 night snake, Hypsiglena torquata 62 | n01742172 boa constrictor, Constrictor constrictor 63 | n01744401 rock python, rock snake, Python sebae 64 | n01748264 Indian cobra, Naja naja 65 | n01749939 green mamba 66 | n01751748 sea snake 67 | n01753488 horned viper, cerastes, sand viper, horned asp, Cerastes cornutus 68 | n01755581 diamondback, diamondback rattlesnake, Crotalus adamanteus 69 | n01756291 sidewinder, horned rattlesnake, Crotalus cerastes 70 | n01768244 trilobite 71 | n01770081 harvestman, daddy longlegs, Phalangium opilio 72 | n01770393 scorpion 73 | n01773157 black and gold garden spider, Argiope aurantia 74 | n01773549 barn spider, Araneus cavaticus 75 | n01773797 garden spider, Aranea diademata 76 | n01774384 black widow, Latrodectus mactans 77 | n01774750 tarantula 78 | n01775062 wolf spider, hunting spider 79 | n01776313 tick 80 | n01784675 centipede 81 | n01795545 black grouse 82 | n01796340 ptarmigan 83 | n01797886 ruffed grouse, partridge, Bonasa umbellus 84 | n01798484 prairie chicken, prairie grouse, prairie fowl 85 | n01806143 peacock 86 | n01806567 quail 87 | n01807496 partridge 88 | n01817953 African grey, African gray, Psittacus erithacus 89 | n01818515 macaw 90 | n01819313 sulphur-crested cockatoo, Kakatoe galerita, Cacatua galerita 91 | n01820546 lorikeet 92 | n01824575 coucal 93 | n01828970 bee eater 94 | n01829413 hornbill 95 | n01833805 hummingbird 96 | n01843065 jacamar 97 | n01843383 toucan 98 | n01847000 drake 99 | n01855032 red-breasted merganser, Mergus serrator 100 | n01855672 goose 101 | n01860187 black swan, Cygnus atratus 102 | n01871265 tusker 103 | n01872401 echidna, spiny anteater, anteater 104 | n01873310 platypus, duckbill, duckbilled platypus, duck-billed platypus, Ornithorhynchus anatinus 105 | n01877812 wallaby, brush kangaroo 106 | n01882714 koala, koala bear, kangaroo bear, native bear, Phascolarctos cinereus 107 | n01883070 wombat 108 | 
n01910747 jellyfish 109 | n01914609 sea anemone, anemone 110 | n01917289 brain coral 111 | n01924916 flatworm, platyhelminth 112 | n01930112 nematode, nematode worm, roundworm 113 | n01943899 conch 114 | n01944390 snail 115 | n01945685 slug 116 | n01950731 sea slug, nudibranch 117 | n01955084 chiton, coat-of-mail shell, sea cradle, polyplacophore 118 | n01968897 chambered nautilus, pearly nautilus, nautilus 119 | n01978287 Dungeness crab, Cancer magister 120 | n01978455 rock crab, Cancer irroratus 121 | n01980166 fiddler crab 122 | n01981276 king crab, Alaska crab, Alaskan king crab, Alaska king crab, Paralithodes camtschatica 123 | n01983481 American lobster, Northern lobster, Maine lobster, Homarus americanus 124 | n01984695 spiny lobster, langouste, rock lobster, crawfish, crayfish, sea crawfish 125 | n01985128 crayfish, crawfish, crawdad, crawdaddy 126 | n01986214 hermit crab 127 | n01990800 isopod 128 | n02002556 white stork, Ciconia ciconia 129 | n02002724 black stork, Ciconia nigra 130 | n02006656 spoonbill 131 | n02007558 flamingo 132 | n02009229 little blue heron, Egretta caerulea 133 | n02009912 American egret, great white heron, Egretta albus 134 | n02011460 bittern 135 | n02012849 crane 136 | n02013706 limpkin, Aramus pictus 137 | n02017213 European gallinule, Porphyrio porphyrio 138 | n02018207 American coot, marsh hen, mud hen, water hen, Fulica americana 139 | n02018795 bustard 140 | n02025239 ruddy turnstone, Arenaria interpres 141 | n02027492 red-backed sandpiper, dunlin, Erolia alpina 142 | n02028035 redshank, Tringa totanus 143 | n02033041 dowitcher 144 | n02037110 oystercatcher, oyster catcher 145 | n02051845 pelican 146 | n02056570 king penguin, Aptenodytes patagonica 147 | n02058221 albatross, mollymawk 148 | n02066245 grey whale, gray whale, devilfish, Eschrichtius gibbosus, Eschrichtius robustus 149 | n02071294 killer whale, killer, orca, grampus, sea wolf, Orcinus orca 150 | n02074367 dugong, Dugong dugon 151 | n02077923 sea lion 152 | 
n02085620 Chihuahua 153 | n02085782 Japanese spaniel 154 | n02085936 Maltese dog, Maltese terrier, Maltese 155 | n02086079 Pekinese, Pekingese, Peke 156 | n02086240 Shih-Tzu 157 | n02086646 Blenheim spaniel 158 | n02086910 papillon 159 | n02087046 toy terrier 160 | n02087394 Rhodesian ridgeback 161 | n02088094 Afghan hound, Afghan 162 | n02088238 basset, basset hound 163 | n02088364 beagle 164 | n02088466 bloodhound, sleuthhound 165 | n02088632 bluetick 166 | n02089078 black-and-tan coonhound 167 | n02089867 Walker hound, Walker foxhound 168 | n02089973 English foxhound 169 | n02090379 redbone 170 | n02090622 borzoi, Russian wolfhound 171 | n02090721 Irish wolfhound 172 | n02091032 Italian greyhound 173 | n02091134 whippet 174 | n02091244 Ibizan hound, Ibizan Podenco 175 | n02091467 Norwegian elkhound, elkhound 176 | n02091635 otterhound, otter hound 177 | n02091831 Saluki, gazelle hound 178 | n02092002 Scottish deerhound, deerhound 179 | n02092339 Weimaraner 180 | n02093256 Staffordshire bullterrier, Staffordshire bull terrier 181 | n02093428 American Staffordshire terrier, Staffordshire terrier, American pit bull terrier, pit bull terrier 182 | n02093647 Bedlington terrier 183 | n02093754 Border terrier 184 | n02093859 Kerry blue terrier 185 | n02093991 Irish terrier 186 | n02094114 Norfolk terrier 187 | n02094258 Norwich terrier 188 | n02094433 Yorkshire terrier 189 | n02095314 wire-haired fox terrier 190 | n02095570 Lakeland terrier 191 | n02095889 Sealyham terrier, Sealyham 192 | n02096051 Airedale, Airedale terrier 193 | n02096177 cairn, cairn terrier 194 | n02096294 Australian terrier 195 | n02096437 Dandie Dinmont, Dandie Dinmont terrier 196 | n02096585 Boston bull, Boston terrier 197 | n02097047 miniature schnauzer 198 | n02097130 giant schnauzer 199 | n02097209 standard schnauzer 200 | n02097298 Scotch terrier, Scottish terrier, Scottie 201 | n02097474 Tibetan terrier, chrysanthemum dog 202 | n02097658 silky terrier, Sydney silky 203 | n02098105 
soft-coated wheaten terrier 204 | n02098286 West Highland white terrier 205 | n02098413 Lhasa, Lhasa apso 206 | n02099267 flat-coated retriever 207 | n02099429 curly-coated retriever 208 | n02099601 golden retriever 209 | n02099712 Labrador retriever 210 | n02099849 Chesapeake Bay retriever 211 | n02100236 German short-haired pointer 212 | n02100583 vizsla, Hungarian pointer 213 | n02100735 English setter 214 | n02100877 Irish setter, red setter 215 | n02101006 Gordon setter 216 | n02101388 Brittany spaniel 217 | n02101556 clumber, clumber spaniel 218 | n02102040 English springer, English springer spaniel 219 | n02102177 Welsh springer spaniel 220 | n02102318 cocker spaniel, English cocker spaniel, cocker 221 | n02102480 Sussex spaniel 222 | n02102973 Irish water spaniel 223 | n02104029 kuvasz 224 | n02104365 schipperke 225 | n02105056 groenendael 226 | n02105162 malinois 227 | n02105251 briard 228 | n02105412 kelpie 229 | n02105505 komondor 230 | n02105641 Old English sheepdog, bobtail 231 | n02105855 Shetland sheepdog, Shetland sheep dog, Shetland 232 | n02106030 collie 233 | n02106166 Border collie 234 | n02106382 Bouvier des Flandres, Bouviers des Flandres 235 | n02106550 Rottweiler 236 | n02106662 German shepherd, German shepherd dog, German police dog, alsatian 237 | n02107142 Doberman, Doberman pinscher 238 | n02107312 miniature pinscher 239 | n02107574 Greater Swiss Mountain dog 240 | n02107683 Bernese mountain dog 241 | n02107908 Appenzeller 242 | n02108000 EntleBucher 243 | n02108089 boxer 244 | n02108422 bull mastiff 245 | n02108551 Tibetan mastiff 246 | n02108915 French bulldog 247 | n02109047 Great Dane 248 | n02109525 Saint Bernard, St Bernard 249 | n02109961 Eskimo dog, husky 250 | n02110063 malamute, malemute, Alaskan malamute 251 | n02110185 Siberian husky 252 | n02110341 dalmatian, coach dog, carriage dog 253 | n02110627 affenpinscher, monkey pinscher, monkey dog 254 | n02110806 basenji 255 | n02110958 pug, pug-dog 256 | n02111129 Leonberg 257 | 
n02111277 Newfoundland, Newfoundland dog 258 | n02111500 Great Pyrenees 259 | n02111889 Samoyed, Samoyede 260 | n02112018 Pomeranian 261 | n02112137 chow, chow chow 262 | n02112350 keeshond 263 | n02112706 Brabancon griffon 264 | n02113023 Pembroke, Pembroke Welsh corgi 265 | n02113186 Cardigan, Cardigan Welsh corgi 266 | n02113624 toy poodle 267 | n02113712 miniature poodle 268 | n02113799 standard poodle 269 | n02113978 Mexican hairless 270 | n02114367 timber wolf, grey wolf, gray wolf, Canis lupus 271 | n02114548 white wolf, Arctic wolf, Canis lupus tundrarum 272 | n02114712 red wolf, maned wolf, Canis rufus, Canis niger 273 | n02114855 coyote, prairie wolf, brush wolf, Canis latrans 274 | n02115641 dingo, warrigal, warragal, Canis dingo 275 | n02115913 dhole, Cuon alpinus 276 | n02116738 African hunting dog, hyena dog, Cape hunting dog, Lycaon pictus 277 | n02117135 hyena, hyaena 278 | n02119022 red fox, Vulpes vulpes 279 | n02119789 kit fox, Vulpes macrotis 280 | n02120079 Arctic fox, white fox, Alopex lagopus 281 | n02120505 grey fox, gray fox, Urocyon cinereoargenteus 282 | n02123045 tabby, tabby cat 283 | n02123159 tiger cat 284 | n02123394 Persian cat 285 | n02123597 Siamese cat, Siamese 286 | n02124075 Egyptian cat 287 | n02125311 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor 288 | n02127052 lynx, catamount 289 | n02128385 leopard, Panthera pardus 290 | n02128757 snow leopard, ounce, Panthera uncia 291 | n02128925 jaguar, panther, Panthera onca, Felis onca 292 | n02129165 lion, king of beasts, Panthera leo 293 | n02129604 tiger, Panthera tigris 294 | n02130308 cheetah, chetah, Acinonyx jubatus 295 | n02132136 brown bear, bruin, Ursus arctos 296 | n02133161 American black bear, black bear, Ursus americanus, Euarctos americanus 297 | n02134084 ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus 298 | n02134418 sloth bear, Melursus ursinus, Ursus ursinus 299 | n02137549 mongoose 300 | n02138441 meerkat, mierkat 301 | 
n02165105 tiger beetle 302 | n02165456 ladybug, ladybeetle, lady beetle, ladybird, ladybird beetle 303 | n02167151 ground beetle, carabid beetle 304 | n02168699 long-horned beetle, longicorn, longicorn beetle 305 | n02169497 leaf beetle, chrysomelid 306 | n02172182 dung beetle 307 | n02174001 rhinoceros beetle 308 | n02177972 weevil 309 | n02190166 fly 310 | n02206856 bee 311 | n02219486 ant, emmet, pismire 312 | n02226429 grasshopper, hopper 313 | n02229544 cricket 314 | n02231487 walking stick, walkingstick, stick insect 315 | n02233338 cockroach, roach 316 | n02236044 mantis, mantid 317 | n02256656 cicada, cicala 318 | n02259212 leafhopper 319 | n02264363 lacewing, lacewing fly 320 | n02268443 dragonfly, darning needle, devil's darning needle, sewing needle, snake feeder, snake doctor, mosquito hawk, skeeter hawk 321 | n02268853 damselfly 322 | n02276258 admiral 323 | n02277742 ringlet, ringlet butterfly 324 | n02279972 monarch, monarch butterfly, milkweed butterfly, Danaus plexippus 325 | n02280649 cabbage butterfly 326 | n02281406 sulphur butterfly, sulfur butterfly 327 | n02281787 lycaenid, lycaenid butterfly 328 | n02317335 starfish, sea star 329 | n02319095 sea urchin 330 | n02321529 sea cucumber, holothurian 331 | n02325366 wood rabbit, cottontail, cottontail rabbit 332 | n02326432 hare 333 | n02328150 Angora, Angora rabbit 334 | n02342885 hamster 335 | n02346627 porcupine, hedgehog 336 | n02356798 fox squirrel, eastern fox squirrel, Sciurus niger 337 | n02361337 marmot 338 | n02363005 beaver 339 | n02364673 guinea pig, Cavia cobaya 340 | n02389026 sorrel 341 | n02391049 zebra 342 | n02395406 hog, pig, grunter, squealer, Sus scrofa 343 | n02396427 wild boar, boar, Sus scrofa 344 | n02397096 warthog 345 | n02398521 hippopotamus, hippo, river horse, Hippopotamus amphibius 346 | n02403003 ox 347 | n02408429 water buffalo, water ox, Asiatic buffalo, Bubalus bubalis 348 | n02410509 bison 349 | n02412080 ram, tup 350 | n02415577 bighorn, bighorn sheep, cimarron, 
Rocky Mountain bighorn, Rocky Mountain sheep, Ovis canadensis 351 | n02417914 ibex, Capra ibex 352 | n02422106 hartebeest 353 | n02422699 impala, Aepyceros melampus 354 | n02423022 gazelle 355 | n02437312 Arabian camel, dromedary, Camelus dromedarius 356 | n02437616 llama 357 | n02441942 weasel 358 | n02442845 mink 359 | n02443114 polecat, fitch, foulmart, foumart, Mustela putorius 360 | n02443484 black-footed ferret, ferret, Mustela nigripes 361 | n02444819 otter 362 | n02445715 skunk, polecat, wood pussy 363 | n02447366 badger 364 | n02454379 armadillo 365 | n02457408 three-toed sloth, ai, Bradypus tridactylus 366 | n02480495 orangutan, orang, orangutang, Pongo pygmaeus 367 | n02480855 gorilla, Gorilla gorilla 368 | n02481823 chimpanzee, chimp, Pan troglodytes 369 | n02483362 gibbon, Hylobates lar 370 | n02483708 siamang, Hylobates syndactylus, Symphalangus syndactylus 371 | n02484975 guenon, guenon monkey 372 | n02486261 patas, hussar monkey, Erythrocebus patas 373 | n02486410 baboon 374 | n02487347 macaque 375 | n02488291 langur 376 | n02488702 colobus, colobus monkey 377 | n02489166 proboscis monkey, Nasalis larvatus 378 | n02490219 marmoset 379 | n02492035 capuchin, ringtail, Cebus capucinus 380 | n02492660 howler monkey, howler 381 | n02493509 titi, titi monkey 382 | n02493793 spider monkey, Ateles geoffroyi 383 | n02494079 squirrel monkey, Saimiri sciureus 384 | n02497673 Madagascar cat, ring-tailed lemur, Lemur catta 385 | n02500267 indri, indris, Indri indri, Indri brevicaudatus 386 | n02504013 Indian elephant, Elephas maximus 387 | n02504458 African elephant, Loxodonta africana 388 | n02509815 lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens 389 | n02510455 giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca 390 | n02514041 barracouta, snoek 391 | n02526121 eel 392 | n02536864 coho, cohoe, coho salmon, blue jack, silver salmon, Oncorhynchus kisutch 393 | n02606052 rock beauty, Holocanthus tricolor 394 | n02607072 anemone 
fish 395 | n02640242 sturgeon 396 | n02641379 gar, garfish, garpike, billfish, Lepisosteus osseus 397 | n02643566 lionfish 398 | n02655020 puffer, pufferfish, blowfish, globefish 399 | n02666196 abacus 400 | n02667093 abaya 401 | n02669723 academic gown, academic robe, judge's robe 402 | n02672831 accordion, piano accordion, squeeze box 403 | n02676566 acoustic guitar 404 | n02687172 aircraft carrier, carrier, flattop, attack aircraft carrier 405 | n02690373 airliner 406 | n02692877 airship, dirigible 407 | n02699494 altar 408 | n02701002 ambulance 409 | n02704792 amphibian, amphibious vehicle 410 | n02708093 analog clock 411 | n02727426 apiary, bee house 412 | n02730930 apron 413 | n02747177 ashcan, trash can, garbage can, wastebin, ash bin, ash-bin, ashbin, dustbin, trash barrel, trash bin 414 | n02749479 assault rifle, assault gun 415 | n02769748 backpack, back pack, knapsack, packsack, rucksack, haversack 416 | n02776631 bakery, bakeshop, bakehouse 417 | n02777292 balance beam, beam 418 | n02782093 balloon 419 | n02783161 ballpoint, ballpoint pen, ballpen, Biro 420 | n02786058 Band Aid 421 | n02787622 banjo 422 | n02788148 bannister, banister, balustrade, balusters, handrail 423 | n02790996 barbell 424 | n02791124 barber chair 425 | n02791270 barbershop 426 | n02793495 barn 427 | n02794156 barometer 428 | n02795169 barrel, cask 429 | n02797295 barrow, garden cart, lawn cart, wheelbarrow 430 | n02799071 baseball 431 | n02802426 basketball 432 | n02804414 bassinet 433 | n02804610 bassoon 434 | n02807133 bathing cap, swimming cap 435 | n02808304 bath towel 436 | n02808440 bathtub, bathing tub, bath, tub 437 | n02814533 beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon 438 | n02814860 beacon, lighthouse, beacon light, pharos 439 | n02815834 beaker 440 | n02817516 bearskin, busby, shako 441 | n02823428 beer bottle 442 | n02823750 beer glass 443 | n02825657 bell cote, bell cot 444 | n02834397 bib 445 | n02835271 
bicycle-built-for-two, tandem bicycle, tandem 446 | n02837789 bikini, two-piece 447 | n02840245 binder, ring-binder 448 | n02841315 binoculars, field glasses, opera glasses 449 | n02843684 birdhouse 450 | n02859443 boathouse 451 | n02860847 bobsled, bobsleigh, bob 452 | n02865351 bolo tie, bolo, bola tie, bola 453 | n02869837 bonnet, poke bonnet 454 | n02870880 bookcase 455 | n02871525 bookshop, bookstore, bookstall 456 | n02877765 bottlecap 457 | n02879718 bow 458 | n02883205 bow tie, bow-tie, bowtie 459 | n02892201 brass, memorial tablet, plaque 460 | n02892767 brassiere, bra, bandeau 461 | n02894605 breakwater, groin, groyne, mole, bulwark, seawall, jetty 462 | n02895154 breastplate, aegis, egis 463 | n02906734 broom 464 | n02909870 bucket, pail 465 | n02910353 buckle 466 | n02916936 bulletproof vest 467 | n02917067 bullet train, bullet 468 | n02927161 butcher shop, meat market 469 | n02930766 cab, hack, taxi, taxicab 470 | n02939185 caldron, cauldron 471 | n02948072 candle, taper, wax light 472 | n02950826 cannon 473 | n02951358 canoe 474 | n02951585 can opener, tin opener 475 | n02963159 cardigan 476 | n02965783 car mirror 477 | n02966193 carousel, carrousel, merry-go-round, roundabout, whirligig 478 | n02966687 carpenter's kit, tool kit 479 | n02971356 carton 480 | n02974003 car wheel 481 | n02977058 cash machine, cash dispenser, automated teller machine, automatic teller machine, automated teller, automatic teller, ATM 482 | n02978881 cassette 483 | n02979186 cassette player 484 | n02980441 castle 485 | n02981792 catamaran 486 | n02988304 CD player 487 | n02992211 cello, violoncello 488 | n02992529 cellular telephone, cellular phone, cellphone, cell, mobile phone 489 | n02999410 chain 490 | n03000134 chainlink fence 491 | n03000247 chain mail, ring mail, mail, chain armor, chain armour, ring armor, ring armour 492 | n03000684 chain saw, chainsaw 493 | n03014705 chest 494 | n03016953 chiffonier, commode 495 | n03017168 chime, bell, gong 496 | n03018349 china 
cabinet, china closet 497 | n03026506 Christmas stocking 498 | n03028079 church, church building 499 | n03032252 cinema, movie theater, movie theatre, movie house, picture palace 500 | n03041632 cleaver, meat cleaver, chopper 501 | n03042490 cliff dwelling 502 | n03045698 cloak 503 | n03047690 clog, geta, patten, sabot 504 | n03062245 cocktail shaker 505 | n03063599 coffee mug 506 | n03063689 coffeepot 507 | n03065424 coil, spiral, volute, whorl, helix 508 | n03075370 combination lock 509 | n03085013 computer keyboard, keypad 510 | n03089624 confectionery, confectionary, candy store 511 | n03095699 container ship, containership, container vessel 512 | n03100240 convertible 513 | n03109150 corkscrew, bottle screw 514 | n03110669 cornet, horn, trumpet, trump 515 | n03124043 cowboy boot 516 | n03124170 cowboy hat, ten-gallon hat 517 | n03125729 cradle 518 | n03126707 crane 519 | n03127747 crash helmet 520 | n03127925 crate 521 | n03131574 crib, cot 522 | n03133878 Crock Pot 523 | n03134739 croquet ball 524 | n03141823 crutch 525 | n03146219 cuirass 526 | n03160309 dam, dike, dyke 527 | n03179701 desk 528 | n03180011 desktop computer 529 | n03187595 dial telephone, dial phone 530 | n03188531 diaper, nappy, napkin 531 | n03196217 digital clock 532 | n03197337 digital watch 533 | n03201208 dining table, board 534 | n03207743 dishrag, dishcloth 535 | n03207941 dishwasher, dish washer, dishwashing machine 536 | n03208938 disk brake, disc brake 537 | n03216828 dock, dockage, docking facility 538 | n03218198 dogsled, dog sled, dog sleigh 539 | n03220513 dome 540 | n03223299 doormat, welcome mat 541 | n03240683 drilling platform, offshore rig 542 | n03249569 drum, membranophone, tympan 543 | n03250847 drumstick 544 | n03255030 dumbbell 545 | n03259280 Dutch oven 546 | n03271574 electric fan, blower 547 | n03272010 electric guitar 548 | n03272562 electric locomotive 549 | n03290653 entertainment center 550 | n03291819 envelope 551 | n03297495 espresso maker 552 | n03314780 
face powder 553 | n03325584 feather boa, boa 554 | n03337140 file, file cabinet, filing cabinet 555 | n03344393 fireboat 556 | n03345487 fire engine, fire truck 557 | n03347037 fire screen, fireguard 558 | n03355925 flagpole, flagstaff 559 | n03372029 flute, transverse flute 560 | n03376595 folding chair 561 | n03379051 football helmet 562 | n03384352 forklift 563 | n03388043 fountain 564 | n03388183 fountain pen 565 | n03388549 four-poster 566 | n03393912 freight car 567 | n03394916 French horn, horn 568 | n03400231 frying pan, frypan, skillet 569 | n03404251 fur coat 570 | n03417042 garbage truck, dustcart 571 | n03424325 gasmask, respirator, gas helmet 572 | n03425413 gas pump, gasoline pump, petrol pump, island dispenser 573 | n03443371 goblet 574 | n03444034 go-kart 575 | n03445777 golf ball 576 | n03445924 golfcart, golf cart 577 | n03447447 gondola 578 | n03447721 gong, tam-tam 579 | n03450230 gown 580 | n03452741 grand piano, grand 581 | n03457902 greenhouse, nursery, glasshouse 582 | n03459775 grille, radiator grille 583 | n03461385 grocery store, grocery, food market, market 584 | n03467068 guillotine 585 | n03476684 hair slide 586 | n03476991 hair spray 587 | n03478589 half track 588 | n03481172 hammer 589 | n03482405 hamper 590 | n03483316 hand blower, blow dryer, blow drier, hair dryer, hair drier 591 | n03485407 hand-held computer, hand-held microcomputer 592 | n03485794 handkerchief, hankie, hanky, hankey 593 | n03492542 hard disc, hard disk, fixed disk 594 | n03494278 harmonica, mouth organ, harp, mouth harp 595 | n03495258 harp 596 | n03496892 harvester, reaper 597 | n03498962 hatchet 598 | n03527444 holster 599 | n03529860 home theater, home theatre 600 | n03530642 honeycomb 601 | n03532672 hook, claw 602 | n03534580 hoopskirt, crinoline 603 | n03535780 horizontal bar, high bar 604 | n03538406 horse cart, horse-cart 605 | n03544143 hourglass 606 | n03584254 iPod 607 | n03584829 iron, smoothing iron 608 | n03590841 jack-o'-lantern 609 | n03594734 
jean, blue jean, denim 610 | n03594945 jeep, landrover 611 | n03595614 jersey, T-shirt, tee shirt 612 | n03598930 jigsaw puzzle 613 | n03599486 jinrikisha, ricksha, rickshaw 614 | n03602883 joystick 615 | n03617480 kimono 616 | n03623198 knee pad 617 | n03627232 knot 618 | n03630383 lab coat, laboratory coat 619 | n03633091 ladle 620 | n03637318 lampshade, lamp shade 621 | n03642806 laptop, laptop computer 622 | n03649909 lawn mower, mower 623 | n03657121 lens cap, lens cover 624 | n03658185 letter opener, paper knife, paperknife 625 | n03661043 library 626 | n03662601 lifeboat 627 | n03666591 lighter, light, igniter, ignitor 628 | n03670208 limousine, limo 629 | n03673027 liner, ocean liner 630 | n03676483 lipstick, lip rouge 631 | n03680355 Loafer 632 | n03690938 lotion 633 | n03691459 loudspeaker, speaker, speaker unit, loudspeaker system, speaker system 634 | n03692522 loupe, jeweler's loupe 635 | n03697007 lumbermill, sawmill 636 | n03706229 magnetic compass 637 | n03709823 mailbag, postbag 638 | n03710193 mailbox, letter box 639 | n03710637 maillot 640 | n03710721 maillot, tank suit 641 | n03717622 manhole cover 642 | n03720891 maraca 643 | n03721384 marimba, xylophone 644 | n03724870 mask 645 | n03729826 matchstick 646 | n03733131 maypole 647 | n03733281 maze, labyrinth 648 | n03733805 measuring cup 649 | n03742115 medicine chest, medicine cabinet 650 | n03743016 megalith, megalithic structure 651 | n03759954 microphone, mike 652 | n03761084 microwave, microwave oven 653 | n03763968 military uniform 654 | n03764736 milk can 655 | n03769881 minibus 656 | n03770439 miniskirt, mini 657 | n03770679 minivan 658 | n03773504 missile 659 | n03775071 mitten 660 | n03775546 mixing bowl 661 | n03776460 mobile home, manufactured home 662 | n03777568 Model T 663 | n03777754 modem 664 | n03781244 monastery 665 | n03782006 monitor 666 | n03785016 moped 667 | n03786901 mortar 668 | n03787032 mortarboard 669 | n03788195 mosque 670 | n03788365 mosquito net 671 | n03791053 
motor scooter, scooter 672 | n03792782 mountain bike, all-terrain bike, off-roader 673 | n03792972 mountain tent 674 | n03793489 mouse, computer mouse 675 | n03794056 mousetrap 676 | n03796401 moving van 677 | n03803284 muzzle 678 | n03804744 nail 679 | n03814639 neck brace 680 | n03814906 necklace 681 | n03825788 nipple 682 | n03832673 notebook, notebook computer 683 | n03837869 obelisk 684 | n03838899 oboe, hautboy, hautbois 685 | n03840681 ocarina, sweet potato 686 | n03841143 odometer, hodometer, mileometer, milometer 687 | n03843555 oil filter 688 | n03854065 organ, pipe organ 689 | n03857828 oscilloscope, scope, cathode-ray oscilloscope, CRO 690 | n03866082 overskirt 691 | n03868242 oxcart 692 | n03868863 oxygen mask 693 | n03871628 packet 694 | n03873416 paddle, boat paddle 695 | n03874293 paddlewheel, paddle wheel 696 | n03874599 padlock 697 | n03876231 paintbrush 698 | n03877472 pajama, pyjama, pj's, jammies 699 | n03877845 palace 700 | n03884397 panpipe, pandean pipe, syrinx 701 | n03887697 paper towel 702 | n03888257 parachute, chute 703 | n03888605 parallel bars, bars 704 | n03891251 park bench 705 | n03891332 parking meter 706 | n03895866 passenger car, coach, carriage 707 | n03899768 patio, terrace 708 | n03902125 pay-phone, pay-station 709 | n03903868 pedestal, plinth, footstall 710 | n03908618 pencil box, pencil case 711 | n03908714 pencil sharpener 712 | n03916031 perfume, essence 713 | n03920288 Petri dish 714 | n03924679 photocopier 715 | n03929660 pick, plectrum, plectron 716 | n03929855 pickelhaube 717 | n03930313 picket fence, paling 718 | n03930630 pickup, pickup truck 719 | n03933933 pier 720 | n03935335 piggy bank, penny bank 721 | n03937543 pill bottle 722 | n03938244 pillow 723 | n03942813 ping-pong ball 724 | n03944341 pinwheel 725 | n03947888 pirate, pirate ship 726 | n03950228 pitcher, ewer 727 | n03954731 plane, carpenter's plane, woodworking plane 728 | n03956157 planetarium 729 | n03958227 plastic bag 730 | n03961711 plate rack 731 
| n03967562 plow, plough 732 | n03970156 plunger, plumber's helper 733 | n03976467 Polaroid camera, Polaroid Land camera 734 | n03976657 pole 735 | n03977966 police van, police wagon, paddy wagon, patrol wagon, wagon, black Maria 736 | n03980874 poncho 737 | n03982430 pool table, billiard table, snooker table 738 | n03983396 pop bottle, soda bottle 739 | n03991062 pot, flowerpot 740 | n03992509 potter's wheel 741 | n03995372 power drill 742 | n03998194 prayer rug, prayer mat 743 | n04004767 printer 744 | n04005630 prison, prison house 745 | n04008634 projectile, missile 746 | n04009552 projector 747 | n04019541 puck, hockey puck 748 | n04023962 punching bag, punch bag, punching ball, punchball 749 | n04026417 purse 750 | n04033901 quill, quill pen 751 | n04033995 quilt, comforter, comfort, puff 752 | n04037443 racer, race car, racing car 753 | n04039381 racket, racquet 754 | n04040759 radiator 755 | n04041544 radio, wireless 756 | n04044716 radio telescope, radio reflector 757 | n04049303 rain barrel 758 | n04065272 recreational vehicle, RV, R.V. 
759 | n04067472 reel 760 | n04069434 reflex camera 761 | n04070727 refrigerator, icebox 762 | n04074963 remote control, remote 763 | n04081281 restaurant, eating house, eating place, eatery 764 | n04086273 revolver, six-gun, six-shooter 765 | n04090263 rifle 766 | n04099969 rocking chair, rocker 767 | n04111531 rotisserie 768 | n04116512 rubber eraser, rubber, pencil eraser 769 | n04118538 rugby ball 770 | n04118776 rule, ruler 771 | n04120489 running shoe 772 | n04125021 safe 773 | n04127249 safety pin 774 | n04131690 saltshaker, salt shaker 775 | n04133789 sandal 776 | n04136333 sarong 777 | n04141076 sax, saxophone 778 | n04141327 scabbard 779 | n04141975 scale, weighing machine 780 | n04146614 school bus 781 | n04147183 schooner 782 | n04149813 scoreboard 783 | n04152593 screen, CRT screen 784 | n04153751 screw 785 | n04154565 screwdriver 786 | n04162706 seat belt, seatbelt 787 | n04179913 sewing machine 788 | n04192698 shield, buckler 789 | n04200800 shoe shop, shoe-shop, shoe store 790 | n04201297 shoji 791 | n04204238 shopping basket 792 | n04204347 shopping cart 793 | n04208210 shovel 794 | n04209133 shower cap 795 | n04209239 shower curtain 796 | n04228054 ski 797 | n04229816 ski mask 798 | n04235860 sleeping bag 799 | n04238763 slide rule, slipstick 800 | n04239074 sliding door 801 | n04243546 slot, one-armed bandit 802 | n04251144 snorkel 803 | n04252077 snowmobile 804 | n04252225 snowplow, snowplough 805 | n04254120 soap dispenser 806 | n04254680 soccer ball 807 | n04254777 sock 808 | n04258138 solar dish, solar collector, solar furnace 809 | n04259630 sombrero 810 | n04263257 soup bowl 811 | n04264628 space bar 812 | n04265275 space heater 813 | n04266014 space shuttle 814 | n04270147 spatula 815 | n04273569 speedboat 816 | n04275548 spider web, spider's web 817 | n04277352 spindle 818 | n04285008 sports car, sport car 819 | n04286575 spotlight, spot 820 | n04296562 stage 821 | n04310018 steam locomotive 822 | n04311004 steel arch bridge 823 | 
n04311174 steel drum 824 | n04317175 stethoscope 825 | n04325704 stole 826 | n04326547 stone wall 827 | n04328186 stopwatch, stop watch 828 | n04330267 stove 829 | n04332243 strainer 830 | n04335435 streetcar, tram, tramcar, trolley, trolley car 831 | n04336792 stretcher 832 | n04344873 studio couch, day bed 833 | n04346328 stupa, tope 834 | n04347754 submarine, pigboat, sub, U-boat 835 | n04350905 suit, suit of clothes 836 | n04355338 sundial 837 | n04355933 sunglass 838 | n04356056 sunglasses, dark glasses, shades 839 | n04357314 sunscreen, sunblock, sun blocker 840 | n04366367 suspension bridge 841 | n04367480 swab, swob, mop 842 | n04370456 sweatshirt 843 | n04371430 swimming trunks, bathing trunks 844 | n04371774 swing 845 | n04372370 switch, electric switch, electrical switch 846 | n04376876 syringe 847 | n04380533 table lamp 848 | n04389033 tank, army tank, armored combat vehicle, armoured combat vehicle 849 | n04392985 tape player 850 | n04398044 teapot 851 | n04399382 teddy, teddy bear 852 | n04404412 television, television system 853 | n04409515 tennis ball 854 | n04417672 thatch, thatched roof 855 | n04418357 theater curtain, theatre curtain 856 | n04423845 thimble 857 | n04428191 thresher, thrasher, threshing machine 858 | n04429376 throne 859 | n04435653 tile roof 860 | n04442312 toaster 861 | n04443257 tobacco shop, tobacconist shop, tobacconist 862 | n04447861 toilet seat 863 | n04456115 torch 864 | n04458633 totem pole 865 | n04461696 tow truck, tow car, wrecker 866 | n04462240 toyshop 867 | n04465501 tractor 868 | n04467665 trailer truck, tractor trailer, trucking rig, rig, articulated lorry, semi 869 | n04476259 tray 870 | n04479046 trench coat 871 | n04482393 tricycle, trike, velocipede 872 | n04483307 trimaran 873 | n04485082 tripod 874 | n04486054 triumphal arch 875 | n04487081 trolleybus, trolley coach, trackless trolley 876 | n04487394 trombone 877 | n04493381 tub, vat 878 | n04501370 turnstile 879 | n04505470 typewriter keyboard 880 | 
n04507155 umbrella 881 | n04509417 unicycle, monocycle 882 | n04515003 upright, upright piano 883 | n04517823 vacuum, vacuum cleaner 884 | n04522168 vase 885 | n04523525 vault 886 | n04525038 velvet 887 | n04525305 vending machine 888 | n04532106 vestment 889 | n04532670 viaduct 890 | n04536866 violin, fiddle 891 | n04540053 volleyball 892 | n04542943 waffle iron 893 | n04548280 wall clock 894 | n04548362 wallet, billfold, notecase, pocketbook 895 | n04550184 wardrobe, closet, press 896 | n04552348 warplane, military plane 897 | n04553703 washbasin, handbasin, washbowl, lavabo, wash-hand basin 898 | n04554684 washer, automatic washer, washing machine 899 | n04557648 water bottle 900 | n04560804 water jug 901 | n04562935 water tower 902 | n04579145 whiskey jug 903 | n04579432 whistle 904 | n04584207 wig 905 | n04589890 window screen 906 | n04590129 window shade 907 | n04591157 Windsor tie 908 | n04591713 wine bottle 909 | n04592741 wing 910 | n04596742 wok 911 | n04597913 wooden spoon 912 | n04599235 wool, woolen, woollen 913 | n04604644 worm fence, snake fence, snake-rail fence, Virginia fence 914 | n04606251 wreck 915 | n04612504 yawl 916 | n04613696 yurt 917 | n06359193 web site, website, internet site, site 918 | n06596364 comic book 919 | n06785654 crossword puzzle, crossword 920 | n06794110 street sign 921 | n06874185 traffic light, traffic signal, stoplight 922 | n07248320 book jacket, dust cover, dust jacket, dust wrapper 923 | n07565083 menu 924 | n07579787 plate 925 | n07583066 guacamole 926 | n07584110 consomme 927 | n07590611 hot pot, hotpot 928 | n07613480 trifle 929 | n07614500 ice cream, icecream 930 | n07615774 ice lolly, lolly, lollipop, popsicle 931 | n07684084 French loaf 932 | n07693725 bagel, beigel 933 | n07695742 pretzel 934 | n07697313 cheeseburger 935 | n07697537 hotdog, hot dog, red hot 936 | n07711569 mashed potato 937 | n07714571 head cabbage 938 | n07714990 broccoli 939 | n07715103 cauliflower 940 | n07716358 zucchini, courgette 941 | 
n07716906 spaghetti squash 942 | n07717410 acorn squash 943 | n07717556 butternut squash 944 | n07718472 cucumber, cuke 945 | n07718747 artichoke, globe artichoke 946 | n07720875 bell pepper 947 | n07730033 cardoon 948 | n07734744 mushroom 949 | n07742313 Granny Smith 950 | n07745940 strawberry 951 | n07747607 orange 952 | n07749582 lemon 953 | n07753113 fig 954 | n07753275 pineapple, ananas 955 | n07753592 banana 956 | n07754684 jackfruit, jak, jack 957 | n07760859 custard apple 958 | n07768694 pomegranate 959 | n07802026 hay 960 | n07831146 carbonara 961 | n07836838 chocolate sauce, chocolate syrup 962 | n07860988 dough 963 | n07871810 meat loaf, meatloaf 964 | n07873807 pizza, pizza pie 965 | n07875152 potpie 966 | n07880968 burrito 967 | n07892512 red wine 968 | n07920052 espresso 969 | n07930864 cup 970 | n07932039 eggnog 971 | n09193705 alp 972 | n09229709 bubble 973 | n09246464 cliff, drop, drop-off 974 | n09256479 coral reef 975 | n09288635 geyser 976 | n09332890 lakeside, lakeshore 977 | n09399592 promontory, headland, head, foreland 978 | n09421951 sandbar, sand bar 979 | n09428293 seashore, coast, seacoast, sea-coast 980 | n09468604 valley, vale 981 | n09472597 volcano 982 | n09835506 ballplayer, baseball player 983 | n10148035 groom, bridegroom 984 | n10565667 scuba diver 985 | n11879895 rapeseed 986 | n11939491 daisy 987 | n12057211 yellow lady's slipper, yellow lady-slipper, Cypripedium calceolus, Cypripedium parviflorum 988 | n12144580 corn 989 | n12267677 acorn 990 | n12620546 hip, rose hip, rosehip 991 | n12768682 buckeye, horse chestnut, conker 992 | n12985857 coral fungus 993 | n12998815 agaric 994 | n13037406 gyromitra 995 | n13040303 stinkhorn, carrion fungus 996 | n13044778 earthstar 997 | n13052670 hen-of-the-woods, hen of the woods, Polyporus frondosus, Grifola frondosa 998 | n13054560 bolete 999 | n13133613 ear, spike, capitulum 1000 | n15075141 toilet tissue, toilet paper, bathroom tissue 
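The file above maps each WordNet synset ID (e.g. `n02085620`) to one or more comma-separated human-readable class names. A minimal sketch for parsing such lines into a `dict` — the `parse_class_names` helper and the inline sample are illustrative, not part of the repository:

```python
def parse_class_names(lines):
    """Parse ImageNet class-name lines of the form
    '<synset_id> <name1>, <name2>, ...' into a dict
    mapping synset ID -> list of names."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The synset ID is everything up to the first space;
        # the rest is a comma-separated list of names.
        synset_id, _, names = line.partition(" ")
        mapping[synset_id] = [name.strip() for name in names.split(",")]
    return mapping

# Sample lines copied from the file above.
sample = [
    "n02085620 Chihuahua",
    "n02088364 beagle",
    "n02123045 tabby, tabby cat",
]
classes = parse_class_names(sample)
print(classes["n02123045"])  # ['tabby', 'tabby cat']
```

To parse the full file, pass an open file handle over `datasets/inception/imagenet_class_names.txt` to the same helper instead of the inline sample.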
-------------------------------------------------------------------------------- /datasets/jsb_chorales/README.md: -------------------------------------------------------------------------------- 1 | # Johann Sebastian Bach Chorales Dataset 2 | 3 | ## Source 4 | This dataset contains 382 chorales by Johann Sebastian Bach (in the public domain), where each chorale is composed of 100 to 640 chords with a temporal resolution of 1/16th. Each chord is composed of 4 integers, each indicating the index of a note on a piano, except for the value 0 which means "no note played". 5 | 6 | This dataset is based on [czhuang's JSB-Chorales-dataset](https://github.com/czhuang/JSB-Chorales-dataset/blob/master/README.md) (`Jsb16thSeparated.npz`) which used the train, validation, test split from Boulanger-Lewandowski (2012). 7 | 8 | Motivation: I thought it would be nice to have a version of this dataset in CSV format. 9 | 10 | ## Reference 11 | Boulanger-Lewandowski, N., Vincent, P., & Bengio, Y. (2012). Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. Proceedings of the 29th International Conference on Machine Learning (ICML-12), 1159–1166. 12 | 13 | ## Usage 14 | Download `jsb_chorales.tgz` and untar it: 15 | 16 | ```bash 17 | $ tar xvzf jsb_chorales.tgz 18 | ``` 19 | 20 | ## Data structure 21 | The dataset is split in three (train, valid, test), with a total of 382 CSV files: 22 | 23 | ``` 24 | $ tree 25 | . 26 | ├── train 27 | │   ├── chorale_000.csv 28 | │   ├── chorale_001.csv 29 | │   ├── chorale_002.csv 30 | │ │ ... 31 | │   ├── chorale_227.csv 32 | │   └── chorale_228.csv 33 | ├── valid 34 | │ ├── chorale_229.csv 35 | │ ├── chorale_230.csv 36 | │ ├── chorale_231.csv 37 | │ │ ... 38 | │   ├── chorale_303.csv 39 | │   └── chorale_304.csv 40 | └── test 41 |    ├── chorale_305.csv 42 |    ├── chorale_306.csv 43 |    ├── chorale_307.csv 44 | │ ... 
45 |    ├── chorale_380.csv 46 |    └── chorale_381.csv 47 | ``` 48 | 49 | ## Data sample 50 | 51 | ``` 52 | $ head train/chorale_000.csv 53 | note0,note1,note2,note3 54 | 74,70,65,58 55 | 74,70,65,58 56 | 74,70,65,58 57 | 74,70,65,58 58 | 75,70,58,55 59 | 75,70,58,55 60 | 75,70,60,55 61 | 75,70,60,55 62 | 77,69,62,50 63 | ``` 64 | 65 | Enjoy! 66 | -------------------------------------------------------------------------------- /datasets/jsb_chorales/jsb_chorales.tgz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rickiepark/handson-ml2/22f78e5e97141164f1ab7933dae54b09d6e47276/datasets/jsb_chorales/jsb_chorales.tgz -------------------------------------------------------------------------------- /datasets/lifesat/README.md: -------------------------------------------------------------------------------- 1 | # Life satisfaction and GDP per capita 2 | ## Life satisfaction 3 | ### Source 4 | This dataset was obtained from the OECD's website at: http://stats.oecd.org/index.aspx?DataSetCode=BLI 5 | 6 | ### Data description 7 | 8 | Int64Index: 3292 entries, 0 to 3291 9 | Data columns (total 17 columns): 10 | "LOCATION" 3292 non-null object 11 | Country 3292 non-null object 12 | INDICATOR 3292 non-null object 13 | Indicator 3292 non-null object 14 | MEASURE 3292 non-null object 15 | Measure 3292 non-null object 16 | INEQUALITY 3292 non-null object 17 | Inequality 3292 non-null object 18 | Unit Code 3292 non-null object 19 | Unit 3292 non-null object 20 | PowerCode Code 3292 non-null int64 21 | PowerCode 3292 non-null object 22 | Reference Period Code 0 non-null float64 23 | Reference Period 0 non-null float64 24 | Value 3292 non-null float64 25 | Flag Codes 1120 non-null object 26 | Flags 1120 non-null object 27 | dtypes: float64(3), int64(1), object(13) 28 | memory usage: 462.9+ KB 29 | 30 | ### Example usage using python Pandas 31 | 32 | >>> life_sat = pd.read_csv("oecd_bli_2015.csv", thousands=',') 33 | 
34 | >>> life_sat_total = life_sat[life_sat["INEQUALITY"]=="TOT"] 35 | 36 | >>> life_sat_total = life_sat_total.pivot(index="Country", columns="Indicator", values="Value") 37 | 38 | >>> life_sat_total.info() 39 | 40 | Index: 37 entries, Australia to United States 41 | Data columns (total 24 columns): 42 | Air pollution 37 non-null float64 43 | Assault rate 37 non-null float64 44 | Consultation on rule-making 37 non-null float64 45 | Dwellings without basic facilities 37 non-null float64 46 | Educational attainment 37 non-null float64 47 | Employees working very long hours 37 non-null float64 48 | Employment rate 37 non-null float64 49 | Homicide rate 37 non-null float64 50 | Household net adjusted disposable income 37 non-null float64 51 | Household net financial wealth 37 non-null float64 52 | Housing expenditure 37 non-null float64 53 | Job security 37 non-null float64 54 | Life expectancy 37 non-null float64 55 | Life satisfaction 37 non-null float64 56 | Long-term unemployment rate 37 non-null float64 57 | Personal earnings 37 non-null float64 58 | Quality of support network 37 non-null float64 59 | Rooms per person 37 non-null float64 60 | Self-reported health 37 non-null float64 61 | Student skills 37 non-null float64 62 | Time devoted to leisure and personal care 37 non-null float64 63 | Voter turnout 37 non-null float64 64 | Water quality 37 non-null float64 65 | Years in education 37 non-null float64 66 | dtypes: float64(24) 67 | memory usage: 7.2+ KB 68 | 69 | ## GDP per capita 70 | ### Source 71 | Dataset obtained from the IMF's website at: http://goo.gl/j1MSKe 72 | 73 | ### Data description 74 | 75 | Int64Index: 190 entries, 0 to 189 76 | Data columns (total 7 columns): 77 | Country 190 non-null object 78 | Subject Descriptor 189 non-null object 79 | Units 189 non-null object 80 | Scale 189 non-null object 81 | Country/Series-specific Notes 188 non-null object 82 | 2015 187 non-null float64 83 | Estimates Start After 188 non-null float64 84 | dtypes: 
float64(2), object(5) 85 | memory usage: 11.9+ KB 86 | 87 | ### Example usage with Python Pandas 88 | 89 | >>> gdp_per_capita = pd.read_csv( 90 | ... datapath+"gdp_per_capita.csv", thousands=',', delimiter='\t', 91 | ... encoding='latin1', na_values="n/a", index_col="Country") 92 | ... 93 | >>> gdp_per_capita.rename(columns={"2015": "GDP per capita"}, inplace=True) 94 | 95 | -------------------------------------------------------------------------------- /datasets/lifesat/gdp_per_capita.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rickiepark/handson-ml2/22f78e5e97141164f1ab7933dae54b09d6e47276/datasets/lifesat/gdp_per_capita.csv -------------------------------------------------------------------------------- /datasets/titanic/test.csv: -------------------------------------------------------------------------------- 1 | PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked 2 | 892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q 3 | 893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47,1,0,363272,7,,S 4 | 894,2,"Myles, Mr. Thomas Francis",male,62,0,0,240276,9.6875,,Q 5 | 895,3,"Wirz, Mr. Albert",male,27,0,0,315154,8.6625,,S 6 | 896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22,1,1,3101298,12.2875,,S 7 | 897,3,"Svensson, Mr. Johan Cervin",male,14,0,0,7538,9.225,,S 8 | 898,3,"Connolly, Miss. Kate",female,30,0,0,330972,7.6292,,Q 9 | 899,2,"Caldwell, Mr. Albert Francis",male,26,1,1,248738,29,,S 10 | 900,3,"Abrahim, Mrs. Joseph (Sophie Halaut Easu)",female,18,0,0,2657,7.2292,,C 11 | 901,3,"Davies, Mr. John Samuel",male,21,2,0,A/4 48871,24.15,,S 12 | 902,3,"Ilieff, Mr. Ylio",male,,0,0,349220,7.8958,,S 13 | 903,1,"Jones, Mr. Charles Cresson",male,46,0,0,694,26,,S 14 | 904,1,"Snyder, Mrs. John Pillsbury (Nelle Stevenson)",female,23,1,0,21228,82.2667,B45,S 15 | 905,2,"Howard, Mr. Benjamin",male,63,1,0,24065,26,,S 16 | 906,1,"Chaffee, Mrs. 
Herbert Fuller (Carrie Constance Toogood)",female,47,1,0,W.E.P. 5734,61.175,E31,S 17 | 907,2,"del Carlo, Mrs. Sebastiano (Argenia Genovesi)",female,24,1,0,SC/PARIS 2167,27.7208,,C 18 | 908,2,"Keane, Mr. Daniel",male,35,0,0,233734,12.35,,Q 19 | 909,3,"Assaf, Mr. Gerios",male,21,0,0,2692,7.225,,C 20 | 910,3,"Ilmakangas, Miss. Ida Livija",female,27,1,0,STON/O2. 3101270,7.925,,S 21 | 911,3,"Assaf Khalil, Mrs. Mariana (Miriam"")""",female,45,0,0,2696,7.225,,C 22 | 912,1,"Rothschild, Mr. Martin",male,55,1,0,PC 17603,59.4,,C 23 | 913,3,"Olsen, Master. Artur Karl",male,9,0,1,C 17368,3.1708,,S 24 | 914,1,"Flegenheim, Mrs. Alfred (Antoinette)",female,,0,0,PC 17598,31.6833,,S 25 | 915,1,"Williams, Mr. Richard Norris II",male,21,0,1,PC 17597,61.3792,,C 26 | 916,1,"Ryerson, Mrs. Arthur Larned (Emily Maria Borie)",female,48,1,3,PC 17608,262.375,B57 B59 B63 B66,C 27 | 917,3,"Robins, Mr. Alexander A",male,50,1,0,A/5. 3337,14.5,,S 28 | 918,1,"Ostby, Miss. Helene Ragnhild",female,22,0,1,113509,61.9792,B36,C 29 | 919,3,"Daher, Mr. Shedid",male,22.5,0,0,2698,7.225,,C 30 | 920,1,"Brady, Mr. John Bertram",male,41,0,0,113054,30.5,A21,S 31 | 921,3,"Samaan, Mr. Elias",male,,2,0,2662,21.6792,,C 32 | 922,2,"Louch, Mr. Charles Alexander",male,50,1,0,SC/AH 3085,26,,S 33 | 923,2,"Jefferys, Mr. Clifford Thomas",male,24,2,0,C.A. 31029,31.5,,S 34 | 924,3,"Dean, Mrs. Bertram (Eva Georgetta Light)",female,33,1,2,C.A. 2315,20.575,,S 35 | 925,3,"Johnston, Mrs. Andrew G (Elizabeth Lily"" Watson)""",female,,1,2,W./C. 6607,23.45,,S 36 | 926,1,"Mock, Mr. Philipp Edmund",male,30,1,0,13236,57.75,C78,C 37 | 927,3,"Katavelas, Mr. Vassilios (Catavelas Vassilios"")""",male,18.5,0,0,2682,7.2292,,C 38 | 928,3,"Roth, Miss. Sarah A",female,,0,0,342712,8.05,,S 39 | 929,3,"Cacic, Miss. Manda",female,21,0,0,315087,8.6625,,S 40 | 930,3,"Sap, Mr. Julius",male,25,0,0,345768,9.5,,S 41 | 931,3,"Hee, Mr. Ling",male,,0,0,1601,56.4958,,S 42 | 932,3,"Karun, Mr. Franz",male,39,0,1,349256,13.4167,,C 43 | 933,1,"Franklin, Mr. 
Thomas Parham",male,,0,0,113778,26.55,D34,S 44 | 934,3,"Goldsmith, Mr. Nathan",male,41,0,0,SOTON/O.Q. 3101263,7.85,,S 45 | 935,2,"Corbett, Mrs. Walter H (Irene Colvin)",female,30,0,0,237249,13,,S 46 | 936,1,"Kimball, Mrs. Edwin Nelson Jr (Gertrude Parsons)",female,45,1,0,11753,52.5542,D19,S 47 | 937,3,"Peltomaki, Mr. Nikolai Johannes",male,25,0,0,STON/O 2. 3101291,7.925,,S 48 | 938,1,"Chevre, Mr. Paul Romaine",male,45,0,0,PC 17594,29.7,A9,C 49 | 939,3,"Shaughnessy, Mr. Patrick",male,,0,0,370374,7.75,,Q 50 | 940,1,"Bucknell, Mrs. William Robert (Emma Eliza Ward)",female,60,0,0,11813,76.2917,D15,C 51 | 941,3,"Coutts, Mrs. William (Winnie Minnie"" Treanor)""",female,36,0,2,C.A. 37671,15.9,,S 52 | 942,1,"Smith, Mr. Lucien Philip",male,24,1,0,13695,60,C31,S 53 | 943,2,"Pulbaum, Mr. Franz",male,27,0,0,SC/PARIS 2168,15.0333,,C 54 | 944,2,"Hocking, Miss. Ellen Nellie""""",female,20,2,1,29105,23,,S 55 | 945,1,"Fortune, Miss. Ethel Flora",female,28,3,2,19950,263,C23 C25 C27,S 56 | 946,2,"Mangiavacchi, Mr. Serafino Emilio",male,,0,0,SC/A.3 2861,15.5792,,C 57 | 947,3,"Rice, Master. Albert",male,10,4,1,382652,29.125,,Q 58 | 948,3,"Cor, Mr. Bartol",male,35,0,0,349230,7.8958,,S 59 | 949,3,"Abelseth, Mr. Olaus Jorgensen",male,25,0,0,348122,7.65,F G63,S 60 | 950,3,"Davison, Mr. Thomas Henry",male,,1,0,386525,16.1,,S 61 | 951,1,"Chaudanson, Miss. Victorine",female,36,0,0,PC 17608,262.375,B61,C 62 | 952,3,"Dika, Mr. Mirko",male,17,0,0,349232,7.8958,,S 63 | 953,2,"McCrae, Mr. Arthur Gordon",male,32,0,0,237216,13.5,,S 64 | 954,3,"Bjorklund, Mr. Ernst Herbert",male,18,0,0,347090,7.75,,S 65 | 955,3,"Bradley, Miss. Bridget Delia",female,22,0,0,334914,7.725,,Q 66 | 956,1,"Ryerson, Master. John Borie",male,13,2,2,PC 17608,262.375,B57 B59 B63 B66,C 67 | 957,2,"Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)",female,,0,0,F.C.C. 13534,21,,S 68 | 958,3,"Burns, Miss. Mary Delia",female,18,0,0,330963,7.8792,,Q 69 | 959,1,"Moore, Mr. 
Clarence Bloomfield",male,47,0,0,113796,42.4,,S 70 | 960,1,"Tucker, Mr. Gilbert Milligan Jr",male,31,0,0,2543,28.5375,C53,C 71 | 961,1,"Fortune, Mrs. Mark (Mary McDougald)",female,60,1,4,19950,263,C23 C25 C27,S 72 | 962,3,"Mulvihill, Miss. Bertha E",female,24,0,0,382653,7.75,,Q 73 | 963,3,"Minkoff, Mr. Lazar",male,21,0,0,349211,7.8958,,S 74 | 964,3,"Nieminen, Miss. Manta Josefina",female,29,0,0,3101297,7.925,,S 75 | 965,1,"Ovies y Rodriguez, Mr. Servando",male,28.5,0,0,PC 17562,27.7208,D43,C 76 | 966,1,"Geiger, Miss. Amalie",female,35,0,0,113503,211.5,C130,C 77 | 967,1,"Keeping, Mr. Edwin",male,32.5,0,0,113503,211.5,C132,C 78 | 968,3,"Miles, Mr. Frank",male,,0,0,359306,8.05,,S 79 | 969,1,"Cornell, Mrs. Robert Clifford (Malvina Helen Lamson)",female,55,2,0,11770,25.7,C101,S 80 | 970,2,"Aldworth, Mr. Charles Augustus",male,30,0,0,248744,13,,S 81 | 971,3,"Doyle, Miss. Elizabeth",female,24,0,0,368702,7.75,,Q 82 | 972,3,"Boulos, Master. Akar",male,6,1,1,2678,15.2458,,C 83 | 973,1,"Straus, Mr. Isidor",male,67,1,0,PC 17483,221.7792,C55 C57,S 84 | 974,1,"Case, Mr. Howard Brown",male,49,0,0,19924,26,,S 85 | 975,3,"Demetri, Mr. Marinko",male,,0,0,349238,7.8958,,S 86 | 976,2,"Lamb, Mr. John Joseph",male,,0,0,240261,10.7083,,Q 87 | 977,3,"Khalil, Mr. Betros",male,,1,0,2660,14.4542,,C 88 | 978,3,"Barry, Miss. Julia",female,27,0,0,330844,7.8792,,Q 89 | 979,3,"Badman, Miss. Emily Louisa",female,18,0,0,A/4 31416,8.05,,S 90 | 980,3,"O'Donoghue, Ms. Bridget",female,,0,0,364856,7.75,,Q 91 | 981,2,"Wells, Master. Ralph Lester",male,2,1,1,29103,23,,S 92 | 982,3,"Dyker, Mrs. Adolf Fredrik (Anna Elisabeth Judith Andersson)",female,22,1,0,347072,13.9,,S 93 | 983,3,"Pedersen, Mr. Olaf",male,,0,0,345498,7.775,,S 94 | 984,1,"Davidson, Mrs. Thornton (Orian Hays)",female,27,1,2,F.C. 12750,52,B71,S 95 | 985,3,"Guest, Mr. Robert",male,,0,0,376563,8.05,,S 96 | 986,1,"Birnbaum, Mr. Jakob",male,25,0,0,13905,26,,C 97 | 987,3,"Tenglin, Mr. 
Gunnar Isidor",male,25,0,0,350033,7.7958,,S 98 | 988,1,"Cavendish, Mrs. Tyrell William (Julia Florence Siegel)",female,76,1,0,19877,78.85,C46,S 99 | 989,3,"Makinen, Mr. Kalle Edvard",male,29,0,0,STON/O 2. 3101268,7.925,,S 100 | 990,3,"Braf, Miss. Elin Ester Maria",female,20,0,0,347471,7.8542,,S 101 | 991,3,"Nancarrow, Mr. William Henry",male,33,0,0,A./5. 3338,8.05,,S 102 | 992,1,"Stengel, Mrs. Charles Emil Henry (Annie May Morris)",female,43,1,0,11778,55.4417,C116,C 103 | 993,2,"Weisz, Mr. Leopold",male,27,1,0,228414,26,,S 104 | 994,3,"Foley, Mr. William",male,,0,0,365235,7.75,,Q 105 | 995,3,"Johansson Palmquist, Mr. Oskar Leander",male,26,0,0,347070,7.775,,S 106 | 996,3,"Thomas, Mrs. Alexander (Thamine Thelma"")""",female,16,1,1,2625,8.5167,,C 107 | 997,3,"Holthen, Mr. Johan Martin",male,28,0,0,C 4001,22.525,,S 108 | 998,3,"Buckley, Mr. Daniel",male,21,0,0,330920,7.8208,,Q 109 | 999,3,"Ryan, Mr. Edward",male,,0,0,383162,7.75,,Q 110 | 1000,3,"Willer, Mr. Aaron (Abi Weller"")""",male,,0,0,3410,8.7125,,S 111 | 1001,2,"Swane, Mr. George",male,18.5,0,0,248734,13,F,S 112 | 1002,2,"Stanton, Mr. Samuel Ward",male,41,0,0,237734,15.0458,,C 113 | 1003,3,"Shine, Miss. Ellen Natalia",female,,0,0,330968,7.7792,,Q 114 | 1004,1,"Evans, Miss. Edith Corse",female,36,0,0,PC 17531,31.6792,A29,C 115 | 1005,3,"Buckley, Miss. Katherine",female,18.5,0,0,329944,7.2833,,Q 116 | 1006,1,"Straus, Mrs. Isidor (Rosalie Ida Blun)",female,63,1,0,PC 17483,221.7792,C55 C57,S 117 | 1007,3,"Chronopoulos, Mr. Demetrios",male,18,1,0,2680,14.4542,,C 118 | 1008,3,"Thomas, Mr. John",male,,0,0,2681,6.4375,,C 119 | 1009,3,"Sandstrom, Miss. Beatrice Irene",female,1,1,1,PP 9549,16.7,G6,S 120 | 1010,1,"Beattie, Mr. Thomson",male,36,0,0,13050,75.2417,C6,C 121 | 1011,2,"Chapman, Mrs. John Henry (Sara Elizabeth Lawry)",female,29,1,0,SC/AH 29037,26,,S 122 | 1012,2,"Watt, Miss. Bertha J",female,12,0,0,C.A. 33595,15.75,,S 123 | 1013,3,"Kiernan, Mr. John",male,,1,0,367227,7.75,,Q 124 | 1014,1,"Schabert, Mrs. 
Paul (Emma Mock)",female,35,1,0,13236,57.75,C28,C 125 | 1015,3,"Carver, Mr. Alfred John",male,28,0,0,392095,7.25,,S 126 | 1016,3,"Kennedy, Mr. John",male,,0,0,368783,7.75,,Q 127 | 1017,3,"Cribb, Miss. Laura Alice",female,17,0,1,371362,16.1,,S 128 | 1018,3,"Brobeck, Mr. Karl Rudolf",male,22,0,0,350045,7.7958,,S 129 | 1019,3,"McCoy, Miss. Alicia",female,,2,0,367226,23.25,,Q 130 | 1020,2,"Bowenur, Mr. Solomon",male,42,0,0,211535,13,,S 131 | 1021,3,"Petersen, Mr. Marius",male,24,0,0,342441,8.05,,S 132 | 1022,3,"Spinner, Mr. Henry John",male,32,0,0,STON/OQ. 369943,8.05,,S 133 | 1023,1,"Gracie, Col. Archibald IV",male,53,0,0,113780,28.5,C51,C 134 | 1024,3,"Lefebre, Mrs. Frank (Frances)",female,,0,4,4133,25.4667,,S 135 | 1025,3,"Thomas, Mr. Charles P",male,,1,0,2621,6.4375,,C 136 | 1026,3,"Dintcheff, Mr. Valtcho",male,43,0,0,349226,7.8958,,S 137 | 1027,3,"Carlsson, Mr. Carl Robert",male,24,0,0,350409,7.8542,,S 138 | 1028,3,"Zakarian, Mr. Mapriededer",male,26.5,0,0,2656,7.225,,C 139 | 1029,2,"Schmidt, Mr. August",male,26,0,0,248659,13,,S 140 | 1030,3,"Drapkin, Miss. Jennie",female,23,0,0,SOTON/OQ 392083,8.05,,S 141 | 1031,3,"Goodwin, Mr. Charles Frederick",male,40,1,6,CA 2144,46.9,,S 142 | 1032,3,"Goodwin, Miss. Jessie Allis",female,10,5,2,CA 2144,46.9,,S 143 | 1033,1,"Daniels, Miss. Sarah",female,33,0,0,113781,151.55,,S 144 | 1034,1,"Ryerson, Mr. Arthur Larned",male,61,1,3,PC 17608,262.375,B57 B59 B63 B66,C 145 | 1035,2,"Beauchamp, Mr. Henry James",male,28,0,0,244358,26,,S 146 | 1036,1,"Lindeberg-Lind, Mr. Erik Gustaf (Mr Edward Lingrey"")""",male,42,0,0,17475,26.55,,S 147 | 1037,3,"Vander Planke, Mr. Julius",male,31,3,0,345763,18,,S 148 | 1038,1,"Hilliard, Mr. Herbert Henry",male,,0,0,17463,51.8625,E46,S 149 | 1039,3,"Davies, Mr. Evan",male,22,0,0,SC/A4 23568,8.05,,S 150 | 1040,1,"Crafton, Mr. John Bertram",male,,0,0,113791,26.55,,S 151 | 1041,2,"Lahtinen, Rev. William",male,30,1,1,250651,26,,S 152 | 1042,1,"Earnshaw, Mrs. 
Boulton (Olive Potter)",female,23,0,1,11767,83.1583,C54,C 153 | 1043,3,"Matinoff, Mr. Nicola",male,,0,0,349255,7.8958,,C 154 | 1044,3,"Storey, Mr. Thomas",male,60.5,0,0,3701,,,S 155 | 1045,3,"Klasen, Mrs. (Hulda Kristina Eugenia Lofqvist)",female,36,0,2,350405,12.1833,,S 156 | 1046,3,"Asplund, Master. Filip Oscar",male,13,4,2,347077,31.3875,,S 157 | 1047,3,"Duquemin, Mr. Joseph",male,24,0,0,S.O./P.P. 752,7.55,,S 158 | 1048,1,"Bird, Miss. Ellen",female,29,0,0,PC 17483,221.7792,C97,S 159 | 1049,3,"Lundin, Miss. Olga Elida",female,23,0,0,347469,7.8542,,S 160 | 1050,1,"Borebank, Mr. John James",male,42,0,0,110489,26.55,D22,S 161 | 1051,3,"Peacock, Mrs. Benjamin (Edith Nile)",female,26,0,2,SOTON/O.Q. 3101315,13.775,,S 162 | 1052,3,"Smyth, Miss. Julia",female,,0,0,335432,7.7333,,Q 163 | 1053,3,"Touma, Master. Georges Youssef",male,7,1,1,2650,15.2458,,C 164 | 1054,2,"Wright, Miss. Marion",female,26,0,0,220844,13.5,,S 165 | 1055,3,"Pearce, Mr. Ernest",male,,0,0,343271,7,,S 166 | 1056,2,"Peruschitz, Rev. Joseph Maria",male,41,0,0,237393,13,,S 167 | 1057,3,"Kink-Heilmann, Mrs. Anton (Luise Heilmann)",female,26,1,1,315153,22.025,,S 168 | 1058,1,"Brandeis, Mr. Emil",male,48,0,0,PC 17591,50.4958,B10,C 169 | 1059,3,"Ford, Mr. Edward Watson",male,18,2,2,W./C. 6608,34.375,,S 170 | 1060,1,"Cassebeer, Mrs. Henry Arthur Jr (Eleanor Genevieve Fosdick)",female,,0,0,17770,27.7208,,C 171 | 1061,3,"Hellstrom, Miss. Hilda Maria",female,22,0,0,7548,8.9625,,S 172 | 1062,3,"Lithman, Mr. Simon",male,,0,0,S.O./P.P. 251,7.55,,S 173 | 1063,3,"Zakarian, Mr. Ortin",male,27,0,0,2670,7.225,,C 174 | 1064,3,"Dyker, Mr. Adolf Fredrik",male,23,1,0,347072,13.9,,S 175 | 1065,3,"Torfa, Mr. Assad",male,,0,0,2673,7.2292,,C 176 | 1066,3,"Asplund, Mr. Carl Oscar Vilhelm Gustafsson",male,40,1,5,347077,31.3875,,S 177 | 1067,2,"Brown, Miss. Edith Eileen",female,15,0,2,29750,39,,S 178 | 1068,2,"Sincock, Miss. Maude",female,20,0,0,C.A. 33112,36.75,,S 179 | 1069,1,"Stengel, Mr. 
Charles Emil Henry",male,54,1,0,11778,55.4417,C116,C 180 | 1070,2,"Becker, Mrs. Allen Oliver (Nellie E Baumgardner)",female,36,0,3,230136,39,F4,S 181 | 1071,1,"Compton, Mrs. Alexander Taylor (Mary Eliza Ingersoll)",female,64,0,2,PC 17756,83.1583,E45,C 182 | 1072,2,"McCrie, Mr. James Matthew",male,30,0,0,233478,13,,S 183 | 1073,1,"Compton, Mr. Alexander Taylor Jr",male,37,1,1,PC 17756,83.1583,E52,C 184 | 1074,1,"Marvin, Mrs. Daniel Warner (Mary Graham Carmichael Farquarson)",female,18,1,0,113773,53.1,D30,S 185 | 1075,3,"Lane, Mr. Patrick",male,,0,0,7935,7.75,,Q 186 | 1076,1,"Douglas, Mrs. Frederick Charles (Mary Helene Baxter)",female,27,1,1,PC 17558,247.5208,B58 B60,C 187 | 1077,2,"Maybery, Mr. Frank Hubert",male,40,0,0,239059,16,,S 188 | 1078,2,"Phillips, Miss. Alice Frances Louisa",female,21,0,1,S.O./P.P. 2,21,,S 189 | 1079,3,"Davies, Mr. Joseph",male,17,2,0,A/4 48873,8.05,,S 190 | 1080,3,"Sage, Miss. Ada",female,,8,2,CA. 2343,69.55,,S 191 | 1081,2,"Veal, Mr. James",male,40,0,0,28221,13,,S 192 | 1082,2,"Angle, Mr. William A",male,34,1,0,226875,26,,S 193 | 1083,1,"Salomon, Mr. Abraham L",male,,0,0,111163,26,,S 194 | 1084,3,"van Billiard, Master. Walter John",male,11.5,1,1,A/5. 851,14.5,,S 195 | 1085,2,"Lingane, Mr. John",male,61,0,0,235509,12.35,,Q 196 | 1086,2,"Drew, Master. Marshall Brines",male,8,0,2,28220,32.5,,S 197 | 1087,3,"Karlsson, Mr. Julius Konrad Eugen",male,33,0,0,347465,7.8542,,S 198 | 1088,1,"Spedden, Master. Robert Douglas",male,6,0,2,16966,134.5,E34,C 199 | 1089,3,"Nilsson, Miss. Berta Olivia",female,18,0,0,347066,7.775,,S 200 | 1090,2,"Baimbrigge, Mr. Charles Robert",male,23,0,0,C.A. 31030,10.5,,S 201 | 1091,3,"Rasmussen, Mrs. (Lena Jacobsen Solvang)",female,,0,0,65305,8.1125,,S 202 | 1092,3,"Murphy, Miss. Nora",female,,0,0,36568,15.5,,Q 203 | 1093,3,"Danbom, Master. Gilbert Sigvard Emanuel",male,0.33,0,2,347080,14.4,,S 204 | 1094,1,"Astor, Col. John Jacob",male,47,1,0,PC 17757,227.525,C62 C64,C 205 | 1095,2,"Quick, Miss. 
Winifred Vera",female,8,1,1,26360,26,,S 206 | 1096,2,"Andrew, Mr. Frank Thomas",male,25,0,0,C.A. 34050,10.5,,S 207 | 1097,1,"Omont, Mr. Alfred Fernand",male,,0,0,F.C. 12998,25.7417,,C 208 | 1098,3,"McGowan, Miss. Katherine",female,35,0,0,9232,7.75,,Q 209 | 1099,2,"Collett, Mr. Sidney C Stuart",male,24,0,0,28034,10.5,,S 210 | 1100,1,"Rosenbaum, Miss. Edith Louise",female,33,0,0,PC 17613,27.7208,A11,C 211 | 1101,3,"Delalic, Mr. Redjo",male,25,0,0,349250,7.8958,,S 212 | 1102,3,"Andersen, Mr. Albert Karvin",male,32,0,0,C 4001,22.525,,S 213 | 1103,3,"Finoli, Mr. Luigi",male,,0,0,SOTON/O.Q. 3101308,7.05,,S 214 | 1104,2,"Deacon, Mr. Percy William",male,17,0,0,S.O.C. 14879,73.5,,S 215 | 1105,2,"Howard, Mrs. Benjamin (Ellen Truelove Arman)",female,60,1,0,24065,26,,S 216 | 1106,3,"Andersson, Miss. Ida Augusta Margareta",female,38,4,2,347091,7.775,,S 217 | 1107,1,"Head, Mr. Christopher",male,42,0,0,113038,42.5,B11,S 218 | 1108,3,"Mahon, Miss. Bridget Delia",female,,0,0,330924,7.8792,,Q 219 | 1109,1,"Wick, Mr. George Dennick",male,57,1,1,36928,164.8667,,S 220 | 1110,1,"Widener, Mrs. George Dunton (Eleanor Elkins)",female,50,1,1,113503,211.5,C80,C 221 | 1111,3,"Thomson, Mr. Alexander Morrison",male,,0,0,32302,8.05,,S 222 | 1112,2,"Duran y More, Miss. Florentina",female,30,1,0,SC/PARIS 2148,13.8583,,C 223 | 1113,3,"Reynolds, Mr. Harold J",male,21,0,0,342684,8.05,,S 224 | 1114,2,"Cook, Mrs. (Selena Rogers)",female,22,0,0,W./C. 14266,10.5,F33,S 225 | 1115,3,"Karlsson, Mr. Einar Gervasius",male,21,0,0,350053,7.7958,,S 226 | 1116,1,"Candee, Mrs. Edward (Helen Churchill Hungerford)",female,53,0,0,PC 17606,27.4458,,C 227 | 1117,3,"Moubarek, Mrs. George (Omine Amenia"" Alexander)""",female,,0,2,2661,15.2458,,C 228 | 1118,3,"Asplund, Mr. Johan Charles",male,23,0,0,350054,7.7958,,S 229 | 1119,3,"McNeill, Miss. Bridget",female,,0,0,370368,7.75,,Q 230 | 1120,3,"Everett, Mr. Thomas James",male,40.5,0,0,C.A. 6212,15.1,,S 231 | 1121,2,"Hocking, Mr. 
Samuel James Metcalfe",male,36,0,0,242963,13,,S 232 | 1122,2,"Sweet, Mr. George Frederick",male,14,0,0,220845,65,,S 233 | 1123,1,"Willard, Miss. Constance",female,21,0,0,113795,26.55,,S 234 | 1124,3,"Wiklund, Mr. Karl Johan",male,21,1,0,3101266,6.4958,,S 235 | 1125,3,"Linehan, Mr. Michael",male,,0,0,330971,7.8792,,Q 236 | 1126,1,"Cumings, Mr. John Bradley",male,39,1,0,PC 17599,71.2833,C85,C 237 | 1127,3,"Vendel, Mr. Olof Edvin",male,20,0,0,350416,7.8542,,S 238 | 1128,1,"Warren, Mr. Frank Manley",male,64,1,0,110813,75.25,D37,C 239 | 1129,3,"Baccos, Mr. Raffull",male,20,0,0,2679,7.225,,C 240 | 1130,2,"Hiltunen, Miss. Marta",female,18,1,1,250650,13,,S 241 | 1131,1,"Douglas, Mrs. Walter Donald (Mahala Dutton)",female,48,1,0,PC 17761,106.425,C86,C 242 | 1132,1,"Lindstrom, Mrs. Carl Johan (Sigrid Posse)",female,55,0,0,112377,27.7208,,C 243 | 1133,2,"Christy, Mrs. (Alice Frances)",female,45,0,2,237789,30,,S 244 | 1134,1,"Spedden, Mr. Frederic Oakley",male,45,1,1,16966,134.5,E34,C 245 | 1135,3,"Hyman, Mr. Abraham",male,,0,0,3470,7.8875,,S 246 | 1136,3,"Johnston, Master. William Arthur Willie""""",male,,1,2,W./C. 6607,23.45,,S 247 | 1137,1,"Kenyon, Mr. Frederick R",male,41,1,0,17464,51.8625,D21,S 248 | 1138,2,"Karnes, Mrs. J Frank (Claire Bennett)",female,22,0,0,F.C.C. 13534,21,,S 249 | 1139,2,"Drew, Mr. James Vivian",male,42,1,1,28220,32.5,,S 250 | 1140,2,"Hold, Mrs. Stephen (Annie Margaret Hill)",female,29,1,0,26707,26,,S 251 | 1141,3,"Khalil, Mrs. Betros (Zahie Maria"" Elias)""",female,,1,0,2660,14.4542,,C 252 | 1142,2,"West, Miss. Barbara J",female,0.92,1,2,C.A. 34651,27.75,,S 253 | 1143,3,"Abrahamsson, Mr. Abraham August Johannes",male,20,0,0,SOTON/O2 3101284,7.925,,S 254 | 1144,1,"Clark, Mr. Walter Miller",male,27,1,0,13508,136.7792,C89,C 255 | 1145,3,"Salander, Mr. Karl Johan",male,24,0,0,7266,9.325,,S 256 | 1146,3,"Wenzel, Mr. Linhart",male,32.5,0,0,345775,9.5,,S 257 | 1147,3,"MacKay, Mr. George William",male,,0,0,C.A. 42795,7.55,,S 258 | 1148,3,"Mahon, Mr. 
John",male,,0,0,AQ/4 3130,7.75,,Q 259 | 1149,3,"Niklasson, Mr. Samuel",male,28,0,0,363611,8.05,,S 260 | 1150,2,"Bentham, Miss. Lilian W",female,19,0,0,28404,13,,S 261 | 1151,3,"Midtsjo, Mr. Karl Albert",male,21,0,0,345501,7.775,,S 262 | 1152,3,"de Messemaeker, Mr. Guillaume Joseph",male,36.5,1,0,345572,17.4,,S 263 | 1153,3,"Nilsson, Mr. August Ferdinand",male,21,0,0,350410,7.8542,,S 264 | 1154,2,"Wells, Mrs. Arthur Henry (Addie"" Dart Trevaskis)""",female,29,0,2,29103,23,,S 265 | 1155,3,"Klasen, Miss. Gertrud Emilia",female,1,1,1,350405,12.1833,,S 266 | 1156,2,"Portaluppi, Mr. Emilio Ilario Giuseppe",male,30,0,0,C.A. 34644,12.7375,,C 267 | 1157,3,"Lyntakoff, Mr. Stanko",male,,0,0,349235,7.8958,,S 268 | 1158,1,"Chisholm, Mr. Roderick Robert Crispin",male,,0,0,112051,0,,S 269 | 1159,3,"Warren, Mr. Charles William",male,,0,0,C.A. 49867,7.55,,S 270 | 1160,3,"Howard, Miss. May Elizabeth",female,,0,0,A. 2. 39186,8.05,,S 271 | 1161,3,"Pokrnic, Mr. Mate",male,17,0,0,315095,8.6625,,S 272 | 1162,1,"McCaffry, Mr. Thomas Francis",male,46,0,0,13050,75.2417,C6,C 273 | 1163,3,"Fox, Mr. Patrick",male,,0,0,368573,7.75,,Q 274 | 1164,1,"Clark, Mrs. Walter Miller (Virginia McDowell)",female,26,1,0,13508,136.7792,C89,C 275 | 1165,3,"Lennon, Miss. Mary",female,,1,0,370371,15.5,,Q 276 | 1166,3,"Saade, Mr. Jean Nassr",male,,0,0,2676,7.225,,C 277 | 1167,2,"Bryhl, Miss. Dagmar Jenny Ingeborg ",female,20,1,0,236853,26,,S 278 | 1168,2,"Parker, Mr. Clifford Richard",male,28,0,0,SC 14888,10.5,,S 279 | 1169,2,"Faunthorpe, Mr. Harry",male,40,1,0,2926,26,,S 280 | 1170,2,"Ware, Mr. John James",male,30,1,0,CA 31352,21,,S 281 | 1171,2,"Oxenham, Mr. Percy Thomas",male,22,0,0,W./C. 14260,10.5,,S 282 | 1172,3,"Oreskovic, Miss. Jelka",female,23,0,0,315085,8.6625,,S 283 | 1173,3,"Peacock, Master. Alfred Edward",male,0.75,1,1,SOTON/O.Q. 3101315,13.775,,S 284 | 1174,3,"Fleming, Miss. Honora",female,,0,0,364859,7.75,,Q 285 | 1175,3,"Touma, Miss. 
Maria Youssef",female,9,1,1,2650,15.2458,,C 286 | 1176,3,"Rosblom, Miss. Salli Helena",female,2,1,1,370129,20.2125,,S 287 | 1177,3,"Dennis, Mr. William",male,36,0,0,A/5 21175,7.25,,S 288 | 1178,3,"Franklin, Mr. Charles (Charles Fardon)",male,,0,0,SOTON/O.Q. 3101314,7.25,,S 289 | 1179,1,"Snyder, Mr. John Pillsbury",male,24,1,0,21228,82.2667,B45,S 290 | 1180,3,"Mardirosian, Mr. Sarkis",male,,0,0,2655,7.2292,F E46,C 291 | 1181,3,"Ford, Mr. Arthur",male,,0,0,A/5 1478,8.05,,S 292 | 1182,1,"Rheims, Mr. George Alexander Lucien",male,,0,0,PC 17607,39.6,,S 293 | 1183,3,"Daly, Miss. Margaret Marcella Maggie""""",female,30,0,0,382650,6.95,,Q 294 | 1184,3,"Nasr, Mr. Mustafa",male,,0,0,2652,7.2292,,C 295 | 1185,1,"Dodge, Dr. Washington",male,53,1,1,33638,81.8583,A34,S 296 | 1186,3,"Wittevrongel, Mr. Camille",male,36,0,0,345771,9.5,,S 297 | 1187,3,"Angheloff, Mr. Minko",male,26,0,0,349202,7.8958,,S 298 | 1188,2,"Laroche, Miss. Louise",female,1,1,2,SC/Paris 2123,41.5792,,C 299 | 1189,3,"Samaan, Mr. Hanna",male,,2,0,2662,21.6792,,C 300 | 1190,1,"Loring, Mr. Joseph Holland",male,30,0,0,113801,45.5,,S 301 | 1191,3,"Johansson, Mr. Nils",male,29,0,0,347467,7.8542,,S 302 | 1192,3,"Olsson, Mr. Oscar Wilhelm",male,32,0,0,347079,7.775,,S 303 | 1193,2,"Malachard, Mr. Noel",male,,0,0,237735,15.0458,D,C 304 | 1194,2,"Phillips, Mr. Escott Robert",male,43,0,1,S.O./P.P. 2,21,,S 305 | 1195,3,"Pokrnic, Mr. Tome",male,24,0,0,315092,8.6625,,S 306 | 1196,3,"McCarthy, Miss. Catherine Katie""""",female,,0,0,383123,7.75,,Q 307 | 1197,1,"Crosby, Mrs. Edward Gifford (Catherine Elizabeth Halstead)",female,64,1,1,112901,26.55,B26,S 308 | 1198,1,"Allison, Mr. Hudson Joshua Creighton",male,30,1,2,113781,151.55,C22 C26,S 309 | 1199,3,"Aks, Master. Philip Frank",male,0.83,0,1,392091,9.35,,S 310 | 1200,1,"Hays, Mr. Charles Melville",male,55,1,1,12749,93.5,B69,S 311 | 1201,3,"Hansen, Mrs. Claus Peter (Jennie L Howard)",female,45,1,0,350026,14.1083,,S 312 | 1202,3,"Cacic, Mr. 
Jego Grga",male,18,0,0,315091,8.6625,,S 313 | 1203,3,"Vartanian, Mr. David",male,22,0,0,2658,7.225,,C 314 | 1204,3,"Sadowitz, Mr. Harry",male,,0,0,LP 1588,7.575,,S 315 | 1205,3,"Carr, Miss. Jeannie",female,37,0,0,368364,7.75,,Q 316 | 1206,1,"White, Mrs. John Stuart (Ella Holmes)",female,55,0,0,PC 17760,135.6333,C32,C 317 | 1207,3,"Hagardon, Miss. Kate",female,17,0,0,AQ/3. 30631,7.7333,,Q 318 | 1208,1,"Spencer, Mr. William Augustus",male,57,1,0,PC 17569,146.5208,B78,C 319 | 1209,2,"Rogers, Mr. Reginald Harry",male,19,0,0,28004,10.5,,S 320 | 1210,3,"Jonsson, Mr. Nils Hilding",male,27,0,0,350408,7.8542,,S 321 | 1211,2,"Jefferys, Mr. Ernest Wilfred",male,22,2,0,C.A. 31029,31.5,,S 322 | 1212,3,"Andersson, Mr. Johan Samuel",male,26,0,0,347075,7.775,,S 323 | 1213,3,"Krekorian, Mr. Neshan",male,25,0,0,2654,7.2292,F E57,C 324 | 1214,2,"Nesson, Mr. Israel",male,26,0,0,244368,13,F2,S 325 | 1215,1,"Rowe, Mr. Alfred G",male,33,0,0,113790,26.55,,S 326 | 1216,1,"Kreuchen, Miss. Emilie",female,39,0,0,24160,211.3375,,S 327 | 1217,3,"Assam, Mr. Ali",male,23,0,0,SOTON/O.Q. 3101309,7.05,,S 328 | 1218,2,"Becker, Miss. Ruth Elizabeth",female,12,2,1,230136,39,F4,S 329 | 1219,1,"Rosenshine, Mr. George (Mr George Thorne"")""",male,46,0,0,PC 17585,79.2,,C 330 | 1220,2,"Clarke, Mr. Charles Valentine",male,29,1,0,2003,26,,S 331 | 1221,2,"Enander, Mr. Ingvar",male,21,0,0,236854,13,,S 332 | 1222,2,"Davies, Mrs. John Morgan (Elizabeth Agnes Mary White) ",female,48,0,2,C.A. 33112,36.75,,S 333 | 1223,1,"Dulles, Mr. William Crothers",male,39,0,0,PC 17580,29.7,A18,C 334 | 1224,3,"Thomas, Mr. Tannous",male,,0,0,2684,7.225,,C 335 | 1225,3,"Nakid, Mrs. Said (Waika Mary"" Mowad)""",female,19,1,1,2653,15.7417,,C 336 | 1226,3,"Cor, Mr. Ivan",male,27,0,0,349229,7.8958,,S 337 | 1227,1,"Maguire, Mr. John Edward",male,30,0,0,110469,26,C106,S 338 | 1228,2,"de Brito, Mr. Jose Joaquim",male,32,0,0,244360,13,,S 339 | 1229,3,"Elias, Mr. Joseph",male,39,0,2,2675,7.2292,,C 340 | 1230,2,"Denbury, Mr. 
Herbert",male,25,0,0,C.A. 31029,31.5,,S 341 | 1231,3,"Betros, Master. Seman",male,,0,0,2622,7.2292,,C 342 | 1232,2,"Fillbrook, Mr. Joseph Charles",male,18,0,0,C.A. 15185,10.5,,S 343 | 1233,3,"Lundstrom, Mr. Thure Edvin",male,32,0,0,350403,7.5792,,S 344 | 1234,3,"Sage, Mr. John George",male,,1,9,CA. 2343,69.55,,S 345 | 1235,1,"Cardeza, Mrs. James Warburton Martinez (Charlotte Wardle Drake)",female,58,0,1,PC 17755,512.3292,B51 B53 B55,C 346 | 1236,3,"van Billiard, Master. James William",male,,1,1,A/5. 851,14.5,,S 347 | 1237,3,"Abelseth, Miss. Karen Marie",female,16,0,0,348125,7.65,,S 348 | 1238,2,"Botsford, Mr. William Hull",male,26,0,0,237670,13,,S 349 | 1239,3,"Whabee, Mrs. George Joseph (Shawneene Abi-Saab)",female,38,0,0,2688,7.2292,,C 350 | 1240,2,"Giles, Mr. Ralph",male,24,0,0,248726,13.5,,S 351 | 1241,2,"Walcroft, Miss. Nellie",female,31,0,0,F.C.C. 13528,21,,S 352 | 1242,1,"Greenfield, Mrs. Leo David (Blanche Strouse)",female,45,0,1,PC 17759,63.3583,D10 D12,C 353 | 1243,2,"Stokes, Mr. Philip Joseph",male,25,0,0,F.C.C. 13540,10.5,,S 354 | 1244,2,"Dibden, Mr. William",male,18,0,0,S.O.C. 14879,73.5,,S 355 | 1245,2,"Herman, Mr. Samuel",male,49,1,2,220845,65,,S 356 | 1246,3,"Dean, Miss. Elizabeth Gladys Millvina""""",female,0.17,1,2,C.A. 2315,20.575,,S 357 | 1247,1,"Julian, Mr. Henry Forbes",male,50,0,0,113044,26,E60,S 358 | 1248,1,"Brown, Mrs. John Murray (Caroline Lane Lamson)",female,59,2,0,11769,51.4792,C101,S 359 | 1249,3,"Lockyer, Mr. Edward",male,,0,0,1222,7.8792,,S 360 | 1250,3,"O'Keefe, Mr. Patrick",male,,0,0,368402,7.75,,Q 361 | 1251,3,"Lindell, Mrs. Edvard Bengtsson (Elin Gerda Persson)",female,30,1,0,349910,15.55,,S 362 | 1252,3,"Sage, Master. William Henry",male,14.5,8,2,CA. 2343,69.55,,S 363 | 1253,2,"Mallet, Mrs. Albert (Antoinette Magnin)",female,24,1,1,S.C./PARIS 2079,37.0042,,C 364 | 1254,2,"Ware, Mrs. John James (Florence Louise Long)",female,31,0,0,CA 31352,21,,S 365 | 1255,3,"Strilic, Mr. 
Ivan",male,27,0,0,315083,8.6625,,S 366 | 1256,1,"Harder, Mrs. George Achilles (Dorothy Annan)",female,25,1,0,11765,55.4417,E50,C 367 | 1257,3,"Sage, Mrs. John (Annie Bullen)",female,,1,9,CA. 2343,69.55,,S 368 | 1258,3,"Caram, Mr. Joseph",male,,1,0,2689,14.4583,,C 369 | 1259,3,"Riihivouri, Miss. Susanna Juhantytar Sanni""""",female,22,0,0,3101295,39.6875,,S 370 | 1260,1,"Gibson, Mrs. Leonard (Pauline C Boeson)",female,45,0,1,112378,59.4,,C 371 | 1261,2,"Pallas y Castello, Mr. Emilio",male,29,0,0,SC/PARIS 2147,13.8583,,C 372 | 1262,2,"Giles, Mr. Edgar",male,21,1,0,28133,11.5,,S 373 | 1263,1,"Wilson, Miss. Helen Alice",female,31,0,0,16966,134.5,E39 E41,C 374 | 1264,1,"Ismay, Mr. Joseph Bruce",male,49,0,0,112058,0,B52 B54 B56,S 375 | 1265,2,"Harbeck, Mr. William H",male,44,0,0,248746,13,,S 376 | 1266,1,"Dodge, Mrs. Washington (Ruth Vidaver)",female,54,1,1,33638,81.8583,A34,S 377 | 1267,1,"Bowen, Miss. Grace Scott",female,45,0,0,PC 17608,262.375,,C 378 | 1268,3,"Kink, Miss. Maria",female,22,2,0,315152,8.6625,,S 379 | 1269,2,"Cotterill, Mr. Henry Harry""""",male,21,0,0,29107,11.5,,S 380 | 1270,1,"Hipkins, Mr. William Edward",male,55,0,0,680,50,C39,S 381 | 1271,3,"Asplund, Master. Carl Edgar",male,5,4,2,347077,31.3875,,S 382 | 1272,3,"O'Connor, Mr. Patrick",male,,0,0,366713,7.75,,Q 383 | 1273,3,"Foley, Mr. Joseph",male,26,0,0,330910,7.8792,,Q 384 | 1274,3,"Risien, Mrs. Samuel (Emma)",female,,0,0,364498,14.5,,S 385 | 1275,3,"McNamee, Mrs. Neal (Eileen O'Leary)",female,19,1,0,376566,16.1,,S 386 | 1276,2,"Wheeler, Mr. Edwin Frederick""""",male,,0,0,SC/PARIS 2159,12.875,,S 387 | 1277,2,"Herman, Miss. Kate",female,24,1,2,220845,65,,S 388 | 1278,3,"Aronsson, Mr. Ernst Axel Algot",male,24,0,0,349911,7.775,,S 389 | 1279,2,"Ashby, Mr. John",male,57,0,0,244346,13,,S 390 | 1280,3,"Canavan, Mr. Patrick",male,21,0,0,364858,7.75,,Q 391 | 1281,3,"Palsson, Master. Paul Folke",male,6,3,1,349909,21.075,,S 392 | 1282,1,"Payne, Mr. 
Vivian Ponsonby",male,23,0,0,12749,93.5,B24,S 393 | 1283,1,"Lines, Mrs. Ernest H (Elizabeth Lindsey James)",female,51,0,1,PC 17592,39.4,D28,S 394 | 1284,3,"Abbott, Master. Eugene Joseph",male,13,0,2,C.A. 2673,20.25,,S 395 | 1285,2,"Gilbert, Mr. William",male,47,0,0,C.A. 30769,10.5,,S 396 | 1286,3,"Kink-Heilmann, Mr. Anton",male,29,3,1,315153,22.025,,S 397 | 1287,1,"Smith, Mrs. Lucien Philip (Mary Eloise Hughes)",female,18,1,0,13695,60,C31,S 398 | 1288,3,"Colbert, Mr. Patrick",male,24,0,0,371109,7.25,,Q 399 | 1289,1,"Frolicher-Stehli, Mrs. Maxmillian (Margaretha Emerentia Stehli)",female,48,1,1,13567,79.2,B41,C 400 | 1290,3,"Larsson-Rondberg, Mr. Edvard A",male,22,0,0,347065,7.775,,S 401 | 1291,3,"Conlon, Mr. Thomas Henry",male,31,0,0,21332,7.7333,,Q 402 | 1292,1,"Bonnell, Miss. Caroline",female,30,0,0,36928,164.8667,C7,S 403 | 1293,2,"Gale, Mr. Harry",male,38,1,0,28664,21,,S 404 | 1294,1,"Gibson, Miss. Dorothy Winifred",female,22,0,1,112378,59.4,,C 405 | 1295,1,"Carrau, Mr. Jose Pedro",male,17,0,0,113059,47.1,,S 406 | 1296,1,"Frauenthal, Mr. Isaac Gerald",male,43,1,0,17765,27.7208,D40,C 407 | 1297,2,"Nourney, Mr. Alfred (Baron von Drachstedt"")""",male,20,0,0,SC/PARIS 2166,13.8625,D38,C 408 | 1298,2,"Ware, Mr. William Jeffery",male,23,1,0,28666,10.5,,S 409 | 1299,1,"Widener, Mr. George Dunton",male,50,1,1,113503,211.5,C80,C 410 | 1300,3,"Riordan, Miss. Johanna Hannah""""",female,,0,0,334915,7.7208,,Q 411 | 1301,3,"Peacock, Miss. Treasteall",female,3,1,1,SOTON/O.Q. 3101315,13.775,,S 412 | 1302,3,"Naughton, Miss. Hannah",female,,0,0,365237,7.75,,Q 413 | 1303,1,"Minahan, Mrs. William Edward (Lillian E Thorpe)",female,37,1,0,19928,90,C78,Q 414 | 1304,3,"Henriksson, Miss. Jenny Lovisa",female,28,0,0,347086,7.775,,S 415 | 1305,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.05,,S 416 | 1306,1,"Oliva y Ocana, Dona. Fermina",female,39,0,0,PC 17758,108.9,C105,C 417 | 1307,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.25,,S 418 | 1308,3,"Ware, Mr. 
Frederick",male,,0,0,359309,8.05,,S 419 | 1309,3,"Peter, Master. Michael J",male,,1,1,2668,22.3583,,C 420 | -------------------------------------------------------------------------------- /docker/.env: -------------------------------------------------------------------------------- 1 | COMPOSE_PROJECT_NAME=handson-ml2 2 | -------------------------------------------------------------------------------- /docker/Dockerfile: -------------------------------------------------------------------------------- 1 | FROM continuumio/miniconda3:latest 2 | 3 | RUN apt-get update && apt-get upgrade -y \ 4 | && apt-get install -y \ 5 | libpq-dev \ 6 | build-essential \ 7 | git \ 8 | sudo \ 9 | cmake zlib1g-dev libjpeg-dev xvfb ffmpeg xorg-dev libboost-all-dev libsdl2-dev swig \ 10 | unzip zip \ 11 | && rm -rf /var/lib/apt/lists/* 12 | 13 | COPY environment.yml /tmp/ 14 | RUN conda update -y -n base conda \ 15 | && echo ' - pyvirtualdisplay' >> /tmp/environment.yml \ 16 | && conda env create -f /tmp/environment.yml \ 17 | && conda clean -y -t \ 18 | && rm /tmp/environment.yml 19 | 20 | ARG username 21 | ARG userid 22 | 23 | ARG home=/home/${username} 24 | ARG workdir=${home}/handson-ml2 25 | 26 | RUN adduser ${username} --uid ${userid} --gecos '' --disabled-password \ 27 | && echo "${username} ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/${username} \ 28 | && chmod 0440 /etc/sudoers.d/${username} 29 | 30 | WORKDIR ${workdir} 31 | RUN chown ${username}:${username} ${workdir} 32 | 33 | USER ${username} 34 | WORKDIR ${workdir} 35 | 36 | ENV PATH /opt/conda/envs/tf2/bin:$PATH 37 | 38 | # The config below enables diffing notebooks with nbdiff (and nbdiff support 39 | # in git diff command) after connecting to the container by "make exec" (or 40 | # "docker-compose exec handson-ml2 bash") 41 | # You may also try running: 42 | # nbdiff NOTEBOOK_NAME.ipynb 43 | # to get nbdiff between checkpointed version and current version of the 44 | # given notebook. 
45 | 46 | RUN git-nbdiffdriver config --enable --global 47 | 48 | # INFO: Optionally uncomment any (one) of the following RUN commands below to ignore either 49 | # metadata or details in nbdiff within git diff 50 | #RUN git config --global diff.jupyternotebook.command 'git-nbdiffdriver diff --ignore-metadata' 51 | RUN git config --global diff.jupyternotebook.command 'git-nbdiffdriver diff --ignore-details' 52 | 53 | 54 | COPY docker/bashrc.bash /tmp/ 55 | RUN cat /tmp/bashrc.bash >> ${home}/.bashrc \ 56 | && echo "export PATH=\"${workdir}/docker/bin:$PATH\"" >> ${home}/.bashrc \ 57 | && sudo rm /tmp/bashrc.bash 58 | 59 | 60 | # INFO: Uncomment lines below to enable automatic save of python-only and html-only 61 | # exports alongside the notebook 62 | #COPY docker/jupyter_notebook_config.py /tmp/ 63 | #RUN cat /tmp/jupyter_notebook_config.py >> ${home}/.jupyter/jupyter_notebook_config.py 64 | #RUN sudo rm /tmp/jupyter_notebook_config.py 65 | 66 | 67 | # INFO: Uncomment the RUN command below to disable git diff paging 68 | #RUN git config --global core.pager '' 69 | 70 | 71 | # INFO: Uncomment the RUN command below for easy and constant notebook URL (just localhost:8888) 72 | # That will switch Jupyter to using empty password instead of a token. 73 | # To avoid making a security hole you SHOULD in fact not only uncomment but 74 | # regenerate the hash for your own non-empty password and replace the hash below. 
75 | # You can compute a password hash in any notebook, just run the code: 76 | # from notebook.auth import passwd 77 | # passwd() 78 | # and take the hash from the output 79 | #RUN mkdir -p ${home}/.jupyter && \ 80 | # echo 'c.NotebookApp.password = u"sha1:c6bbcba2d04b:f969e403db876dcfbe26f47affe41909bd53392e"' \ 81 | # >> ${home}/.jupyter/jupyter_notebook_config.py 82 | -------------------------------------------------------------------------------- /docker/Makefile: -------------------------------------------------------------------------------- 1 | 2 | help: 3 | cat Makefile 4 | run: 5 | docker-compose up 6 | exec: 7 | docker-compose exec handson-ml2 bash 8 | build: stop .FORCE 9 | docker-compose build 10 | rebuild: stop .FORCE 11 | docker-compose build --force-rm 12 | stop: 13 | docker stop handson-ml2 || true; docker rm handson-ml2 || true; 14 | .FORCE: 15 | -------------------------------------------------------------------------------- /docker/README.md: -------------------------------------------------------------------------------- 1 | 2 | # Hands-on Machine Learning in Docker 3 | 4 | This is the Docker configuration which allows you to run and tweak the book's notebooks without installing any dependencies on your machine!
OK, any except `docker` and `docker-compose`.
And optionally `make`.
And a few more things if you want GPU support (see below for details). 5 | 6 | ## Prerequisites 7 | 8 | Follow the instructions on [Install Docker](https://docs.docker.com/engine/installation/) and [Install Docker Compose](https://docs.docker.com/compose/install/) for your environment if you haven't got `docker` and `docker-compose` already. 9 | 10 | Some general knowledge about `docker` infrastructure might be useful (that's an interesting topic on its own) but is not strictly *required* to just run the notebooks. 11 | 12 | ## Usage 13 | 14 | ### Prepare the image (once) 15 | 16 | The first option is to pull the image from Docker Hub (this will download about 1.9 GB of compressed data): 17 | 18 | ```bash 19 | $ docker pull ageron/handson-ml2 20 | ``` 21 | 22 | **Note**: this is the CPU-only image. For GPU support, read the GPU section below. 23 | 24 | Alternatively, you can build the image yourself. This will be slower, but it will ensure the image is up to date, with the latest libraries. For this, assuming you already downloaded this project into the directory `/path/to/project/handson-ml2`: 25 | 26 | ```bash 27 | $ cd /path/to/project/handson-ml2/docker 28 | $ docker-compose build 29 | ``` 30 | 31 | This will take quite a while, but is only required once. 32 | 33 | After the process is finished you have an `ageron/handson-ml2:latest` image, that will be the base for your experiments. 
You can confirm that by running the following command: 34 | 35 | ```bash 36 | $ docker images 37 | REPOSITORY TAG IMAGE ID CREATED SIZE 38 | ageron/handson-ml2 latest 3ebafebc604a 2 minutes ago 4.87GB 39 | ``` 40 | 41 | ### Run the notebooks 42 | 43 | Still assuming you already downloaded this project into the directory `/path/to/project/handson-ml2`, run the following commands to start the Jupyter server inside the container, which is named `handson-ml2`: 44 | 45 | ```bash 46 | $ cd /path/to/project/handson-ml2/docker 47 | $ docker-compose up 48 | ``` 49 | 50 | Next, just point your browser to the URL printed on the screen (or go to http://localhost:8888 if you enabled password authentication inside the `jupyter_notebook_config.py` file, before building the image) and you're ready to play with the book's code! 51 | 52 | The server runs in the directory containing the notebooks, and the changes you make from the browser will be persisted there. 53 | 54 | You can close the server just by pressing `Ctrl-C` in the terminal window. 55 | 56 | ### Using `make` (optional) 57 | 58 | If you have `make` installed on your computer, you can use it as a thin layer to run `docker-compose` commands. For example, executing `make rebuild` will first stop and remove any running `handson-ml2` container and then run `docker-compose build --force-rm` (see the `Makefile`), which rebuilds the image while discarding the intermediate containers created during the build. If you also want to ignore the build cache and pull the latest version of the `continuumio/miniconda3` image (which the `ageron/handson-ml2` image is based on), run `docker-compose build --pull --no-cache` instead. 59 | 60 | If you don't have `make` (and you don't want to install it), just examine the contents of `Makefile` to see which `docker-compose` commands you can run instead. 61 | 62 | ### Run additional commands in the container 63 | 64 | Run `make exec` (or `docker-compose exec handson-ml2 bash`) while the server is running to run an additional `bash` shell inside the `handson-ml2` container. Now you're inside the environment prepared within the image.
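For example, once inside the container you can sanity-check that the `tf2` conda environment is active (the expected `python` path follows from the `ENV PATH` line in the Dockerfile; the printed TensorFlow version will depend on when the image was built):

```bash
$ make exec    # or: docker-compose exec handson-ml2 bash
$ which python                # should print /opt/conda/envs/tf2/bin/python
$ python -c 'import tensorflow as tf; print(tf.__version__)'
```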
65 | 66 | One of the useful things you can do there is start TensorBoard (for example with the simple `tb` command; see the bashrc file). 67 | 68 | Another is comparing versions of the notebooks using the `nbdiff` command, which is handy if you haven't got `nbdime` installed locally (it is **way** better than plain `diff` for notebooks). See [Tools for diffing and merging of Jupyter notebooks](https://github.com/jupyter/nbdime) for more details. 69 | 70 | You can see the changes you made relative to the version in git using `git diff`, which is integrated with `nbdiff`. 71 | 72 | You may also try the `nbd NOTEBOOK_NAME.ipynb` command (custom, see the bashrc file) to compare one of your notebooks with its `checkpointed` version.
73 | To be precise, the output will tell you *what modifications should be re-played on the **manually saved** version of the notebook (located in the `.ipynb_checkpoints` subdirectory) to update it to the **current**, i.e. **auto-saved**, version (given as the command's argument, located in the working directory)*. 74 | 75 | ## GPU Support on Linux (experimental) 76 | 77 | ### Prerequisites 78 | 79 | If you're running on Linux, and you have a TensorFlow-compatible GPU card (NVidia card with Compute Capability ≥ 3.5) that you would like TensorFlow to use inside the Docker container, then you should download and install the latest driver for your card from [nvidia.com](https://www.nvidia.com/Download/index.aspx?lang=en-us). You will also need to install [NVidia Docker support](https://github.com/NVIDIA/nvidia-docker): if you are using Docker 19.03 or above, you must install the `nvidia-container-toolkit` package, and for earlier versions, you must install `nvidia-docker2`. 80 | 81 | Next, edit the `docker-compose.yml` file: 82 | 83 | ```bash 84 | $ cd /path/to/project/handson-ml2/docker 85 | $ edit docker-compose.yml # use your favorite editor 86 | ``` 87 | 88 | * Replace `dockerfile: ./docker/Dockerfile` with `dockerfile: ./docker/Dockerfile.gpu` 89 | * Replace `image: ageron/handson-ml2:latest` with `image: ageron/handson-ml2:latest-gpu` 90 | * If you want to use `docker-compose`, you will need version 1.28 or above for GPU support, and you must uncomment the whole `deploy` section in `docker-compose.yml`.
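For reference, a Compose `deploy` section that reserves all available GPUs typically looks like the sketch below. This follows the standard Docker Compose GPU reservation syntax; the exact commented-out block in your `docker-compose.yml` may differ slightly:

```yaml
services:
  handson-ml2:
    # ... build, ports, volumes, etc. as already defined ...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all            # or an integer to reserve a fixed number of GPUs
              capabilities: [gpu]
```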
91 | 92 | ### Prepare the image (once) 93 | 94 | If you want to pull the prebuilt image from Docker Hub (this will download over 3.5 GB of compressed data): 95 | 96 | ```bash 97 | $ docker pull ageron/handson-ml2:latest-gpu 98 | ``` 99 | 100 | If you prefer to build the image yourself: 101 | 102 | ```bash 103 | $ cd /path/to/project/handson-ml2/docker 104 | $ docker-compose build 105 | ``` 106 | 107 | ### Run the notebooks with `docker-compose` (version 1.28 or above) 108 | 109 | If you have `docker-compose` version 1.28 or above, that's great! You can simply run: 110 | 111 | ```bash 112 | $ cd /path/to/project/handson-ml2/docker 113 | $ docker-compose up 114 | [...] 115 | or http://127.0.0.1:8888/?token=[...] 116 | ``` 117 | 118 | Then point your browser to the URL and Jupyter should appear. If you then open or create a notebook and execute the following code, a list containing your GPU device(s) should be displayed (success!): 119 | 120 | ```python 121 | import tensorflow as tf 122 | 123 | tf.config.list_physical_devices("GPU") 124 | ``` 125 | 126 | To stop the server, just press Ctrl-C. 127 | 128 | ### Run the notebooks without `docker-compose` 129 | 130 | If you have a version of `docker-compose` earlier than 1.28, you will have to use `docker run` directly. 131 | 132 | If you are using Docker 19.03 or above, you can run: 133 | 134 | ```bash 135 | $ cd /path/to/project/handson-ml2 136 | $ docker run --name handson-ml2 --gpus all -p 8888:8888 -p 6006:6006 --log-opt mode=non-blocking --log-opt max-buffer-size=50m -v `pwd`:/home/devel/handson-ml2 ageron/handson-ml2:latest-gpu /opt/conda/envs/tf2/bin/jupyter notebook --ip='0.0.0.0' --port=8888 --no-browser 137 | ``` 138 | 139 | If you are using an older version of Docker, then replace `--gpus all` with `--runtime=nvidia`. 
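For example, with the older runtime flag the full command from above becomes the following (only the GPU flag changes; every other argument stays the same):

```bash
$ cd /path/to/project/handson-ml2
$ docker run --name handson-ml2 --runtime=nvidia -p 8888:8888 -p 6006:6006 --log-opt mode=non-blocking --log-opt max-buffer-size=50m -v `pwd`:/home/devel/handson-ml2 ageron/handson-ml2:latest-gpu /opt/conda/envs/tf2/bin/jupyter notebook --ip='0.0.0.0' --port=8888 --no-browser
```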
140 | 141 | Now point your browser to the displayed URL: Jupyter should appear, and you can open a notebook and run `import tensorflow as tf` and `tf.config.list_physical_devices("GPU")` as above to confirm that TensorFlow does indeed see your GPU device(s). 142 | 143 | Lastly, to interrupt the server, press Ctrl-C, then run: 144 | 145 | ```bash 146 | $ docker rm handson-ml2 147 | ``` 148 | 149 | This will remove the container so you can start a new one later (but it will not remove the image or the notebooks, don't worry!). 150 | 151 | Have fun! 152 | -------------------------------------------------------------------------------- /docker/bashrc.bash: -------------------------------------------------------------------------------- 1 | alias ll="ls -alF" 2 | alias nbd="nbdiff_checkpoint" 3 | alias tb="tensorboard --logdir=tf_logs" 4 | -------------------------------------------------------------------------------- /docker/bin/nbclean_checkpoints: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import collections 4 | import glob 5 | import hashlib 6 | import os 7 | import subprocess 8 | 9 | 10 | class NotebookAnalyser: 11 | 12 | def __init__(self, dry_run=False, verbose=False, colorful=False): 13 | self._dry_run = dry_run 14 | self._verbose = verbose 15 | self._colors = collections.defaultdict(lambda: "") 16 | if colorful: 17 | for color in [ 18 | NotebookAnalyser.COLOR_WHITE, 19 | NotebookAnalyser.COLOR_RED, 20 | NotebookAnalyser.COLOR_GREEN, 21 | NotebookAnalyser.COLOR_YELLOW, 22 | ]: 23 | self._colors[color] = "\033[{}m".format(color) 24 | 25 | NOTEBOOK_SUFFIX = ".ipynb" 26 | CHECKPOINT_DIR = NOTEBOOK_SUFFIX + "_checkpoints" 27 | CHECKPOINT_MASK = "*-checkpoint" + NOTEBOOK_SUFFIX 28 | CHECKPOINT_MASK_LEN = len(CHECKPOINT_MASK) - 1 29 | 30 | @staticmethod 31 | def get_hash(file_path): 32 | with open(file_path, "rb") as input: 33 | hash = hashlib.md5() 34 | for chunk in iter(lambda: input.read(4096),
b""): 35 | hash.update(chunk) 36 | return hash.hexdigest() 37 | 38 | MESSAGE_ORPHANED = "missing " 39 | MESSAGE_MODIFIED = "modified" 40 | MESSAGE_DELETED = "DELETING" 41 | 42 | COLOR_WHITE = "0" 43 | COLOR_RED = "31" 44 | COLOR_GREEN = "32" 45 | COLOR_YELLOW = "33" 46 | 47 | def log(self, message, file, color=COLOR_WHITE): 48 | color_on = self._colors[color] 49 | color_off = self._colors[NotebookAnalyser.COLOR_WHITE] 50 | print("{}{}{}: {}".format(color_on, message, color_off, file)) 51 | 52 | def clean_checkpoints(self, directory): 53 | for checkpoint_path in sorted(glob.glob(os.path.join(directory, NotebookAnalyser.CHECKPOINT_MASK))): 54 | 55 | workfile_dir = os.path.dirname(os.path.dirname(checkpoint_path)) 56 | workfile_name = os.path.basename(checkpoint_path)[:-NotebookAnalyser.CHECKPOINT_MASK_LEN] + NotebookAnalyser.NOTEBOOK_SUFFIX 57 | workfile_path = os.path.join(workfile_dir, workfile_name) 58 | 59 | status = "" 60 | if not os.path.isfile(workfile_path): 61 | if self._verbose: 62 | self.log(NotebookAnalyser.MESSAGE_ORPHANED, workfile_path, NotebookAnalyser.COLOR_RED) 63 | else: 64 | checkpoint_stat = os.stat(checkpoint_path) 65 | workfile_stat = os.stat(workfile_path) 66 | 67 | modified = workfile_stat.st_size != checkpoint_stat.st_size 68 | 69 | if not modified: 70 | checkpoint_hash = NotebookAnalyser.get_hash(checkpoint_path) 71 | workfile_hash = NotebookAnalyser.get_hash(workfile_path) 72 | modified = checkpoint_hash != workfile_hash 73 | 74 | if modified: 75 | if self._verbose: 76 | self.log(NotebookAnalyser.MESSAGE_MODIFIED, workfile_path, NotebookAnalyser.COLOR_YELLOW) 77 | else: 78 | self.log(NotebookAnalyser.MESSAGE_DELETED, checkpoint_path, NotebookAnalyser.COLOR_GREEN) 79 | if not self._dry_run: 80 | os.remove(checkpoint_path) 81 | 82 | if not self._dry_run and not os.listdir(directory): 83 | self.log(NotebookAnalyser.MESSAGE_DELETED, directory, NotebookAnalyser.COLOR_GREEN) 84 | os.rmdir(directory) 85 | 86 | def 
clean_checkpoints_recursively(self, directory): 87 | for (root, subdirs, files) in os.walk(directory): 88 | subdirs.sort() # INFO: traverse alphabetically 89 | if NotebookAnalyser.CHECKPOINT_DIR in subdirs: 90 | subdirs.remove(NotebookAnalyser.CHECKPOINT_DIR) # INFO: don't recurse there 91 | self.clean_checkpoints(os.path.join(root, NotebookAnalyser.CHECKPOINT_DIR)) 92 | 93 | 94 | def main(): 95 | import argparse 96 | parser = argparse.ArgumentParser(description="Remove checkpointed versions of those jupyter notebooks that are identical to their working copies.", 97 | epilog="""Notebooks will be reported as either 98 | "DELETED" if the working copy and checkpointed version are identical 99 | (checkpoint will be deleted), 100 | "missing" if there is a checkpoint but no corresponding working file can be found 101 | or "modified" if notebook and the checkpoint are not byte-to-byte identical. 102 | If removal of checkpoints results in empty ".ipynb_checkpoints" directory 103 | that directory is also deleted. 
104 | """) #, formatter_class=argparse.RawDescriptionHelpFormatter) 105 | parser.add_argument("dirs", metavar="DIR", type=str, nargs="*", default=".", help="directories to search") 106 | parser.add_argument("-d", "--dry-run", action="store_true", help="only print messages, don't perform any removals") 107 | parser.add_argument("-v", "--verbose", action="store_true", help="verbose mode") 108 | parser.add_argument("-c", "--color", action="store_true", help="colorful mode") 109 | args = parser.parse_args() 110 | 111 | analyser = NotebookAnalyser(args.dry_run, args.verbose, args.color) 112 | for directory in args.dirs: 113 | analyser.clean_checkpoints_recursively(directory) 114 | 115 | if __name__ == "__main__": 116 | main() 117 | -------------------------------------------------------------------------------- /docker/bin/nbdiff_checkpoint: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | if [[ "$#" -lt 1 || "$1" =~ ^((-h)|(--help))$ ]] ; then 3 | echo "usage: nbdiff_checkpoint NOTEBOOK.ipynb" 4 | echo 5 | echo "Show differences between given jupyter notebook and its checkpointed version (in .ipynb_checkpoints subdirectory)" 6 | exit 7 | fi 8 | 9 | DIRNAME=$(dirname "$1") 10 | BASENAME=$(basename "$1" .ipynb) 11 | shift 12 | 13 | WORKING_COPY=$DIRNAME/$BASENAME.ipynb 14 | CHECKPOINT_COPY=$DIRNAME/.ipynb_checkpoints/$BASENAME-checkpoint.ipynb 15 | 16 | echo "----- Analysing how to change $CHECKPOINT_COPY into $WORKING_COPY -----" 17 | nbdiff "$CHECKPOINT_COPY" "$WORKING_COPY" --ignore-details "$@" 18 | -------------------------------------------------------------------------------- /docker/bin/rm_empty_subdirs: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import os 4 | 5 | def remove_empty_directories(initial_dir, 6 | allow_initial_delete=False, ignore_nonexistant_initial=False, 7 | dry_run=False, quiet=False): 8 | 9 | FORBIDDEN_SUBDIRS = 
set([".git"]) 10 | 11 | if not os.path.isdir(initial_dir) and not ignore_nonexistant_initial: 12 | raise RuntimeError("Initial directory '{}' not found!".format(initial_dir)) 13 | 14 | message = "removed" 15 | if dry_run: 16 | message = "to be " + message 17 | 18 | deleted = set() 19 | 20 | for (directory, subdirs, files) in os.walk(initial_dir, topdown=False): 21 | forbidden = False 22 | parent = directory 23 | while parent: 24 | parent, dirname = os.path.split(parent) 25 | if dirname in FORBIDDEN_SUBDIRS: 26 | forbidden = True 27 | break 28 | if forbidden: 29 | continue 30 | 31 | is_empty = len(files) < 1 and len(set([os.path.join(directory, s) for s in subdirs]) - deleted) < 1 32 | 33 | if is_empty and (initial_dir != directory or allow_initial_delete): 34 | if not quiet: 35 | print("{}: {}".format(message, directory)) 36 | deleted.add(directory) 37 | if not dry_run: 38 | os.rmdir(directory) 39 | 40 | def main(): 41 | import argparse 42 | parser = argparse.ArgumentParser(description="Remove empty directories recursively in subtree.") 43 | parser.add_argument("dir", metavar="DIR", type=str, nargs="+", help="directory to be searched") 44 | parser.add_argument("-r", "--allow-dir-removal", action="store_true", help="allow deletion of DIR itself") 45 | parser.add_argument("-i", "--ignore-nonexistent-dir", action="store_true", help="don't throw an error if DIR doesn't exist") 46 | parser.add_argument("-d", "--dry-run", action="store_true", help="only print messages, don't perform any removals") 47 | parser.add_argument("-q", "--quiet", action="store_true", help="don't print names of directories being removed") 48 | args = parser.parse_args() 49 | for directory in args.dir: 50 | remove_empty_directories(directory, args.allow_dir_removal, args.ignore_nonexistent_dir, 51 | args.dry_run, args.quiet) 52 | 53 | if __name__ == "__main__": 54 | main() 55 | -------------------------------------------------------------------------------- /docker/bin/tensorboard: 
-------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | python -m tensorboard.main "$@" 3 | -------------------------------------------------------------------------------- /docker/docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: "3" 2 | services: 3 | handson-ml2: 4 | build: 5 | context: ../ 6 | dockerfile: ./docker/Dockerfile 7 | args: 8 | - username=devel 9 | - userid=1000 10 | container_name: handson-ml2 11 | image: ageron/handson-ml2:latest 12 | restart: unless-stopped 13 | logging: 14 | driver: json-file 15 | options: 16 | max-size: 50m 17 | ports: 18 | - "8888:8888" 19 | - "6006:6006" 20 | volumes: 21 | - ../:/home/devel/handson-ml2 22 | command: /opt/conda/envs/tf2/bin/jupyter notebook --ip='0.0.0.0' --port=8888 --no-browser 23 | -------------------------------------------------------------------------------- /docker/jupyter_notebook_config.py: -------------------------------------------------------------------------------- 1 | import os 2 | import subprocess 3 | 4 | def export_script_and_view(model, os_path, contents_manager): 5 | if model["type"] != "notebook": 6 | return 7 | dir_name, file_name = os.path.split(os_path) 8 | file_base, file_ext = os.path.splitext(file_name) 9 | if file_base.startswith("Untitled"): 10 | return 11 | export_name = file_base if file_ext == ".ipynb" else file_name 12 | subprocess.check_call(["jupyter", "nbconvert", "--to", "script", file_name, "--output", export_name + "_script"], cwd=dir_name) 13 | subprocess.check_call(["jupyter", "nbconvert", "--to", "html", file_name, "--output", export_name + "_view"], cwd=dir_name) 14 | 15 | c.FileContentsManager.post_save_hook = export_script_and_view 16 | -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: tf2 2 | channels: 3 | - conda-forge 4 | - defaults 5
| dependencies: 6 | - box2d-py # used only in chapter 18, exercise 8 7 | - ftfy=6.0 # used only in chapter 16 by the transformers library 8 | - graphviz # used only in chapter 6 for dot files 9 | - ipython=7.28 # a powerful Python shell 10 | - ipywidgets=7.6 # optionally used only in chapter 12 for tqdm in Jupyter 11 | - joblib=0.14 # used only in chapter 2 to save/load Scikit-Learn models 12 | - jupyter=1.0 # to edit and run Jupyter notebooks 13 | - matplotlib=3.4 # beautiful plots. See tutorial tools_matplotlib.ipynb 14 | - nbdime=3.1 # optional tool to diff Jupyter notebooks 15 | - nltk=3.6 # optionally used in chapter 3, exercise 4 16 | - numexpr=2.7 # used only in the Pandas tutorial for numerical expressions 17 | - numpy=1.19 # Powerful n-dimensional arrays and numerical computing tools 18 | - opencv=4.5 # used only in chapter 18 by TF Agents for image preprocessing 19 | - pandas=1.3 # data analysis and manipulation tool 20 | - pillow=8.3 # image manipulation library, (used by matplotlib.image.imread) 21 | - pip # Python's package-management system 22 | - py-xgboost=1.4 # used only in chapter 7 for optimized Gradient Boosting 23 | - pyglet=1.5 # used only in chapter 18 to render environments 24 | - pyopengl=3.1 # used only in chapter 18 to render environments 25 | - python=3.8 # Python! 
Not using latest version as some libs lack support 26 | - python-graphviz # used only in chapter 6 for dot files 27 | #- pyvirtualdisplay=2.2 # used only in chapter 18 if on headless server 28 | - requests=2.26 # used only in chapter 19 for REST API queries 29 | - scikit-learn=1.0 # machine learning library 30 | - scipy=1.7 # scientific/technical computing library 31 | - tqdm=4.62 # a progress bar library 32 | - wheel # built-package format for pip 33 | - widgetsnbextension=3.5 # interactive HTML widgets for Jupyter notebooks 34 | - pip: 35 | - tensorboard-plugin-profile~=2.5.0 # profiling plugin for TensorBoard 36 | - tensorboard~=2.7.0 # TensorFlow's visualization toolkit 37 | - tensorflow-addons~=0.14.0 # used only in chapter 16 for a seq2seq impl. 38 | - tensorflow-datasets~=4.4.0 # datasets repository, ready to use 39 | - tensorflow-hub~=0.12.0 # trained ML models repository, ready to use 40 | - tensorflow-probability~=0.14.1 # Optional. Probability/Stats lib. 41 | - tensorflow-serving-api~=2.6.0 # or tensorflow-serving-api-gpu if gpu 42 | - tensorflow~=2.6.0 # Deep Learning library 43 | - tf-agents~=0.10.0 # Reinforcement Learning lib based on TensorFlow 44 | - tfx~=1.3.0 # platform to deploy production ML pipelines 45 | - transformers~=4.11.3 # Natural Language Processing lib for TF or PyTorch 46 | - urlextract~=1.4.0 # optionally used in chapter 3, exercise 4 47 | - gym[atari,accept-rom-license]~=0.21.0 # used only in chapter 18 48 | 49 | # Specific lib versions to avoid conflicts 50 | - attrs=20.3 51 | - click=7.1 52 | - packaging=20.9 53 | - six=1.15 54 | - typing-extensions=3.7 55 | -------------------------------------------------------------------------------- /extra_autodiff.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "kernelspec": { 6 | "display_name": "TensorFlow 2.4 on Python 3.8 & CUDA 11.1", 7 | "language": "python", 8 | "name": 
"python3" 9 | }, 10 | "language_info": { 11 | "codemirror_mode": { 12 | "name": "ipython", 13 | "version": 3 14 | }, 15 | "file_extension": ".py", 16 | "mimetype": "text/x-python", 17 | "name": "python", 18 | "nbconvert_exporter": "python", 19 | "pygments_lexer": "ipython3", 20 | "version": "3.8.7" 21 | }, 22 | "nav_menu": { 23 | "height": "603px", 24 | "width": "616px" 25 | }, 26 | "toc": { 27 | "navigate_menu": true, 28 | "number_sections": true, 29 | "sideBar": true, 30 | "threshold": 6, 31 | "toc_cell": false, 32 | "toc_section_display": "block", 33 | "toc_window_display": true 34 | }, 35 | "colab": { 36 | "name": "extra_autodiff.ipynb", 37 | "provenance": [] 38 | } 39 | }, 40 | "cells": [ 41 | { 42 | "cell_type": "markdown", 43 | "metadata": { 44 | "id": "xsVRRxcTUsWL" 45 | }, 46 | "source": [ 47 | "**Appendix D – Autodiff**" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": { 53 | "id": "vfg6xlLYUsWN" 54 | }, 55 | "source": [ 56 | "_This notebook explains, through simple examples, how several automatic differentiation techniques work._" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": { 62 | "id": "uYtbaDr3UsWN" 63 | }, 64 | "source": [ 65 | "\n", 66 | " \n", 69 | "
\n", 67 | " Run in Google Colab\n", 68 | "
" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": { 75 | "id": "j-_LVAW2UsWN" 76 | }, 77 | "source": [ 78 | "# Setup" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": { 84 | "id": "E1B3cWMhUsWO" 85 | }, 86 | "source": [ 87 | "# Introduction" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": { 93 | "id": "T55fuaCkUsWO" 94 | }, 95 | "source": [ 96 | "Suppose we want to compute the gradients of the function $f(x,y)=x^2y + y + 2$ with regard to the parameters x and y:" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "metadata": { 102 | "id": "YzqqI5Z9UsWO" 103 | }, 104 | "source": [ 105 | "def f(x,y):\n", 106 | " return x*x*y + y + 2" 107 | ], 108 | "execution_count": 1, 109 | "outputs": [] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": { 114 | "id": "-qoHgmIDUsWP" 115 | }, 116 | "source": [ 117 | "One approach is to solve this analytically:\n", 118 | "\n", 119 | "$\dfrac{\partial f}{\partial x} = 2xy$\n", 120 | "\n", 121 | "$\dfrac{\partial f}{\partial y} = x^2 + 1$" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "metadata": { 127 | "id": "SzNqLg5cUsWQ" 128 | }, 129 | "source": [ 130 | "def df(x,y):\n", 131 | " return 2*x*y, x*x + 1" 132 | ], 133 | "execution_count": 2, 134 | "outputs": [] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": { 139 | "id": "AVLVW9BbUsWQ" 140 | }, 141 | "source": [ 142 | "For example, $\dfrac{\partial f}{\partial x}(3,4) = 24$ and $\dfrac{\partial f}{\partial y}(3,4) = 10$."
143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "metadata": { 148 | "id": "m5RiyU4pUsWR", 149 | "outputId": "6873b519-2165-4af9-f9b3-9d4b91e25207", 150 | "colab": { 151 | "base_uri": "https://localhost:8080/" 152 | } 153 | }, 154 | "source": [ 155 | "df(3, 4)" 156 | ], 157 | "execution_count": 3, 158 | "outputs": [ 159 | { 160 | "output_type": "execute_result", 161 | "data": { 162 | "text/plain": [ 163 | "(24, 10)" 164 | ] 165 | }, 166 | "metadata": {}, 167 | "execution_count": 3 168 | } 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": { 174 | "id": "440rcV0cUsWR" 175 | }, 176 | "source": [ 177 | "Perfect! We can also find the equations for the second order derivatives (also called Hessians):\n", 178 | "\n", 179 | "$\dfrac{\partial^2 f}{\partial x \partial x} = \dfrac{\partial (2xy)}{\partial x} = 2y$\n", 180 | "\n", 181 | "$\dfrac{\partial^2 f}{\partial x \partial y} = \dfrac{\partial (2xy)}{\partial y} = 2x$\n", 182 | "\n", 183 | "$\dfrac{\partial^2 f}{\partial y \partial x} = \dfrac{\partial (x^2 + 1)}{\partial x} = 2x$\n", 184 | "\n", 185 | "$\dfrac{\partial^2 f}{\partial y \partial y} = \dfrac{\partial (x^2 + 1)}{\partial y} = 0$" 186 | ] 187 | }, 188 | { 189 | "cell_type": "markdown", 190 | "metadata": { 191 | "id": "WxEb6FXTUsWS" 192 | }, 193 | "source": [ 194 | "At x=3 and y=4, these Hessians are respectively 8, 6, 6, 0.
Let's compute them using the equations above:" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "metadata": { 200 | "id": "rsynMVvRUsWS" 201 | }, 202 | "source": [ 203 | "def d2f(x, y):\n", 204 | " return [2*y, 2*x], [2*x, 0]" 205 | ], 206 | "execution_count": 4, 207 | "outputs": [] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "metadata": { 212 | "id": "D2XTZyONUsWS", 213 | "outputId": "0436c72a-d783-4e35-b291-e00949f3c9d7", 214 | "colab": { 215 | "base_uri": "https://localhost:8080/" 216 | } 217 | }, 218 | "source": [ 219 | "d2f(3, 4)" 220 | ], 221 | "execution_count": 5, 222 | "outputs": [ 223 | { 224 | "output_type": "execute_result", 225 | "data": { 226 | "text/plain": [ 227 | "([8, 6], [6, 0])" 228 | ] 229 | }, 230 | "metadata": {}, 231 | "execution_count": 5 232 | } 233 | ] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": { 238 | "id": "ZvOlzmhfUsWT" 239 | }, 240 | "source": [ 241 | "Great. However, this requires some mathematical knowledge. It is not too hard in this case, but for a deep neural network it is practically impossible to compute the derivatives this way. So let's look at various ways to automate this!" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": { 247 | "id": "25lBNu6NUsWT" 248 | }, 249 | "source": [ 250 | "# Numerical differentiation" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": { 256 | "id": "ReIsf4CNUsWT" 257 | }, 258 | "source": [ 259 | "Here, we compute an approximation of the gradients using the equation $\dfrac{\partial f}{\partial x} = \displaystyle{\lim_{\epsilon \to 0}}\dfrac{f(x+\epsilon, y) - f(x, y)}{\epsilon}$ (and there is a similar definition for $\dfrac{\partial f}{\partial y}$)."
260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "metadata": { 265 | "id": "5lyNyMidUsWT" 266 | }, 267 | "source": [ 268 | "def gradients(func, vars_list, eps=0.0001):\n", 269 | " partial_derivatives = []\n", 270 | " base_func_eval = func(*vars_list)\n", 271 | " for idx in range(len(vars_list)):\n", 272 | " tweaked_vars = vars_list[:]\n", 273 | " tweaked_vars[idx] += eps\n", 274 | " tweaked_func_eval = func(*tweaked_vars)\n", 275 | " derivative = (tweaked_func_eval - base_func_eval) / eps\n", 276 | " partial_derivatives.append(derivative)\n", 277 | " return partial_derivatives" 278 | ], 279 | "execution_count": 6, 280 | "outputs": [] 281 | }, 282 | { 283 | "cell_type": "code", 284 | "metadata": { 285 | "id": "iwQa6l_zUsWT" 286 | }, 287 | "source": [ 288 | "def df(x, y):\n", 289 | " return gradients(f, [x, y])" 290 | ], 291 | "execution_count": 7, 292 | "outputs": [] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "metadata": { 297 | "id": "RGZfP6ZeUsWU", 298 | "outputId": "7f38f180-ec9d-4ff3-8604-c1d2420fc00c", 299 | "colab": { 300 | "base_uri": "https://localhost:8080/" 301 | } 302 | }, 303 | "source": [ 304 | "df(3, 4)" 305 | ], 306 | "execution_count": 8, 307 | "outputs": [ 308 | { 309 | "output_type": "execute_result", 310 | "data": { 311 | "text/plain": [ 312 | "[24.000400000048216, 10.000000000047748]" 313 | ] 314 | }, 315 | "metadata": {}, 316 | "execution_count": 8 317 | } 318 | ] 319 | }, 320 | { 321 | "cell_type": "markdown", 322 | "metadata": { 323 | "id": "O0dQ-3pvUsWU" 324 | }, 325 | "source": [ 326 | "잘 작동하네요!" 327 | ] 328 | }, 329 | { 330 | "cell_type": "markdown", 331 | "metadata": { 332 | "id": "d7055sIvUsWU" 333 | }, 334 | "source": [ 335 | "이 방식의 장점은 헤시안 계산이 쉽다는 것입니다. 
먼저 1차 편도함수(야코비안이라고도 부릅니다)를 계산하는 함수를 만듭니다:" 336 | ] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "metadata": { 341 | "id": "0N-N98oxUsWU", 342 | "outputId": "adb25554-e28f-4dd4-dbf6-1f15e3c1910d", 343 | "colab": { 344 | "base_uri": "https://localhost:8080/" 345 | } 346 | }, 347 | "source": [ 348 | "def dfdx(x, y):\n", 349 | " return gradients(f, [x,y])[0]\n", 350 | "\n", 351 | "def dfdy(x, y):\n", 352 | " return gradients(f, [x,y])[1]\n", 353 | "\n", 354 | "dfdx(3., 4.), dfdy(3., 4.)" 355 | ], 356 | "execution_count": 9, 357 | "outputs": [ 358 | { 359 | "output_type": "execute_result", 360 | "data": { 361 | "text/plain": [ 362 | "(24.000400000048216, 10.000000000047748)" 363 | ] 364 | }, 365 | "metadata": {}, 366 | "execution_count": 9 367 | } 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "metadata": { 373 | "id": "FVD6zl5cUsWU" 374 | }, 375 | "source": [ 376 | "이제 간단하게 이 함수에 `grandients()` 함수를 적용하면 됩니다:" 377 | ] 378 | }, 379 | { 380 | "cell_type": "code", 381 | "metadata": { 382 | "id": "mRmcEnCgUsWU" 383 | }, 384 | "source": [ 385 | "def d2f(x, y):\n", 386 | " return [gradients(dfdx, [3., 4.]), gradients(dfdy, [3., 4.])]" 387 | ], 388 | "execution_count": 10, 389 | "outputs": [] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "metadata": { 394 | "id": "Tqml4TdWUsWU", 395 | "outputId": "a48bb434-ccff-4fdf-d833-fae7daf6028b", 396 | "colab": { 397 | "base_uri": "https://localhost:8080/" 398 | } 399 | }, 400 | "source": [ 401 | "d2f(3, 4)" 402 | ], 403 | "execution_count": 11, 404 | "outputs": [ 405 | { 406 | "output_type": "execute_result", 407 | "data": { 408 | "text/plain": [ 409 | "[[7.999999951380232, 6.000099261882497],\n", 410 | " [6.000099261882497, -1.4210854715202004e-06]]" 411 | ] 412 | }, 413 | "metadata": {}, 414 | "execution_count": 11 415 | } 416 | ] 417 | }, 418 | { 419 | "cell_type": "markdown", 420 | "metadata": { 421 | "id": "jmNTAZA_UsWV" 422 | }, 423 | "source": [ 424 | "모두 잘 계산되었지만 이 결과는 근사값입니다. 
$n$개의 변수에 대한 함수의 그래디언트를 계산하려면 이 함수를 $n$번 호출해야 합니다. 심층 신경망에서는 경사 하강법을 사용해 수정할 파라미터가 수천 개가 있기 때문에 이런 방법은 매우 느릴 수 있습니다(경사 하강법은 각 파라미터에 대한 손실 함수의 그래디언트를 계산해야 합니다)." 425 | ] 426 | }, 427 | { 428 | "cell_type": "markdown", 429 | "metadata": { 430 | "id": "ToQUq6EUUsWV" 431 | }, 432 | "source": [ 433 | "## 간단한 계산 그래프 구현하기" 434 | ] 435 | }, 436 | { 437 | "cell_type": "markdown", 438 | "metadata": { 439 | "id": "JpSVmoeXUsWV" 440 | }, 441 | "source": [ 442 | "수치적인 방법 대신에 기호 미분 기법을 구현해 보죠. 이를 위해 상수, 변수, 연산을 표현할 클래스를 정의하겠습니다." 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "metadata": { 448 | "id": "YePfA_78UsWV" 449 | }, 450 | "source": [ 451 | "class Const(object):\n", 452 | " def __init__(self, value):\n", 453 | " self.value = value\n", 454 | " def evaluate(self):\n", 455 | " return self.value\n", 456 | " def __str__(self):\n", 457 | " return str(self.value)\n", 458 | "\n", 459 | "class Var(object):\n", 460 | " def __init__(self, name, init_value=0):\n", 461 | " self.value = init_value\n", 462 | " self.name = name\n", 463 | " def evaluate(self):\n", 464 | " return self.value\n", 465 | " def __str__(self):\n", 466 | " return self.name\n", 467 | "\n", 468 | "class BinaryOperator(object):\n", 469 | " def __init__(self, a, b):\n", 470 | " self.a = a\n", 471 | " self.b = b\n", 472 | "\n", 473 | "class Add(BinaryOperator):\n", 474 | " def evaluate(self):\n", 475 | " return self.a.evaluate() + self.b.evaluate()\n", 476 | " def __str__(self):\n", 477 | " return \"{} + {}\".format(self.a, self.b)\n", 478 | "\n", 479 | "class Mul(BinaryOperator):\n", 480 | " def evaluate(self):\n", 481 | " return self.a.evaluate() * self.b.evaluate()\n", 482 | " def __str__(self):\n", 483 | " return \"({}) * ({})\".format(self.a, self.b)" 484 | ], 485 | "execution_count": 12, 486 | "outputs": [] 487 | }, 488 | { 489 | "cell_type": "markdown", 490 | "metadata": { 491 | "id": "k7HKCt6WUsWV" 492 | }, 493 | "source": [ 494 | "좋습니다.
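참고로 위 `gradients()` 함수가 쓰는 전방 차분(forward difference) 대신 중앙 차분(central difference)을 쓰면, 함수 호출 횟수는 두 배가 되지만 오차가 $O(\epsilon)$에서 $O(\epsilon^2)$로 크게 줄어듭니다. 아래는 본문 코드와 같은 인터페이스를 가정한 확인용 스케치입니다(`gradients_central`이라는 함수 이름은 예시를 위해 지은 것입니다):

```python
# 중앙 차분을 사용한 수치 그래디언트 스케치 (본문 gradients()의 예시용 변형)
def gradients_central(func, vars_list, eps=1e-5):
    partial_derivatives = []
    for idx in range(len(vars_list)):
        plus = vars_list[:]
        minus = vars_list[:]
        plus[idx] += eps
        minus[idx] -= eps
        # (f(x+eps) - f(x-eps)) / (2*eps): 전방 차분보다 오차가 훨씬 작습니다
        partial_derivatives.append((func(*plus) - func(*minus)) / (2 * eps))
    return partial_derivatives

def f(x, y):
    return x*x*y + y + 2  # f(x,y) = x²y + y + 2

print(gradients_central(f, [3.0, 4.0]))  # 약 [24.0, 10.0]
```

전방 차분 결과(24.0004…)와 달리 중앙 차분은 같은 eps로도 해석적 값 (24, 10)에 훨씬 가깝습니다.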
이제 함수 $f$를 나타내는 계산 그래프를 만들 수 있습니다:" 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "metadata": { 500 | "id": "nyfWj-i1UsWV" 501 | }, 502 | "source": [ 503 | "x = Var(\"x\")\n", 504 | "y = Var(\"y\")\n", 505 | "f = Add(Mul(Mul(x, x), y), Add(y, Const(2))) # f(x,y) = x²y + y + 2" 506 | ], 507 | "execution_count": 13, 508 | "outputs": [] 509 | }, 510 | { 511 | "cell_type": "markdown", 512 | "metadata": { 513 | "id": "nhEwBml5UsWV" 514 | }, 515 | "source": [ 516 | "이 그래프를 실행하여 어떤 포인트에서도 $f$를 계산할 수 있습니다. 예를 들면 $f(3, 4)$는 다음과 같습니다." 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "metadata": { 522 | "id": "qZm-2r-PUsWV", 523 | "outputId": "b38d7697-99fc-42c2-bf65-1d90b07c4860", 524 | "colab": { 525 | "base_uri": "https://localhost:8080/" 526 | } 527 | }, 528 | "source": [ 529 | "x.value = 3\n", 530 | "y.value = 4\n", 531 | "f.evaluate()" 532 | ], 533 | "execution_count": 14, 534 | "outputs": [ 535 | { 536 | "output_type": "execute_result", 537 | "data": { 538 | "text/plain": [ 539 | "42" 540 | ] 541 | }, 542 | "metadata": {}, 543 | "execution_count": 14 544 | } 545 | ] 546 | }, 547 | { 548 | "cell_type": "markdown", 549 | "metadata": { 550 | "id": "e9QDzB7wUsWW" 551 | }, 552 | "source": [ 553 | "완벽한 정답을 찾았네요." 554 | ] 555 | }, 556 | { 557 | "cell_type": "markdown", 558 | "metadata": { 559 | "id": "JvH4JChwUsWW" 560 | }, 561 | "source": [ 562 | "## 그래디언트 계산하기" 563 | ] 564 | }, 565 | { 566 | "cell_type": "markdown", 567 | "metadata": { 568 | "id": "R7B_IRLbUsWW" 569 | }, 570 | "source": [ 571 | "여기서 제시할 자동 미분 방법은 모두 *연쇄 법칙(chain rule)*을 기반으로 합니다." 572 | ] 573 | }, 574 | { 575 | "cell_type": "markdown", 576 | "metadata": { 577 | "id": "nX11YymxUsWW" 578 | }, 579 | "source": [ 580 | "두 개의 함수 $u$와 $v$가 있고 어떤 입력 $x$에 연속적으로 적용하여 결과 $v$를 얻었다고 가정합시다. 즉, $z = v(u(x))$이고, $z = v(s)$와 $s = u(x)$로 나누어 쓸 수 있습니다. 
연쇄 법칙을 적용하면 입력 $x$에 대한 출력 $z$의 편도 함수를 계산할 수 있습니다:\n", 581 | "\n", 582 | "$ \\dfrac{\\partial z}{\\partial x} = \\dfrac{\\partial s}{\\partial x} \\cdot \\dfrac{\\partial z}{\\partial s}$" 583 | ] 584 | }, 585 | { 586 | "cell_type": "markdown", 587 | "metadata": { 588 | "id": "DcZe8qtmUsWW" 589 | }, 590 | "source": [ 591 | "$z$가 중간 출력이 $s_1, s_2, ..., s_n$인 연속 함수의 출력이라면, 연쇄 법칙이 다음과 같이 적용됩니다:\n", 592 | "\n", 593 | "$ \\dfrac{\\partial z}{\\partial x} = \\dfrac{\\partial s_1}{\\partial x} \\cdot \\dfrac{\\partial s_2}{\\partial s_1} \\cdot \\dfrac{\\partial s_3}{\\partial s_2} \\cdot \\dots \\cdot \\dfrac{\\partial s_{n-1}}{\\partial s_{n-2}} \\cdot \\dfrac{\\partial s_n}{\\partial s_{n-1}} \\cdot \\dfrac{\\partial z}{\\partial s_n}$" 594 | ] 595 | }, 596 | { 597 | "cell_type": "markdown", 598 | "metadata": { 599 | "id": "DOOC06BVUsWW" 600 | }, 601 | "source": [ 602 | "전진 모드 자동 미분에서는 알고리즘이 이 항들을 \"진행 순서대로\"(즉, 출력 $z$을 계산하기 위해 필요한 계산 순서와 동일하게), 즉 왼쪽에서 오른쪽으로 계산합니다. 먼저 $\\dfrac{\\partial s_1}{\\partial x}$를 계산하고, 그다음 $\\dfrac{\\partial s_2}{\\partial s_1}$을 계산하는 식입니다. 후진 모드 자동 미분에서는 알고리즘이 이 항들을 \"진행 반대 순서로\", 즉 오른쪽에서 왼쪽으로 계산합니다. 먼저 $\\dfrac{\\partial z}{\\partial s_n}$을 계산하고, 그다음 $\\dfrac{\\partial s_n}{\\partial s_{n-1}}$을 계산하는 식입니다.\n", 603 | "\n", 604 | "예를 들어, x=3에서 함수 $z(x)=\\sin(x^2)$의 도함수를 전진 모드 자동 미분을 사용하여 계산한다고 가정합시다. 알고리즘은 먼저 편도함수 $\\dfrac{\\partial s_1}{\\partial x}=\\dfrac{\\partial x^2}{\\partial x}=2x=6$을 계산합니다. 다음, $\\dfrac{\\partial z}{\\partial x}=\\dfrac{\\partial s_1}{\\partial x}\\cdot\\dfrac{\\partial z}{\\partial s_1}= 6 \\cdot \\dfrac{\\partial \\sin(s_1)}{\\partial s_1}=6 \\cdot \\cos(s_1)=6 \\cdot \\cos(3^2)\\approx-5.46$을 계산합니다." 
605 | ] 606 | }, 607 | { 608 | "cell_type": "markdown", 609 | "metadata": { 610 | "id": "ogF_TUwtUsWW" 611 | }, 612 | "source": [ 613 | "앞서 정의한 `gradients()` 함수를 사용해 결과를 검증해 보겠습니다:" 614 | ] 615 | }, 616 | { 617 | "cell_type": "code", 618 | "metadata": { 619 | "id": "Xe-VsmamUsWX", 620 | "outputId": "244d0cd5-85a0-467a-8fcb-59f2982c3a24", 621 | "colab": { 622 | "base_uri": "https://localhost:8080/" 623 | } 624 | }, 625 | "source": [ 626 | "from math import sin\n", 627 | "\n", 628 | "def z(x):\n", 629 | " return sin(x**2)\n", 630 | "\n", 631 | "gradients(z, [3])" 632 | ], 633 | "execution_count": 15, 634 | "outputs": [ 635 | { 636 | "output_type": "execute_result", 637 | "data": { 638 | "text/plain": [ 639 | "[-5.46761419430053]" 640 | ] 641 | }, 642 | "metadata": {}, 643 | "execution_count": 15 644 | } 645 | ] 646 | }, 647 | { 648 | "cell_type": "markdown", 649 | "metadata": { 650 | "id": "6AKeI6VDUsWX" 651 | }, 652 | "source": [ 653 | "훌륭하네요. 이제 후진 모드 자동 미분을 사용해 동일한 계산을 해보겠습니다. 이번에는 알고리즘이 오른쪽부터 시작하므로 $\\dfrac{\\partial z}{\\partial s_1} = \\dfrac{\\partial \\sin(s_1)}{\\partial s_1}=\\cos(s_1)=\\cos(3^2)\\approx -0.91$을 계산합니다. 다음 $\\dfrac{\\partial z}{\\partial x}=\\dfrac{\\partial s_1}{\\partial x}\\cdot\\dfrac{\\partial z}{\\partial s_1} \\approx \\dfrac{\\partial s_1}{\\partial x} \\cdot -0.91 = \\dfrac{\\partial x^2}{\\partial x} \\cdot -0.91=2x \\cdot -0.91 = 6\\cdot-0.91=-5.46$을 계산합니다." 654 | ] 655 | }, 656 | { 657 | "cell_type": "markdown", 658 | "metadata": { 659 | "id": "h0cQz3UTUsWX" 660 | }, 661 | "source": [ 662 | "당연히 두 방법 모두 같은 결과를 냅니다(반올림 오차는 제외하고). 하나의 입력과 하나의 출력이 있는 경우에는 둘 다 동일한 횟수의 계산이 필요합니다. 하지만 입력과 출력의 개수가 여러 개이면 두 방법의 성능이 매우 달라집니다. 입력이 많다면 가장 오른쪽에 있는 항은 각 입력마다 편도 함수를 계산하기 위해 필요할 것입니다. 그러므로 가장 오른쪽에 있는 항을 먼저 계산하는 것이 좋습니다. 이것은 후진 모드 자동 미분을 의미합니다. 가장 오른쪽의 항을 한번 계산해서 모든 편도 함수를 계산하는데 사용할 수 있습니다. 반대로 출력이 많을 경우에는 가장 왼쪽의 항을 한번 계산해서 여러 출력의 편도 함수를 계산할 수 있는 전진 모드가 더 좋습니다. 딥러닝에서는 전형적으로 수천 개의 모델 파라미터가 있고 입력은 많지만 출력은 적습니다. 사실 훈련하는 동안 일반적으로 출력은 손실 단 하나입니다. 
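본문에서 손으로 따라간 전진 모드 계산($\partial s_1/\partial x = 6$, 그다음 $6 \cdot \cos(9) \approx -5.46$)을 코드로 그대로 옮기면 다음과 같습니다. 단순한 확인용 스케치입니다:

```python
from math import sin, cos

# z(x) = sin(x²)를 x=3에서 전진 모드 순서("왼쪽에서 오른쪽")로 계산합니다.
x = 3.0
s1 = x ** 2              # 중간값: s1 = 9
ds1_dx = 2 * x           # ∂s1/∂x = 2x = 6
dz_ds1 = cos(s1)         # ∂z/∂s1 = cos(9) ≈ -0.91
dz_dx = ds1_dx * dz_ds1  # 연쇄 법칙: 6·cos(9) ≈ -5.47
print(sin(s1), dz_dx)
```

`gradients(z, [3])`이 돌려준 근사값 -5.4676…과 (반올림 오차 내에서) 일치합니다.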
그래서 텐서플로와 주요 딥러닝 라이브러리들은 후진 모드 자동 미분을 사용합니다." 663 | ] 664 | }, 665 | { 666 | "cell_type": "markdown", 667 | "metadata": { 668 | "id": "KutUQgyfUsWX" 669 | }, 670 | "source": [ 671 | "후진 모드 자동 미분에는 복잡도가 한가지 추가됩니다. $s_i$의 값은 일반적으로 $\\dfrac{\\partial s_{i+1}}{\\partial s_i}$를 계산할 때 필요하고, $s_i$는 먼저 $s_{i-1}$를 계산해야 합니다. 이는 또 $s_{i-2}$를 계산해야 하는 식입니다. 그래서 $s_1$, $s_2$, $s_3$, $\\dots$, $s_{n-1}$ 그리고 $s_n$를 계산하기 위해 기본적으로 전진 방향으로 한번 네트워크를 실행해야 합니다. 그다음에 알고리즘이 오른쪽에서 왼쪽으로 편도 함수를 계산할 수 있습니다. RAM에 모든 $s_i$의 중간값을 저장하는 것은 가끔 문제가 됩니다. 특히 이미지를 다룰 때와 RAM이 부족한 GPU를 사용할 때 입니다. 이 문제를 완화하기 위해 신경망의 층 개수를 줄이거나, 텐서플로가 GPU RAM에서 CPU RAM으로 중간값들을 스왑(swap)하도록 설정할 수 있습니다. 다른 방법은 홀수 번째 중간값인 $s_1$, $s_3$, $s_5$, $\\dots$, $s_{n-4}$, $s_{n-2}$ 그리고 $s_n$만 캐싱하는 것입니다. 알고리즘이 편도 함수를 계산할 때 중간값 $s_i$가 없으면, 이전 중간값 $s_{i-1}$를 사용하여 다시 계산해야 합니다. 이는 CPU와 RAM 사이의 트레이드오프입니다(관심이 있다면 [이 논문](https://pdfs.semanticscholar.org/f61e/9fd5a4878e1493f7a6b03774a61c17b7e9a4.pdf)을 확인해 보세요)." 672 | ] 673 | }, 674 | { 675 | "cell_type": "markdown", 676 | "metadata": { 677 | "id": "gekxz0VwUsWX" 678 | }, 679 | "source": [ 680 | "### 전진 모드 자동 미분" 681 | ] 682 | }, 683 | { 684 | "cell_type": "code", 685 | "metadata": { 686 | "id": "roLEdZEvUsWX" 687 | }, 688 | "source": [ 689 | "Const.gradient = lambda self, var: Const(0)\n", 690 | "Var.gradient = lambda self, var: Const(1) if self is var else Const(0)\n", 691 | "Add.gradient = lambda self, var: Add(self.a.gradient(var), self.b.gradient(var))\n", 692 | "Mul.gradient = lambda self, var: Add(Mul(self.a, self.b.gradient(var)), Mul(self.a.gradient(var), self.b))\n", 693 | "\n", 694 | "x = Var(name=\"x\", init_value=3.)\n", 695 | "y = Var(name=\"y\", init_value=4.)\n", 696 | "f = Add(Mul(Mul(x, x), y), Add(y, Const(2))) # f(x,y) = x²y + y + 2\n", 697 | "\n", 698 | "dfdx = f.gradient(x) # 2xy\n", 699 | "dfdy = f.gradient(y) # x² + 1" 700 | ], 701 | "execution_count": 16, 702 | "outputs": [] 703 | }, 704 | { 705 | "cell_type": "code", 706 | "metadata": { 707 | "id": "RCk9TSp5UsWX", 708 
| "outputId": "e281cb2d-89ea-442b-8720-aee8e0e68573", 709 | "colab": { 710 | "base_uri": "https://localhost:8080/" 711 | } 712 | }, 713 | "source": [ 714 | "dfdx.evaluate(), dfdy.evaluate()" 715 | ], 716 | "execution_count": 17, 717 | "outputs": [ 718 | { 719 | "output_type": "execute_result", 720 | "data": { 721 | "text/plain": [ 722 | "(24.0, 10.0)" 723 | ] 724 | }, 725 | "metadata": {}, 726 | "execution_count": 17 727 | } 728 | ] 729 | }, 730 | { 731 | "cell_type": "markdown", 732 | "metadata": { 733 | "id": "ozxB9bXVUsWY" 734 | }, 735 | "source": [ 736 | "`gradient()` 메서드의 출력은 완전한 기호 미분이므로 1차 도함수에 국한되지 않고 2차 도함수도 계산할 수 있습니다:" 737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "metadata": { 742 | "id": "QFuzfLxFUsWY" 743 | }, 744 | "source": [ 745 | "d2fdxdx = dfdx.gradient(x) # 2y\n", 746 | "d2fdxdy = dfdx.gradient(y) # 2x\n", 747 | "d2fdydx = dfdy.gradient(x) # 2x\n", 748 | "d2fdydy = dfdy.gradient(y) # 0" 749 | ], 750 | "execution_count": 18, 751 | "outputs": [] 752 | }, 753 | { 754 | "cell_type": "code", 755 | "metadata": { 756 | "id": "PSUfXZqMUsWY", 757 | "outputId": "dd3d214c-fde8-42d7-d936-6751bb43c1c1", 758 | "colab": { 759 | "base_uri": "https://localhost:8080/" 760 | } 761 | }, 762 | "source": [ 763 | "[[d2fdxdx.evaluate(), d2fdxdy.evaluate()],\n", 764 | " [d2fdydx.evaluate(), d2fdydy.evaluate()]]" 765 | ], 766 | "execution_count": 19, 767 | "outputs": [ 768 | { 769 | "output_type": "execute_result", 770 | "data": { 771 | "text/plain": [ 772 | "[[8.0, 6.0], [6.0, 0.0]]" 773 | ] 774 | }, 775 | "metadata": {}, 776 | "execution_count": 19 777 | } 778 | ] 779 | }, 780 | { 781 | "cell_type": "markdown", 782 | "metadata": { 783 | "id": "ly3X1l2rUsWY" 784 | }, 785 | "source": [ 786 | "결과는 근사값이 아니고 완벽하게 맞습니다(물론 컴퓨터의 부동 소수 정밀도 한계까지만)." 
787 | ] 788 | }, 789 | { 790 | "cell_type": "markdown", 791 | "metadata": { 792 | "id": "sDPLzj4dUsWY" 793 | }, 794 | "source": [ 795 | "### 이원수(dual number)를 사용한 전진 모드 자동 미분" 796 | ] 797 | }, 798 | { 799 | "cell_type": "markdown", 800 | "metadata": { 801 | "id": "HXx54pXkUsWY" 802 | }, 803 | "source": [ 804 | "전진 모드 자동 미분을 적용하는 한 가지 좋은 방법은 [이원수](https://ko.wikipedia.org/wiki/%EC%9D%B4%EC%9B%90%EC%88%98_(%EC%88%98%ED%95%99))를 사용하는 것입니다. 간단하게 말하면 이원수 $z$는 $z = a + b\epsilon$의 형태를 가집니다. 여기에서 $a$와 $b$는 실수입니다. $\epsilon$은 무한소(infinitesimal), 즉 0보다 크지만 어떤 양의 실수보다도 작은 수라서 $\epsilon^2=0$으로 취급합니다. $f(x + \epsilon) = f(x) + \dfrac{\partial f}{\partial x}\epsilon$로 쓸 수 있으므로, $f(x + \epsilon)$를 계산하여 $f(x)$와 $x$에 대한 $f$의 편도 함수를 구할 수 있습니다." 805 | ] 806 | }, 807 | { 808 | "cell_type": "markdown", 809 | "metadata": { 810 | "id": "xOMkg90aUsWY" 811 | }, 812 | "source": [ 813 | "이원수는 자체적인 산술 규칙을 가집니다. 일반적으로 매우 직관적입니다. 예를 들면:\n", 814 | "\n", 815 | "**덧셈**\n", 816 | "\n", 817 | "$(a_1 + b_1\epsilon) + (a_2 + b_2\epsilon) = (a_1 + a_2) + (b_1 + b_2)\epsilon$\n", 818 | "\n", 819 | "**뺄셈**\n", 820 | "\n", 821 | "$(a_1 + b_1\epsilon) - (a_2 + b_2\epsilon) = (a_1 - a_2) + (b_1 - b_2)\epsilon$\n", 822 | "\n", 823 | "**곱셈**\n", 824 | "\n", 825 | "$(a_1 + b_1\epsilon) \times (a_2 + b_2\epsilon) = (a_1 a_2) + (a_1 b_2 + a_2 b_1)\epsilon + b_1 b_2\epsilon^2 = (a_1 a_2) + (a_1b_2 + a_2b_1)\epsilon$\n", 826 | "\n", 827 | "**나눗셈**\n", 828 | "\n", 829 | "$\dfrac{a_1 + b_1\epsilon}{a_2 + b_2\epsilon} = \dfrac{a_1 + b_1\epsilon}{a_2 + b_2\epsilon} \cdot \dfrac{a_2 - b_2\epsilon}{a_2 - b_2\epsilon} = \dfrac{a_1 a_2 + (b_1 a_2 - a_1 b_2)\epsilon - b_1 b_2\epsilon^2}{{a_2}^2 + (a_2 b_2 - a_2 b_2)\epsilon - {b_2}^2\epsilon^2} = \dfrac{a_1}{a_2} + \dfrac{b_1 a_2 - a_1 b_2}{{a_2}^2}\epsilon$\n", 830 | "\n", 831 | "**거듭제곱**\n", 832 | "\n", 833 | "$(a + b\epsilon)^n = a^n + (n a^{n-1}b)\epsilon$\n", 834 | "\n", 835 | "등."
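나눗셈 규칙의 $\epsilon$ 성분이 실제로 몫의 미분 공식 $(u/v)' = (u'v - uv')/v^2$과 일치하는지 간단히 확인해 볼 수 있습니다. 아래는 이원수를 (값, $\epsilon$ 성분) 튜플로만 표현한 확인용 스케치입니다(`dual_div`는 예시용 이름입니다):

```python
# (a1 + b1·ε) / (a2 + b2·ε) = a1/a2 + ((b1·a2 - a1·b2)/a2²)·ε
def dual_div(a1, b1, a2, b2):
    return a1 / a2, (b1 * a2 - a1 * b2) / (a2 ** 2)

# f(x) = 1/x 를 x=2에서 계산: 분자 1은 상수(ε 성분 0), 분모는 x = 2 + 1·ε
value, eps = dual_div(1.0, 0.0, 2.0, 1.0)
print(value, eps)  # 0.5 -0.25  (f(2)=1/2, f'(2)=-1/x² = -1/4)
```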
836 | ] 837 | }, 838 | { 839 | "cell_type": "markdown", 840 | "metadata": { 841 | "id": "UAqhlew4UsWZ" 842 | }, 843 | "source": [ 844 | "이원수를 표현할 클래스를 만들고 몇 개의 연산(덧셈과 곱셈)을 구현해 보죠. 필요하면 다른 연산을 더 추가해도 됩니다." 845 | ] 846 | }, 847 | { 848 | "cell_type": "code", 849 | "metadata": { 850 | "id": "WQWye-2EUsWZ" 851 | }, 852 | "source": [ 853 | "class DualNumber(object):\n", 854 | " def __init__(self, value=0.0, eps=0.0):\n", 855 | " self.value = value\n", 856 | " self.eps = eps\n", 857 | " def __add__(self, b):\n", 858 | " return DualNumber(self.value + self.to_dual(b).value,\n", 859 | " self.eps + self.to_dual(b).eps)\n", 860 | " def __radd__(self, a):\n", 861 | " return self.to_dual(a).__add__(self)\n", 862 | " def __mul__(self, b):\n", 863 | " return DualNumber(self.value * self.to_dual(b).value,\n", 864 | " self.eps * self.to_dual(b).value + self.value * self.to_dual(b).eps)\n", 865 | " def __rmul__(self, a):\n", 866 | " return self.to_dual(a).__mul__(self)\n", 867 | " def __str__(self):\n", 868 | " if self.eps:\n", 869 | " return \"{:.1f} + {:.1f}ε\".format(self.value, self.eps)\n", 870 | " else:\n", 871 | " return \"{:.1f}\".format(self.value)\n", 872 | " def __repr__(self):\n", 873 | " return str(self)\n", 874 | " @classmethod\n", 875 | " def to_dual(cls, n):\n", 876 | " if hasattr(n, \"value\"):\n", 877 | " return n\n", 878 | " else:\n", 879 | " return cls(n)" 880 | ], 881 | "execution_count": 20, 882 | "outputs": [] 883 | }, 884 | { 885 | "cell_type": "markdown", 886 | "metadata": { 887 | "id": "S8va8F9aUsWZ" 888 | }, 889 | "source": [ 890 | "$3 + (3 + 4 \\epsilon) = 6 + 4\\epsilon$" 891 | ] 892 | }, 893 | { 894 | "cell_type": "code", 895 | "metadata": { 896 | "id": "y-QsLWn1UsWZ", 897 | "outputId": "a346b1b1-9b20-4178-9606-4c50288feb65", 898 | "colab": { 899 | "base_uri": "https://localhost:8080/" 900 | } 901 | }, 902 | "source": [ 903 | "3 + DualNumber(3, 4)" 904 | ], 905 | "execution_count": 21, 906 | "outputs": [ 907 | { 908 | "output_type": "execute_result", 
909 | "data": { 910 | "text/plain": [ 911 | "6.0 + 4.0ε" 912 | ] 913 | }, 914 | "metadata": {}, 915 | "execution_count": 21 916 | } 917 | ] 918 | }, 919 | { 920 | "cell_type": "markdown", 921 | "metadata": { 922 | "id": "Mwr64BLGUsWZ" 923 | }, 924 | "source": [ 925 | "$(3 + 4ε)\\times(5 + 7ε)$ = $3 \\times 5 + 3 \\times 7ε + 4ε \\times 5 + 4ε \\times 7ε$ = $15 + 21ε + 20ε + 28ε^2$ = $15 + 41ε + 28 \\times 0$ = $15 + 41ε$" 926 | ] 927 | }, 928 | { 929 | "cell_type": "code", 930 | "metadata": { 931 | "id": "ZstGLpWgUsWZ", 932 | "outputId": "54772437-1f73-49ea-d4ef-5c97a55c787b", 933 | "colab": { 934 | "base_uri": "https://localhost:8080/" 935 | } 936 | }, 937 | "source": [ 938 | "DualNumber(3, 4) * DualNumber(5, 7)" 939 | ], 940 | "execution_count": 22, 941 | "outputs": [ 942 | { 943 | "output_type": "execute_result", 944 | "data": { 945 | "text/plain": [ 946 | "15.0 + 41.0ε" 947 | ] 948 | }, 949 | "metadata": {}, 950 | "execution_count": 22 951 | } 952 | ] 953 | }, 954 | { 955 | "cell_type": "markdown", 956 | "metadata": { 957 | "id": "RWQTPUEIUsWZ" 958 | }, 959 | "source": [ 960 | "이제 이원수가 우리가 만든 계산 프레임워크와 함께 쓸 수 있는지 확인해 보죠:" 961 | ] 962 | }, 963 | { 964 | "cell_type": "code", 965 | "metadata": { 966 | "id": "P0YUvNa1UsWa", 967 | "outputId": "f8c13119-f8a1-459b-a6da-0a00990e430d", 968 | "colab": { 969 | "base_uri": "https://localhost:8080/" 970 | } 971 | }, 972 | "source": [ 973 | "x.value = DualNumber(3.0)\n", 974 | "y.value = DualNumber(4.0)\n", 975 | "\n", 976 | "f.evaluate()" 977 | ], 978 | "execution_count": 23, 979 | "outputs": [ 980 | { 981 | "output_type": "execute_result", 982 | "data": { 983 | "text/plain": [ 984 | "42.0" 985 | ] 986 | }, 987 | "metadata": {}, 988 | "execution_count": 23 989 | } 990 | ] 991 | }, 992 | { 993 | "cell_type": "markdown", 994 | "metadata": { 995 | "id": "RXtjqG7yUsWa" 996 | }, 997 | "source": [ 998 | "오, 잘 되네요. 
이를 사용해 x=3이고 y=4에서 $x$와 $y$에 대한 $f$의 편도 함수를 계산해 보겠습니다:" 999 | ] 1000 | }, 1001 | { 1002 | "cell_type": "code", 1003 | "metadata": { 1004 | "id": "DZj4aFqOUsWa" 1005 | }, 1006 | "source": [ 1007 | "x.value = DualNumber(3.0, 1.0) # 3 + ε\n", 1008 | "y.value = DualNumber(4.0) # 4\n", 1009 | "\n", 1010 | "dfdx = f.evaluate().eps\n", 1011 | "\n", 1012 | "x.value = DualNumber(3.0) # 3\n", 1013 | "y.value = DualNumber(4.0, 1.0) # 4 + ε\n", 1014 | "\n", 1015 | "dfdy = f.evaluate().eps" 1016 | ], 1017 | "execution_count": 24, 1018 | "outputs": [] 1019 | }, 1020 | { 1021 | "cell_type": "code", 1022 | "metadata": { 1023 | "id": "swo2KWrUUsWa", 1024 | "outputId": "84274e12-c4c9-4603-f3a2-9d6318ec7612", 1025 | "colab": { 1026 | "base_uri": "https://localhost:8080/" 1027 | } 1028 | }, 1029 | "source": [ 1030 | "dfdx" 1031 | ], 1032 | "execution_count": 25, 1033 | "outputs": [ 1034 | { 1035 | "output_type": "execute_result", 1036 | "data": { 1037 | "text/plain": [ 1038 | "24.0" 1039 | ] 1040 | }, 1041 | "metadata": {}, 1042 | "execution_count": 25 1043 | } 1044 | ] 1045 | }, 1046 | { 1047 | "cell_type": "code", 1048 | "metadata": { 1049 | "id": "LY7Q3ZWjUsWa", 1050 | "outputId": "e8581ac2-de16-419f-e729-91246ae8b499", 1051 | "colab": { 1052 | "base_uri": "https://localhost:8080/" 1053 | } 1054 | }, 1055 | "source": [ 1056 | "dfdy" 1057 | ], 1058 | "execution_count": 26, 1059 | "outputs": [ 1060 | { 1061 | "output_type": "execute_result", 1062 | "data": { 1063 | "text/plain": [ 1064 | "10.0" 1065 | ] 1066 | }, 1067 | "metadata": {}, 1068 | "execution_count": 26 1069 | } 1070 | ] 1071 | }, 1072 | { 1073 | "cell_type": "markdown", 1074 | "metadata": { 1075 | "id": "4JjJBtVnUsWa" 1076 | }, 1077 | "source": [ 1078 | "훌륭합니다! 하지만 이 구현에서는 1차 도함수만 가능합니다. 이제 후진 모드를 살펴 보죠." 
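이원수 한 번의 전진 패스로 함수 값과 1차 편도함수를 동시에 얻는 아이디어는, 클래스 없이 (값, $\epsilon$ 성분) 튜플만으로도 요약할 수 있습니다. 아래는 본문 `DualNumber`와 같은 원리를 담은 독립 실행용 스케치입니다(`d_add`, `d_mul`, `f_dual`은 예시용 이름입니다):

```python
def d_add(a, b):  # 이원수 덧셈: 값과 ε 성분을 각각 더합니다
    return (a[0] + b[0], a[1] + b[1])

def d_mul(a, b):  # 이원수 곱셈: ε 성분은 a1·b2 + a2·b1
    return (a[0] * b[0], a[0] * b[1] + a[1] * b[0])

def f_dual(x, y):  # f(x,y) = x²y + y + 2
    return d_add(d_mul(d_mul(x, x), y), d_add(y, (2.0, 0.0)))

# x 방향 편도함수: x = 3 + ε, y = 4 로 한 번 전진 패스
value, dfdx = f_dual((3.0, 1.0), (4.0, 0.0))
print(value, dfdx)  # 42.0 24.0
```

보다시피 입력 하나당 전진 패스가 한 번씩 필요합니다. $\partial f/\partial y$를 얻으려면 $y = 4 + \epsilon$으로 다시 실행해야 하고, 이것이 입력이 많을 때 전진 모드가 불리한 이유입니다.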
1079 | ] 1080 | }, 1081 | { 1082 | "cell_type": "markdown", 1083 | "metadata": { 1084 | "id": "McrNV9kWUsWa" 1085 | }, 1086 | "source": [ 1087 | "### 후진 모드 자동 미분" 1088 | ] 1089 | }, 1090 | { 1091 | "cell_type": "markdown", 1092 | "metadata": { 1093 | "id": "QBSrNOtYUsWa" 1094 | }, 1095 | "source": [ 1096 | "우리가 만든 간단한 프레임워크를 수정해서 후진 모드 자동 미분을 추가하겠습니다:" 1097 | ] 1098 | }, 1099 | { 1100 | "cell_type": "code", 1101 | "metadata": { 1102 | "id": "cKT8P6R_UsWa" 1103 | }, 1104 | "source": [ 1105 | "class Const(object):\n", 1106 | " def __init__(self, value):\n", 1107 | " self.value = value\n", 1108 | " def evaluate(self):\n", 1109 | " return self.value\n", 1110 | " def backpropagate(self, gradient):\n", 1111 | " pass\n", 1112 | " def __str__(self):\n", 1113 | " return str(self.value)\n", 1114 | "\n", 1115 | "class Var(object):\n", 1116 | " def __init__(self, name, init_value=0):\n", 1117 | " self.value = init_value\n", 1118 | " self.name = name\n", 1119 | " self.gradient = 0\n", 1120 | " def evaluate(self):\n", 1121 | " return self.value\n", 1122 | " def backpropagate(self, gradient):\n", 1123 | " self.gradient += gradient\n", 1124 | " def __str__(self):\n", 1125 | " return self.name\n", 1126 | "\n", 1127 | "class BinaryOperator(object):\n", 1128 | " def __init__(self, a, b):\n", 1129 | " self.a = a\n", 1130 | " self.b = b\n", 1131 | "\n", 1132 | "class Add(BinaryOperator):\n", 1133 | " def evaluate(self):\n", 1134 | " self.value = self.a.evaluate() + self.b.evaluate()\n", 1135 | " return self.value\n", 1136 | " def backpropagate(self, gradient):\n", 1137 | " self.a.backpropagate(gradient)\n", 1138 | " self.b.backpropagate(gradient)\n", 1139 | " def __str__(self):\n", 1140 | " return \"{} + {}\".format(self.a, self.b)\n", 1141 | "\n", 1142 | "class Mul(BinaryOperator):\n", 1143 | " def evaluate(self):\n", 1144 | " self.value = self.a.evaluate() * self.b.evaluate()\n", 1145 | " return self.value\n", 1146 | " def backpropagate(self, gradient):\n", 1147 | " 
self.a.backpropagate(gradient * self.b.value)\n", 1148 | " self.b.backpropagate(gradient * self.a.value)\n", 1149 | " def __str__(self):\n", 1150 | " return \"({}) * ({})\".format(self.a, self.b)" 1151 | ], 1152 | "execution_count": 27, 1153 | "outputs": [] 1154 | }, 1155 | { 1156 | "cell_type": "code", 1157 | "metadata": { 1158 | "id": "Lygo8NB3UsWa" 1159 | }, 1160 | "source": [ 1161 | "x = Var(\"x\", init_value=3)\n", 1162 | "y = Var(\"y\", init_value=4)\n", 1163 | "f = Add(Mul(Mul(x, x), y), Add(y, Const(2))) # f(x,y) = x²y + y + 2\n", 1164 | "\n", 1165 | "result = f.evaluate()\n", 1166 | "f.backpropagate(1.0)" 1167 | ], 1168 | "execution_count": 28, 1169 | "outputs": [] 1170 | }, 1171 | { 1172 | "cell_type": "code", 1173 | "metadata": { 1174 | "id": "vcakvvOEUsWb", 1175 | "outputId": "accd6753-bb81-4aa2-e13a-e60dc1d24bad", 1176 | "colab": { 1177 | "base_uri": "https://localhost:8080/" 1178 | } 1179 | }, 1180 | "source": [ 1181 | "print(f)" 1182 | ], 1183 | "execution_count": 29, 1184 | "outputs": [ 1185 | { 1186 | "output_type": "stream", 1187 | "text": [ 1188 | "((x) * (x)) * (y) + y + 2\n" 1189 | ], 1190 | "name": "stdout" 1191 | } 1192 | ] 1193 | }, 1194 | { 1195 | "cell_type": "code", 1196 | "metadata": { 1197 | "id": "8MITDXZGUsWb", 1198 | "outputId": "b6a9bde8-f295-40b1-92b3-f9a6072548ba", 1199 | "colab": { 1200 | "base_uri": "https://localhost:8080/" 1201 | } 1202 | }, 1203 | "source": [ 1204 | "result" 1205 | ], 1206 | "execution_count": 30, 1207 | "outputs": [ 1208 | { 1209 | "output_type": "execute_result", 1210 | "data": { 1211 | "text/plain": [ 1212 | "42" 1213 | ] 1214 | }, 1215 | "metadata": {}, 1216 | "execution_count": 30 1217 | } 1218 | ] 1219 | }, 1220 | { 1221 | "cell_type": "code", 1222 | "metadata": { 1223 | "id": "vkc6VmPGUsWb", 1224 | "outputId": "0b770093-e21a-4cc7-984a-c946c44ac630", 1225 | "colab": { 1226 | "base_uri": "https://localhost:8080/" 1227 | } 1228 | }, 1229 | "source": [ 1230 | "x.gradient" 1231 | ], 1232 | 
"execution_count": 31, 1233 | "outputs": [ 1234 | { 1235 | "output_type": "execute_result", 1236 | "data": { 1237 | "text/plain": [ 1238 | "24.0" 1239 | ] 1240 | }, 1241 | "metadata": {}, 1242 | "execution_count": 31 1243 | } 1244 | ] 1245 | }, 1246 | { 1247 | "cell_type": "code", 1248 | "metadata": { 1249 | "id": "jKqIYynRUsWb", 1250 | "outputId": "944bf0d7-efeb-45ae-fd8c-66739fa9b705", 1251 | "colab": { 1252 | "base_uri": "https://localhost:8080/" 1253 | } 1254 | }, 1255 | "source": [ 1256 | "y.gradient" 1257 | ], 1258 | "execution_count": 32, 1259 | "outputs": [ 1260 | { 1261 | "output_type": "execute_result", 1262 | "data": { 1263 | "text/plain": [ 1264 | "10.0" 1265 | ] 1266 | }, 1267 | "metadata": {}, 1268 | "execution_count": 32 1269 | } 1270 | ] 1271 | }, 1272 | { 1273 | "cell_type": "markdown", 1274 | "metadata": { 1275 | "id": "_vH5JWRSUsWb" 1276 | }, 1277 | "source": [ 1278 | "여기에서도 이 구현의 출력이 숫자이고 기호 표현(symbolic expressions)이 아니므로 1차 도함수로 제한이 됩니다. 그러나 값 대신 기호 표현을 반환하는 `backpropagate()` 메서드를 만들 수 있습니다. 이렇게 하면 2차 도함수(또 그 이상)를 계산할 수 있습니다. 이것이 텐서플로와 자동 미분을 구현한 모든 주요 딥러닝 라이브러리들의 방식입니다." 
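본문 클래스들의 핵심(전진 패스에서 값을 저장하고, 후진 패스에서 그래디언트를 누적)이 다른 함수에도 그대로 적용되는지, 독립 실행이 가능하도록 최소한만 다시 정의해 확인해 보는 스케치입니다(클래스 이름 뒤의 2는 본문 정의와 구분하기 위한 예시용 표기입니다):

```python
class Var2:
    def __init__(self, value):
        self.value, self.gradient = value, 0.0
    def evaluate(self):
        return self.value
    def backpropagate(self, grad):
        self.gradient += grad  # 같은 변수가 여러 번 쓰이면 그래디언트가 누적됩니다

class Add2:
    def __init__(self, a, b):
        self.a, self.b = a, b
    def evaluate(self):
        self.value = self.a.evaluate() + self.b.evaluate()
        return self.value
    def backpropagate(self, grad):
        # 덧셈 노드: 그래디언트를 그대로 양쪽 입력에 전달
        self.a.backpropagate(grad)
        self.b.backpropagate(grad)

class Mul2:
    def __init__(self, a, b):
        self.a, self.b = a, b
    def evaluate(self):
        self.value = self.a.evaluate() * self.b.evaluate()
        return self.value
    def backpropagate(self, grad):
        # 곱셈 노드: 상대 입력의 값을 곱해 그래디언트를 전달
        self.a.backpropagate(grad * self.b.value)
        self.b.backpropagate(grad * self.a.value)

x, y = Var2(3.0), Var2(4.0)
g = Add2(Mul2(x, y), x)        # g(x,y) = xy + x
g.evaluate()                   # 전진 패스: 3·4 + 3 = 15
g.backpropagate(1.0)           # 후진 패스 한 번으로 모든 편도함수 계산
print(x.gradient, y.gradient)  # 5.0 3.0  (∂g/∂x = y+1, ∂g/∂y = x)
```

`x`가 그래프에 두 번 등장하므로 `x.gradient`에는 두 경로의 기여(4와 1)가 누적되는 점에 주목하세요. 후진 패스 한 번으로 모든 입력의 편도함수가 나오는 것이 후진 모드의 장점입니다.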
1279 | ] 1280 | }, 1281 | { 1282 | "cell_type": "markdown", 1283 | "metadata": { 1284 | "id": "7nrb-pnLUsWb" 1285 | }, 1286 | "source": [ 1287 | "### 텐서플로를 사용한 후진 모드 자동 미분" 1288 | ] 1289 | }, 1290 | { 1291 | "cell_type": "code", 1292 | "metadata": { 1293 | "id": "4sbLsJonUsWb" 1294 | }, 1295 | "source": [ 1296 | "import tensorflow as tf" 1297 | ], 1298 | "execution_count": 33, 1299 | "outputs": [] 1300 | }, 1301 | { 1302 | "cell_type": "code", 1303 | "metadata": { 1304 | "id": "AmRrpFcvUsWb", 1305 | "outputId": "34290079-0054-49d9-d7aa-d9be25b67b56", 1306 | "colab": { 1307 | "base_uri": "https://localhost:8080/" 1308 | } 1309 | }, 1310 | "source": [ 1311 | "x = tf.Variable(3.)\n", 1312 | "y = tf.Variable(4.)\n", 1313 | "\n", 1314 | "with tf.GradientTape() as tape:\n", 1315 | " f = x*x*y + y + 2\n", 1316 | "\n", 1317 | "jacobians = tape.gradient(f, [x, y])\n", 1318 | "jacobians" 1319 | ], 1320 | "execution_count": 34, 1321 | "outputs": [ 1322 | { 1323 | "output_type": "execute_result", 1324 | "data": { 1325 | "text/plain": [ 1326 | "[<tf.Tensor: shape=(), dtype=float32, numpy=24.0>,\n", 1327 | " <tf.Tensor: shape=(), dtype=float32, numpy=10.0>]" 1328 | ] 1329 | }, 1330 | "metadata": {}, 1331 | "execution_count": 34 1332 | } 1333 | ] 1334 | }, 1335 | { 1336 | "cell_type": "markdown", 1337 | "metadata": { 1338 | "id": "8EK4NCMAUsWb" 1339 | }, 1340 | "source": [ 1341 | "전부 기호이기 때문에 2차 도함수와 그 이상도 계산할 수 있습니다."
1342 | ] 1343 | }, 1344 | { 1345 | "cell_type": "code", 1346 | "metadata": { 1347 | "id": "spZAweweUsWb", 1348 | "outputId": "e02cf7bd-a909-499a-8a6d-5ba8e9784c00", 1349 | "colab": { 1350 | "base_uri": "https://localhost:8080/" 1351 | } 1352 | }, 1353 | "source": [ 1354 | "x = tf.Variable(3.)\n", 1355 | "y = tf.Variable(4.)\n", 1356 | "\n", 1357 | "with tf.GradientTape(persistent=True) as tape:\n", 1358 | " f = x*x*y + y + 2\n", 1359 | " df_dx, df_dy = tape.gradient(f, [x, y])\n", 1360 | "\n", 1361 | "d2f_d2x, d2f_dydx = tape.gradient(df_dx, [x, y])\n", 1362 | "d2f_dxdy, d2f_d2y = tape.gradient(df_dy, [x, y])\n", 1363 | "del tape\n", 1364 | "\n", 1365 | "hessians = [[d2f_d2x, d2f_dydx], [d2f_dxdy, d2f_d2y]]\n", 1366 | "hessians" 1367 | ], 1368 | "execution_count": 35, 1369 | "outputs": [ 1370 | { 1371 | "output_type": "stream", 1372 | "text": [ 1373 | "WARNING:tensorflow:Calling GradientTape.gradient on a persistent tape inside its context is significantly less efficient than calling it outside the context (it causes the gradient ops to be recorded on the tape, leading to increased CPU and memory usage). Only call GradientTape.gradient inside the context if you actually want to trace the gradient in order to compute higher order derivatives.\n" 1374 | ], 1375 | "name": "stdout" 1376 | }, 1377 | { 1378 | "output_type": "execute_result", 1379 | "data": { 1380 | "text/plain": [ 1381 | "[[<tf.Tensor: shape=(), dtype=float32, numpy=8.0>,\n", 1382 | " <tf.Tensor: shape=(), dtype=float32, numpy=6.0>],\n", 1383 | " [<tf.Tensor: shape=(), dtype=float32, numpy=6.0>, None]]" 1384 | ] 1385 | }, 1386 | "metadata": {}, 1387 | "execution_count": 35 1388 | } 1389 | ] 1390 | }, 1391 | { 1392 | "cell_type": "markdown", 1393 | "metadata": { 1394 | "id": "7xHXl0J0UsWb" 1395 | }, 1396 | "source": [ 1397 | "그러나 어떤 변수에 의존하지 않는 텐서를 그 변수에 대해 미분하면, `gradient()` 메서드가 0.0 대신 None을 반환합니다." 1398 | ] 1399 | }, 1400 | { 1401 | "cell_type": "markdown", 1402 | "metadata": { 1403 | "id": "NWKGr87iUsWc" 1404 | }, 1405 | "source": [ 1406 | "여기까지 해서 마치도록 하겠습니다! 이 노트북이 맘에 드시길 바랄게요."
1407 | ] 1408 | } 1409 | ] 1410 | } -------------------------------------------------------------------------------- /images/ann/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/autoencoders/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/classification/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/cnn/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/cnn/test_image.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rickiepark/handson-ml2/22f78e5e97141164f1ab7933dae54b09d6e47276/images/cnn/test_image.png -------------------------------------------------------------------------------- /images/decision_trees/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/deep/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/distributed/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | 
-------------------------------------------------------------------------------- /images/end_to_end_project/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/end_to_end_project/california.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rickiepark/handson-ml2/22f78e5e97141164f1ab7933dae54b09d6e47276/images/end_to_end_project/california.png -------------------------------------------------------------------------------- /images/ensembles/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/fundamentals/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/nlp/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/rl/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/rl/breakout.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rickiepark/handson-ml2/22f78e5e97141164f1ab7933dae54b09d6e47276/images/rl/breakout.gif -------------------------------------------------------------------------------- /images/rnn/README: -------------------------------------------------------------------------------- 1 | Images 
generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/svm/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/tensorflow/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/training_linear_models/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/unsupervised_learning/README: -------------------------------------------------------------------------------- 1 | Images generated by the notebooks 2 | -------------------------------------------------------------------------------- /images/unsupervised_learning/ladybug.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/rickiepark/handson-ml2/22f78e5e97141164f1ab7933dae54b09d6e47276/images/unsupervised_learning/ladybug.png -------------------------------------------------------------------------------- /index.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 핸즈온 머신러닝 2 노트북\n", 8 | "\n", 9 | "*핸즈온 머신러닝2에 오신 것을 환영합니다!*\n", 10 | "\n", 11 | "[필요한 기술](#Prerequisites) (아래 참조)\n", 12 | "\n", 13 | "## 노트북\n", 14 | "1. [한눈에 보는 머신러닝](01_the_machine_learning_landscape.ipynb)\n", 15 | "2. [머신러닝 프로젝트 처음부터 끝까지](02_end_to_end_machine_learning_project.ipynb)\n", 16 | "3. [분류](03_classification.ipynb)\n", 17 | "4. [모델 훈련](04_training_linear_models.ipynb)\n", 18 | "5. 
[Support Vector Machines](05_support_vector_machines.ipynb)\n", 19 | "6. [Decision Trees](06_decision_trees.ipynb)\n", 20 | "7. [Ensemble Learning and Random Forests](07_ensemble_learning_and_random_forests.ipynb)\n", 21 | "8. [Dimensionality Reduction](08_dimensionality_reduction.ipynb)\n", 22 | "9. [Unsupervised Learning](09_unsupervised_learning.ipynb)\n", 23 | "10. [Introduction to Artificial Neural Networks with Keras](10_neural_nets_with_keras.ipynb)\n", 24 | "11. [Training Deep Neural Networks](11_training_deep_neural_networks.ipynb)\n", 25 | "12. [Custom Models and Training with TensorFlow](12_custom_models_and_training_with_tensorflow.ipynb)\n", 26 | "13. [Loading and Preprocessing Data with TensorFlow](13_loading_and_preprocessing_data.ipynb)\n", 27 | "14. [Deep Computer Vision Using Convolutional Neural Networks](14_deep_computer_vision_with_cnns.ipynb)\n", 28 | "15. [Processing Sequences Using RNNs and CNNs](15_processing_sequences_using_rnns_and_cnns.ipynb)\n", 29 | "16. [Natural Language Processing with RNNs and Attention](16_nlp_with_rnns_and_attention.ipynb)\n", 30 | "17. [Representation Learning and Generative Learning Using Autoencoders and GANs](17_autoencoders_and_gans.ipynb)\n", 31 | "18. [Reinforcement Learning](18_reinforcement_learning.ipynb)\n", 32 | "19. [Training and Deploying TensorFlow Models at Scale](19_training_and_deploying_at_scale.ipynb)\n", 33 | "\n", 34 | "## Scientific Python Tutorials\n", 35 | "* [NumPy](tools_numpy.ipynb)\n", 36 | "* [Matplotlib](tools_matplotlib.ipynb) - translation: [박찬성](https://github.com/deep-diver)\n", 37 | "* [Pandas](tools_pandas.ipynb)\n", 38 | "\n", 39 | "## Math Tutorials\n", 40 | "* [Linear Algebra](math_linear_algebra.ipynb)\n", 41 | "* [Differential Calculus](math_differential_calculus.ipynb)\n", 42 | "\n", 43 | "## Extras\n", 44 | "* [Autodiff](extra_autodiff.ipynb)\n", 45 | "\n", 46 | "## Misc.\n", 47 | "* [Equations](book_equations.pdf) (list of equations in the book)\n" 48 | ] 49 | }, 50 | { 51 | "cell_type": "markdown", 52 | "metadata": { 53 | "collapsed": true 54 | }, 55 | "source": [ 56 | "## Prerequisites\n", 57 | "\n", 58 | "### To understand\n", 59 | "\n", 60 | "* **Python** – you don't need to be an expert Python programmer, but you do need to know the basics. If you don't, the official [Python tutorial](https://docs.python.org/3/tutorial/) is a good place to start.\n", 61 | "* **Scientific Python** – we use a few popular Python libraries, in particular NumPy, Matplotlib, and Pandas. If you are not familiar with these libraries, work through the tutorials in the 'Scientific Python Tutorials' section (especially NumPy).\n", 62 | "* **Math** – this book uses linear algebra, calculus, statistics, and probability theory. 
It's not very advanced, so you should be able to follow along if you learned this math before. If you don't know it, or if you want a refresher, see the 'Math Tutorials' section.\n", 63 | "\n", 64 | "### To run the examples\n", 65 | "\n", 66 | "* **Jupyter** – these notebooks are Jupyter notebooks. You can run them in one click using a hosted platform such as Binder, Deepnote, or Colab (no installation required), view them using Jupyter.org's viewer, or install everything on your own machine if you prefer. See the [GitHub repository](https://github.com/rickiepark/handson-ml2/) for details." 67 | ] 68 | } 69 | ], 70 | "metadata": { 71 | "kernelspec": { 72 | "display_name": "TensorFlow 2.4 on Python 3.8 & CUDA 11.1", 73 | "language": "python", 74 | "name": "python3" 75 | }, 76 | "language_info": { 77 | "codemirror_mode": { 78 | "name": "ipython", 79 | "version": 3 80 | }, 81 | "file_extension": ".py", 82 | "mimetype": "text/x-python", 83 | "name": "python", 84 | "nbconvert_exporter": "python", 85 | "pygments_lexer": "ipython3", 86 | "version": "3.8.8" 87 | }, 88 | "nav_menu": {}, 89 | "toc": { 90 | "navigate_menu": true, 91 | "number_sections": true, 92 | "sideBar": true, 93 | "threshold": 6, 94 | "toc_cell": false, 95 | "toc_section_display": "block", 96 | "toc_window_display": false 97 | } 98 | }, 99 | "nbformat": 4, 100 | "nbformat_minor": 1 101 | } 102 | -------------------------------------------------------------------------------- /ml-project-checklist.md: -------------------------------------------------------------------------------- 1 | This checklist can guide you through a Machine Learning project. The eight main steps are as 2 | follows: 3 | 4 | 1. Frame the problem and look at the big picture. 5 | 2. Get the data. 6 | 3. Explore the data to gain insights. 7 | 4. Prepare the data to better expose the underlying data patterns to Machine Learning algorithms. 8 | 5. Explore many different models and shortlist the best ones. 9 | 6. Fine-tune your models and combine them into a great solution. 10 | 7. Present your solution. 11 | 8. Launch, monitor, and maintain your system. 12 | 13 | Obviously, you should feel free to adapt this checklist to your needs. 14 | 15 | # Frame the Problem and Look at the Big Picture 16 | 1. Define the objective in business terms. 17 | 2. How will your solution be used? 18 | 3. What are the current solutions or workarounds (if any)? 19 | 4. How should you frame this problem (supervised/unsupervised, online/offline, etc.)? 20 | 5. How should performance be measured? 21 | 6. Is the performance measure aligned with the business objective? 22 | 7. 
What would be the minimum performance needed to reach the business objective? 23 | 8. What are comparable problems? Can you reuse experience or tools? 24 | 9. Is human expertise available? 25 | 10. How would you solve the problem manually? 26 | 11. List the assumptions you (or others) have made so far. 27 | 12. Verify those assumptions if possible. 28 | 29 | # Get the Data 30 | Note: automate as much as possible so you can easily get fresh data. 31 | 32 | 1. List the data you need and how much you need. 33 | 2. Find and document where you can get that data. 34 | 3. Check how much space it will take. 35 | 4. Check legal obligations, and get authorization if necessary. 36 | 5. Get access authorizations. 37 | 6. Create a workspace (with enough storage space). 38 | 7. Get the data. 39 | 8. Convert the data to a format you can easily manipulate (without changing the data itself). 40 | 9. Ensure sensitive information is deleted or protected (e.g., anonymized). 41 | 10. Check the size and type of data (time series, sample, geographical, etc.). 42 | 11. Sample a test set, put it aside, and never look at it (no data snooping!). 43 | 44 | # Explore the Data 45 | Note: try to get insights from a field expert for these steps. 46 | 47 | 1. Create a copy of the data for exploration (sampling it down to a manageable size if necessary). 48 | 2. Create a Jupyter notebook to keep a record of your data exploration. 49 | 3. Study each attribute and its characteristics: 50 | - Name 51 | - Type (categorical, int/float, bounded/unbounded, text, structured, etc.) 52 | - % of missing values 53 | - Noisiness and type of noise (stochastic, outliers, rounding errors, etc.) 54 | - Usefulness for the task 55 | - Type of distribution (Gaussian, uniform, logarithmic, etc.) 56 | 4. For supervised learning tasks, identify the target attribute(s). 57 | 5. Visualize the data. 58 | 6. Study the correlations between attributes. 59 | 7. Study how you would solve the problem manually. 60 | 8. Identify the promising transformations you may want to apply. 61 | 9. Identify extra data that would be useful (if any, go back to 'Get the Data'). 62 | 10. Document what you have learned. 63 | 64 | # Prepare the Data 65 | Notes: 66 | - Work on copies of the data (keep the original dataset intact). 67 | - Write functions for all data transformations you apply, for five reasons: 68 | - So you can easily prepare the data the next time you get a fresh dataset 69 | - So you can apply these transformations in future projects 70 | - To clean and transform the test set 71 | - To clean and transform new data instances once your solution is live 72 | - To make it easy to treat your preparation choices as hyperparameters 73 | 74 | 1. Data cleaning: 75 | - Fix or remove outliers (optional). 76 | - Fill in missing values (e.g., with zero, the mean, the median…) or drop their rows (or columns). 77 | 2. Feature selection (optional): 78 | - Drop the attributes that provide no useful information for the task. 79 | 3. Feature engineering, where appropriate: 80 | - Discretize continuous features. 81 | - Decompose features (e.g., categorical, date/time, etc.). 82 | - Add promising transformations of features (e.g., log(x), sqrt(x), x^2, etc.). 83 | - Aggregate features into promising new features. 84 | 4. 
Feature scaling: 85 | - Standardize or normalize features. 86 | 87 | # Shortlist Promising Models 88 | Notes: 89 | - If the data is huge, you may want to sample smaller training sets so you can train many different 90 | models in a reasonable time (be aware that this penalizes complex models such as large neural 91 | nets or random forests). 92 | - Once again, try to automate these steps as much as possible. 93 | 94 | 1. Train many quick-and-dirty models from different categories (e.g., linear models, naive 95 | Bayes, SVMs, random forests, neural nets, etc.) using standard parameters. 96 | 2. Measure and compare their performance: 97 | - For each model, use N-fold cross-validation and compute the mean and standard deviation of the performance measure on the N folds. 98 | 3. Analyze the most significant variables for each algorithm. 99 | 4. Analyze the types of errors the models make: 100 | - What data would a human have used to avoid these errors? 101 | 5. Perform a quick round of feature selection and engineering. 102 | 6. Perform one or two more quick iterations of the five previous steps. 103 | 7. Shortlist the top three to five most promising models, preferring models that make different 104 | types of errors. 105 | 106 | # Fine-Tune the System 107 | Notes: 108 | - You will want to use as much data as possible for this step, especially as you move toward the 109 | end of fine-tuning. 110 | - As always, automate what you can. 111 | 112 | 1. Fine-tune the hyperparameters using cross-validation: 113 | - Treat your data transformation choices as hyperparameters, especially when you are not sure 114 | about them (e.g., should missing values be replaced with zero or with the median? Or should their rows just be dropped?). 115 | - Unless there are very few hyperparameter values to explore, prefer random search over grid search. If training 116 | takes a long time, you may prefer a Bayesian optimization approach (e.g., using Gaussian process 117 | priors, as described by Jasper Snoek, Hugo Larochelle, and Ryan Adams 118 | ([https://goo.gl/PEFfGr](https://goo.gl/PEFfGr))). 119 | 2. Try ensemble methods. Combining your best models will often produce better performance than 120 | running them individually. 121 | 3. Once you are confident about your final model, measure its performance on the test set to estimate the generalization error. 122 | 123 | > Don't tweak your model after measuring the generalization error: you would just start 124 | overfitting the test set. 125 | 126 | # Present Your Solution 127 | 1. Document what you have done. 128 | 2. Create a nice presentation: 129 | - Make sure you highlight the big picture first. 130 | 3. Explain how your solution achieves the business objective. 131 | 4. Don't forget to present interesting points you noticed along the way: 132 | - Describe what worked and what did not. 133 | - List your assumptions and your system's limitations. 134 | 5. Ensure your key findings are communicated through beautiful visualizations or easy-to-remember 135 | statements (e.g., 'the median income is the number-one predictor of housing prices'). 136 | 137 | # Launch! 138 | 1. Get your solution ready for production (plug into production data inputs, write unit tests, etc.). 139 | 2. 
Write monitoring code to check your system's live performance at regular intervals and trigger 140 | alerts when it drops: 141 | - Beware of slow degradation: models tend to become stale as the data evolves. 142 | - Measuring performance may require a human pipeline (e.g., via a crowdsourcing service). 143 | - Also monitor your inputs' quality (e.g., a malfunctioning sensor sending random values, or 144 | another team's output becoming poor). This is particularly important for online learning 145 | systems. 146 | 3. Retrain your models on a regular basis on fresh data (automate as much as possible). 147 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | # WARNING: Using Anaconda instead of pip is highly recommended, especially on 2 | # Windows or when using a GPU. Please see the installation instructions in 3 | # INSTALL.md 4 | 5 | 6 | ##### Core scientific packages 7 | jupyter~=1.0.0 8 | matplotlib~=3.4.3 9 | numpy~=1.19.5 10 | pandas~=1.3.3 11 | scipy~=1.7.1 12 | 13 | ##### Machine Learning packages 14 | scikit-learn~=1.0 15 | 16 | # Optional: the XGBoost library is only used in chapter 7 17 | xgboost~=1.4.2 18 | 19 | # Optional: the transformers library is only used in chapter 16 20 | transformers~=4.11.3 21 | 22 | ##### TensorFlow-related packages 23 | 24 | # If you have a TF-compatible GPU and you want to enable GPU support, then 25 | # replace tensorflow-serving-api with tensorflow-serving-api-gpu. 26 | # Your GPU must have CUDA Compute Capability 3.5 or higher support, and 27 | # you must install CUDA, cuDNN and more: see tensorflow.org for the detailed 28 | # installation instructions. 29 | 30 | tensorflow~=2.6.0 31 | # Optional: the TF Serving API library is just needed for chapter 19. 32 | tensorflow-serving-api~=2.6.0 # or tensorflow-serving-api-gpu if gpu 33 | 34 | tensorboard~=2.7.0 35 | tensorboard-plugin-profile~=2.5.0 36 | tensorflow-datasets~=4.4.0 37 | tensorflow-hub~=0.12.0 38 | tensorflow-probability~=0.14.1 39 | 40 | # Optional: only used in chapter 13. 41 | tfx~=1.3.0 42 | 43 | # Optional: only used in chapter 16. 
44 | tensorflow-addons~=0.14.0 45 | 46 | ##### Reinforcement Learning library (chapter 18) 47 | 48 | # There are a few dependencies you need to install first, check out: 49 | # https://github.com/openai/gym#installing-everything 50 | gym[Box2D,atari,accept-rom-license]~=0.21.0 51 | 52 | # WARNING: on Windows, installing Box2D this way requires: 53 | # * Swig: http://www.swig.org/download.html 54 | # * Microsoft C++ Build Tools: https://visualstudio.microsoft.com/visual-cpp-build-tools/ 55 | # It's much easier to use Anaconda instead. 56 | 57 | tf-agents~=0.10.0 58 | 59 | ##### Image manipulation 60 | Pillow~=8.4.0 61 | graphviz~=0.17 62 | opencv-python~=4.5.3.56 63 | pyglet~=1.5.21 64 | 65 | #pyvirtualdisplay # needed in chapter 18, if on a headless server 66 | # (i.e., without screen, e.g., Colab or VM) 67 | 68 | 69 | ##### Additional utilities 70 | 71 | # Efficient jobs (caching, parallelism, persistence) 72 | joblib~=0.14.1 73 | 74 | # Easy http requests 75 | requests~=2.26.0 76 | 77 | # Nice utility to diff Jupyter Notebooks. 78 | nbdime~=3.1.0 79 | 80 | # May be useful with Pandas for complex "where" clauses (e.g., Pandas 81 | # tutorial). 82 | numexpr~=2.7.3 83 | 84 | # Optional: these libraries can be useful in chapter 3, exercise 4. 85 | nltk~=3.6.5 86 | urlextract~=1.4.0 87 | 88 | # Optional: this library is only used in chapter 16 89 | ftfy~=6.0.3 90 | 91 | # Optional: tqdm displays nice progress bars, ipywidgets for tqdm's notebook support 92 | tqdm~=4.62.3 93 | ipywidgets~=7.6.5 94 | --------------------------------------------------------------------------------
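The requirements file above pins every package with pip's compatible-release operator (`~=`). As a rough illustration of what that operator promises (per PEP 440: `~= X.Y.Z` means at least `X.Y.Z`, staying within the `X.Y.*` series), here is a minimal sketch; the helper names `parse` and `compatible` are our own, not part of pip, and purely numeric version components are assumed:

```python
# Illustrative sketch of pip's "compatible release" operator (~=), as used
# in requirements.txt (e.g., tensorflow~=2.6.0). Not a replacement for pip
# or the `packaging` library; assumes dotted, purely numeric versions.

def parse(version):
    """Turn '2.6.0' into a tuple of ints: (2, 6, 0)."""
    return tuple(int(part) for part in version.split("."))

def compatible(installed, spec):
    """Return True if `installed` satisfies '~= spec'.

    '~= X.Y.Z' means: >= X.Y.Z, with everything before the last specified
    component fixed (only the final component may increase).
    """
    inst, req = parse(installed), parse(spec)
    prefix = len(req) - 1  # components that must match exactly
    return inst[:prefix] == req[:prefix] and inst >= req

# tensorflow~=2.6.0 accepts 2.6.2 but not 2.7.0 or 2.5.3:
print(compatible("2.6.2", "2.6.0"))   # True
print(compatible("2.7.0", "2.6.0"))   # False
print(compatible("1.1", "1.0"))       # True (~=1.0 allows any 1.x >= 1.0)
```

Note that parsing into integer tuples (rather than comparing strings) is what makes `2.6.10` correctly rank above `2.6.9`.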
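The "measure and compare" step in the project checklist above (train several quick-and-dirty models with standard parameters, then compute the mean and standard deviation of an N-fold cross-validation score for each) can be sketched with scikit-learn, which the requirements pin at `~=1.0`. The synthetic dataset and the particular model choices below are illustrative assumptions, not from the book:

```python
# Sketch of the checklist's model-comparison step: several models with
# default parameters, compared via 5-fold cross-validation (mean and
# standard deviation over the folds). The dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(random_state=42),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5 accuracy scores
    print(f"{name}: mean={scores.mean():.3f}, std={scores.std():.3f}")
```

A model whose mean score is high but whose standard deviation is large is less trustworthy than one with a slightly lower mean and tight folds, which is exactly why the checklist asks for both numbers.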