├── .gitignore
├── Chapter10 - Intro to Convolutional Neural Networks - Learning Edges and Corners.ipynb
├── Chapter11 - Intro to Word Embeddings - Neural Networks that Understand Language.ipynb
├── Chapter12 - Intro to Recurrence - Predicting the Next Word.ipynb
├── Chapter13 - Intro to Automatic Differentiation - Let's Build A Deep Learning Framework.ipynb
├── Chapter14 - Exploding Gradients Examples.ipynb
├── Chapter14 - Intro to LSTMs - Learn to Write Like Shakespeare.ipynb
├── Chapter14 - Intro to LSTMs - Part 2 - Learn to Write Like Shakespeare.ipynb
├── Chapter15 - Intro to Federated Learning - Deep Learning on Unseen Data.ipynb
├── Chapter3 - Forward Propagation - Intro to Neural Prediction.ipynb
├── Chapter4 - Gradient Descent - Intro to Neural Learning.ipynb
├── Chapter5 - Generalizing Gradient Descent - Learning Multiple Weights at a Time.ipynb
├── Chapter6 - Intro to Backpropagation - Building Your First DEEP Neural Network.ipynb
├── Chapter8 - Intro to Regularization - Learning Signal and Ignoring Noise.ipynb
├── Chapter9 - Intro to Activation Functions - Modeling Probabilities.ipynb
├── MNISTPreprocessor.ipynb
├── README.md
├── docker-compose.yml
├── floyd.yml
├── ham.txt
├── labels.txt
├── reviews.txt
├── shakespear.txt
├── spam.txt
└── tasksv11
    ├── LICENSE
    ├── README
    ├── en
    │   ├── qa10_indefinite-knowledge_test.txt
    │   ├── qa10_indefinite-knowledge_train.txt
    │   ├── qa11_basic-coreference_test.txt
    │   ├── qa11_basic-coreference_train.txt
    │   ├── qa12_conjunction_test.txt
    │   ├── qa12_conjunction_train.txt
    │   ├── qa13_compound-coreference_test.txt
    │   ├── qa13_compound-coreference_train.txt
    │   ├── qa14_time-reasoning_test.txt
    │   ├── qa14_time-reasoning_train.txt
    │   ├── qa15_basic-deduction_test.txt
    │   ├── qa15_basic-deduction_train.txt
    │   ├── qa16_basic-induction_test.txt
    │   ├── qa16_basic-induction_train.txt
    │   ├── qa17_positional-reasoning_test.txt
    │   ├── qa17_positional-reasoning_train.txt
    │   ├── qa18_size-reasoning_test.txt
    │   ├── qa18_size-reasoning_train.txt
    │   ├── qa19_path-finding_test.txt
    │   ├── qa19_path-finding_train.txt
    │   ├── qa1_single-supporting-fact_test.txt
    │   ├── qa1_single-supporting-fact_train.txt
    │   ├── qa20_agents-motivations_test.txt
    │   ├── qa20_agents-motivations_train.txt
    │   ├── qa2_two-supporting-facts_test.txt
    │   ├── qa2_two-supporting-facts_train.txt
    │   ├── qa3_three-supporting-facts_test.txt
    │   ├── qa3_three-supporting-facts_train.txt
    │   ├── qa4_two-arg-relations_test.txt
    │   ├── qa4_two-arg-relations_train.txt
    │   ├── qa5_three-arg-relations_test.txt
    │   ├── qa5_three-arg-relations_train.txt
    │   ├── qa6_yes-no-questions_test.txt
    │   ├── qa6_yes-no-questions_train.txt
    │   ├── qa7_counting_test.txt
    │   ├── qa7_counting_train.txt
    │   ├── qa8_lists-sets_test.txt
    │   ├── qa8_lists-sets_train.txt
    │   ├── qa9_simple-negation_test.txt
    │   └── qa9_simple-negation_train.txt
    └── shuffled
        ├── qa10_indefinite-knowledge_test.txt
        ├── qa10_indefinite-knowledge_train.txt
        ├── qa11_basic-coreference_test.txt
        ├── qa11_basic-coreference_train.txt
        ├── qa12_conjunction_test.txt
        ├── qa12_conjunction_train.txt
        ├── qa13_compound-coreference_test.txt
        ├── qa13_compound-coreference_train.txt
        ├── qa14_time-reasoning_test.txt
        ├── qa14_time-reasoning_train.txt
        ├── qa15_basic-deduction_test.txt
        ├── qa15_basic-deduction_train.txt
        ├── qa16_basic-induction_test.txt
        ├── qa16_basic-induction_train.txt
        ├── qa17_positional-reasoning_test.txt
        ├── qa17_positional-reasoning_train.txt
        ├── qa18_size-reasoning_test.txt
        ├── qa18_size-reasoning_train.txt
        ├── qa19_path-finding_test.txt
        ├── qa19_path-finding_train.txt
        ├── qa1_single-supporting-fact_test.txt
        ├── qa1_single-supporting-fact_train.txt
        ├── 
qa20_agents-motivations_test.txt ├── qa20_agents-motivations_train.txt ├── qa2_two-supporting-facts_test.txt ├── qa2_two-supporting-facts_train.txt ├── qa3_three-supporting-facts_test.txt ├── qa3_three-supporting-facts_train.txt ├── qa4_two-arg-relations_test.txt ├── qa4_two-arg-relations_train.txt ├── qa5_three-arg-relations_test.txt ├── qa5_three-arg-relations_train.txt ├── qa6_yes-no-questions_test.txt ├── qa6_yes-no-questions_train.txt ├── qa7_counting_test.txt ├── qa7_counting_train.txt ├── qa8_lists-sets_test.txt ├── qa8_lists-sets_train.txt ├── qa9_simple-negation_test.txt └── qa9_simple-negation_train.txt /.gitignore: -------------------------------------------------------------------------------- 1 | 2 | # Created by https://www.gitignore.io/api/jupyternotebook 3 | 4 | ### JupyterNotebook ### 5 | .ipynb_checkpoints 6 | */.ipynb_checkpoints/* 7 | 8 | # Remove previous ipynb_checkpoints 9 | # git rm -r .ipynb_checkpoints/ 10 | # 11 | 12 | 13 | # End of https://www.gitignore.io/api/jupyternotebook 14 | -------------------------------------------------------------------------------- /Chapter10 - Intro to Convolutional Neural Networks - Learning Edges and Corners.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Upgrading our MNIST Network" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 2, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "name": "stdout", 17 | "output_type": "stream", 18 | "text": [ 19 | "\n", 20 | "I:0 Test-Acc:0.0288 Train-Acc:0.055\n", 21 | "I:1 Test-Acc:0.0273 Train-Acc:0.037\n", 22 | "I:2 Test-Acc:0.028 Train-Acc:0.037\n", 23 | "I:3 Test-Acc:0.0292 Train-Acc:0.04\n", 24 | "I:4 Test-Acc:0.0339 Train-Acc:0.046\n", 25 | "I:5 Test-Acc:0.0478 Train-Acc:0.068\n", 26 | "I:6 Test-Acc:0.076 Train-Acc:0.083\n", 27 | "I:7 Test-Acc:0.1316 Train-Acc:0.096\n", 28 | "I:8 Test-Acc:0.2137 Train-Acc:0.127\n", 29 | "I:9 Test-Acc:0.2941 Train-Acc:0.148\n", 30 | "I:10 Test-Acc:0.3563 Train-Acc:0.181\n", 31 | "I:11 Test-Acc:0.4023 Train-Acc:0.209\n", 32 | "I:12 Test-Acc:0.4358 Train-Acc:0.238\n", 33 | "I:13 Test-Acc:0.4473 Train-Acc:0.286\n", 34 | "I:14 Test-Acc:0.4389 Train-Acc:0.274\n", 35 | "I:15 Test-Acc:0.3951 Train-Acc:0.257\n", 36 | "I:16 Test-Acc:0.2222 Train-Acc:0.243\n", 37 | "I:17 Test-Acc:0.0613 Train-Acc:0.112\n", 38 | "I:18 Test-Acc:0.0266 Train-Acc:0.035\n", 39 | "I:19 Test-Acc:0.0127 Train-Acc:0.026\n", 40 | "I:20 Test-Acc:0.0133 Train-Acc:0.022\n", 41 | "I:21 Test-Acc:0.0185 Train-Acc:0.038\n", 42 | "I:22 Test-Acc:0.0363 Train-Acc:0.038\n", 43 | "I:23 Test-Acc:0.0928 Train-Acc:0.067\n", 44 | "I:24 Test-Acc:0.1994 Train-Acc:0.081\n", 45 | "I:25 Test-Acc:0.3086 Train-Acc:0.154\n", 46 | "I:26 Test-Acc:0.4276 Train-Acc:0.204\n", 47 | "I:27 Test-Acc:0.5323 Train-Acc:0.256\n", 48 | "I:28 Test-Acc:0.5919 Train-Acc:0.305\n", 49 | "I:29 Test-Acc:0.6324 Train-Acc:0.341\n", 50 | "I:30 Test-Acc:0.6608 Train-Acc:0.426\n", 51 | "I:31 Test-Acc:0.6815 Train-Acc:0.439\n", 52 | "I:32 Test-Acc:0.7048 Train-Acc:0.462\n", 53 | "I:33 Test-Acc:0.7171 Train-Acc:0.484\n", 54 | "I:34 Test-Acc:0.7313 Train-Acc:0.505\n", 55 | "I:35 Test-Acc:0.7355 Train-Acc:0.53\n", 56 | "I:36 Test-Acc:0.7417 Train-Acc:0.548\n", 57 | "I:37 Test-Acc:0.747 Train-Acc:0.534\n", 58 | "I:38 Test-Acc:0.7491 Train-Acc:0.55\n", 59 | "I:39 Test-Acc:0.7459 Train-Acc:0.562\n", 60 | "I:40 Test-Acc:0.7352 Train-Acc:0.54\n", 61 | "I:41 Test-Acc:0.7082 
Train-Acc:0.496\n", 62 | "I:42 Test-Acc:0.6487 Train-Acc:0.456\n", 63 | "I:43 Test-Acc:0.5209 Train-Acc:0.353\n", 64 | "I:44 Test-Acc:0.3305 Train-Acc:0.234\n", 65 | "I:45 Test-Acc:0.2052 Train-Acc:0.174\n", 66 | "I:46 Test-Acc:0.2149 Train-Acc:0.136\n", 67 | "I:47 Test-Acc:0.2679 Train-Acc:0.171\n", 68 | "I:48 Test-Acc:0.3237 Train-Acc:0.172\n", 69 | "I:49 Test-Acc:0.3581 Train-Acc:0.186\n", 70 | "I:50 Test-Acc:0.4202 Train-Acc:0.21\n", 71 | "I:51 Test-Acc:0.5165 Train-Acc:0.223\n", 72 | "I:52 Test-Acc:0.6007 Train-Acc:0.262\n", 73 | "I:53 Test-Acc:0.6476 Train-Acc:0.308\n", 74 | "I:54 Test-Acc:0.676 Train-Acc:0.363\n", 75 | "I:55 Test-Acc:0.696 Train-Acc:0.402\n", 76 | "I:56 Test-Acc:0.7077 Train-Acc:0.434\n", 77 | "I:57 Test-Acc:0.7204 Train-Acc:0.441\n", 78 | "I:58 Test-Acc:0.7303 Train-Acc:0.475\n", 79 | "I:59 Test-Acc:0.7359 Train-Acc:0.475\n", 80 | "I:60 Test-Acc:0.7401 Train-Acc:0.525\n", 81 | "I:61 Test-Acc:0.7493 Train-Acc:0.517\n", 82 | "I:62 Test-Acc:0.7533 Train-Acc:0.517\n", 83 | "I:63 Test-Acc:0.7606 Train-Acc:0.538\n", 84 | "I:64 Test-Acc:0.7644 Train-Acc:0.554\n", 85 | "I:65 Test-Acc:0.7724 Train-Acc:0.57\n", 86 | "I:66 Test-Acc:0.7788 Train-Acc:0.586\n", 87 | "I:67 Test-Acc:0.7855 Train-Acc:0.595\n", 88 | "I:68 Test-Acc:0.7853 Train-Acc:0.591\n", 89 | "I:69 Test-Acc:0.7925 Train-Acc:0.605\n", 90 | "I:70 Test-Acc:0.7973 Train-Acc:0.64\n", 91 | "I:71 Test-Acc:0.8013 Train-Acc:0.621\n", 92 | "I:72 Test-Acc:0.8029 Train-Acc:0.626\n", 93 | "I:73 Test-Acc:0.8092 Train-Acc:0.631\n", 94 | "I:74 Test-Acc:0.8099 Train-Acc:0.638\n", 95 | "I:75 Test-Acc:0.8156 Train-Acc:0.661\n", 96 | "I:76 Test-Acc:0.8156 Train-Acc:0.639\n", 97 | "I:77 Test-Acc:0.8184 Train-Acc:0.65\n", 98 | "I:78 Test-Acc:0.8216 Train-Acc:0.67\n", 99 | "I:79 Test-Acc:0.8246 Train-Acc:0.675\n", 100 | "I:80 Test-Acc:0.8237 Train-Acc:0.666\n", 101 | "I:81 Test-Acc:0.8273 Train-Acc:0.673\n", 102 | "I:82 Test-Acc:0.8273 Train-Acc:0.704\n", 103 | "I:83 Test-Acc:0.8314 Train-Acc:0.674\n", 104 | "I:84 Test-Acc:0.8292 Train-Acc:0.686\n", 105 | "I:85 Test-Acc:0.8335 Train-Acc:0.699\n", 106 | "I:86 Test-Acc:0.8359 Train-Acc:0.694\n", 107 | "I:87 Test-Acc:0.8375 Train-Acc:0.704\n", 108 | "I:88 Test-Acc:0.8373 Train-Acc:0.697\n", 109 | "I:89 Test-Acc:0.8398 Train-Acc:0.704\n", 110 | "I:90 Test-Acc:0.8393 Train-Acc:0.687\n", 111 | "I:91 Test-Acc:0.8436 Train-Acc:0.705\n", 112 | "I:92 Test-Acc:0.8437 Train-Acc:0.711\n", 113 | "I:93 Test-Acc:0.8446 Train-Acc:0.721\n", 114 | "I:94 Test-Acc:0.845 Train-Acc:0.719\n", 115 | "I:95 Test-Acc:0.8469 Train-Acc:0.724\n", 116 | "I:96 Test-Acc:0.8476 Train-Acc:0.726\n", 117 | "I:97 Test-Acc:0.848 Train-Acc:0.718\n", 118 | "I:98 Test-Acc:0.8496 Train-Acc:0.719\n", 119 | "I:99 Test-Acc:0.85 Train-Acc:0.73\n", 120 | "I:100 Test-Acc:0.8511 Train-Acc:0.737\n", 121 | "I:101 Test-Acc:0.8503 Train-Acc:0.73\n", 122 | "I:102 Test-Acc:0.8504 Train-Acc:0.717\n", 123 | "I:103 Test-Acc:0.8528 Train-Acc:0.74\n", 124 | "I:104 Test-Acc:0.8532 Train-Acc:0.733\n", 125 | "I:105 Test-Acc:0.8537 Train-Acc:0.73\n", 126 | "I:106 Test-Acc:0.8568 Train-Acc:0.721\n", 127 | "I:107 Test-Acc:0.857 Train-Acc:0.75\n", 128 | "I:108 Test-Acc:0.8558 Train-Acc:0.731\n", 129 | "I:109 Test-Acc:0.8578 Train-Acc:0.744\n", 130 | "I:110 Test-Acc:0.8588 Train-Acc:0.754\n", 131 | "I:111 Test-Acc:0.8579 Train-Acc:0.732\n", 132 | "I:112 Test-Acc:0.8582 Train-Acc:0.747\n", 133 | "I:113 Test-Acc:0.8593 Train-Acc:0.747\n", 134 | "I:114 Test-Acc:0.8598 Train-Acc:0.751\n", 135 | "I:115 Test-Acc:0.8603 Train-Acc:0.74\n", 136 | "I:116 
Test-Acc:0.86 Train-Acc:0.753\n", 137 | "I:117 Test-Acc:0.8588 Train-Acc:0.746\n", 138 | "I:118 Test-Acc:0.861 Train-Acc:0.741\n", 139 | "I:119 Test-Acc:0.8616 Train-Acc:0.731\n", 140 | "I:120 Test-Acc:0.8629 Train-Acc:0.753\n", 141 | "I:121 Test-Acc:0.8609 Train-Acc:0.743\n", 142 | "I:122 Test-Acc:0.8627 Train-Acc:0.752\n", 143 | "I:123 Test-Acc:0.8646 Train-Acc:0.76\n", 144 | "I:124 Test-Acc:0.8649 Train-Acc:0.766\n", 145 | "I:125 Test-Acc:0.8659 Train-Acc:0.752\n", 146 | "I:126 Test-Acc:0.868 Train-Acc:0.756\n", 147 | "I:127 Test-Acc:0.8648 Train-Acc:0.767\n", 148 | "I:128 Test-Acc:0.8662 Train-Acc:0.747\n", 149 | "I:129 Test-Acc:0.8669 Train-Acc:0.753\n", 150 | "I:130 Test-Acc:0.8694 Train-Acc:0.753\n", 151 | "I:131 Test-Acc:0.8692 Train-Acc:0.76\n", 152 | "I:132 Test-Acc:0.8658 Train-Acc:0.756\n", 153 | "I:133 Test-Acc:0.8666 Train-Acc:0.769\n", 154 | "I:134 Test-Acc:0.8692 Train-Acc:0.77\n", 155 | "I:135 Test-Acc:0.8681 Train-Acc:0.757\n", 156 | "I:136 Test-Acc:0.8705 Train-Acc:0.77\n", 157 | "I:137 Test-Acc:0.8706 Train-Acc:0.77\n", 158 | "I:138 Test-Acc:0.8684 Train-Acc:0.768\n", 159 | "I:139 Test-Acc:0.8664 Train-Acc:0.774\n", 160 | "I:140 Test-Acc:0.8666 Train-Acc:0.756\n", 161 | "I:141 Test-Acc:0.8705 Train-Acc:0.783\n", 162 | "I:142 Test-Acc:0.87 Train-Acc:0.775\n", 163 | "I:143 Test-Acc:0.8729 Train-Acc:0.769\n", 164 | "I:144 Test-Acc:0.8725 Train-Acc:0.776\n", 165 | "I:145 Test-Acc:0.8721 Train-Acc:0.772\n", 166 | "I:146 Test-Acc:0.8718 Train-Acc:0.765\n", 167 | "I:147 Test-Acc:0.8746 Train-Acc:0.777\n", 168 | "I:148 Test-Acc:0.8746 Train-Acc:0.77\n", 169 | "I:149 Test-Acc:0.8734 Train-Acc:0.778\n", 170 | "I:150 Test-Acc:0.873 Train-Acc:0.785\n", 171 | "I:151 Test-Acc:0.8732 Train-Acc:0.76\n", 172 | "I:152 Test-Acc:0.8727 Train-Acc:0.779\n", 173 | "I:153 Test-Acc:0.8754 Train-Acc:0.772\n", 174 | "I:154 Test-Acc:0.8729 Train-Acc:0.773\n", 175 | "I:155 Test-Acc:0.8758 Train-Acc:0.784\n", 176 | "I:156 Test-Acc:0.8732 Train-Acc:0.774\n", 177 | "I:157 Test-Acc:0.8743 Train-Acc:0.782\n", 178 | "I:158 Test-Acc:0.8762 Train-Acc:0.772\n", 179 | "I:159 Test-Acc:0.8755 Train-Acc:0.79\n", 180 | "I:160 Test-Acc:0.8751 Train-Acc:0.774\n", 181 | "I:161 Test-Acc:0.8749 Train-Acc:0.782\n", 182 | "I:162 Test-Acc:0.8744 Train-Acc:0.78\n", 183 | "I:163 Test-Acc:0.8765 Train-Acc:0.782\n", 184 | "I:164 Test-Acc:0.8738 Train-Acc:0.796\n", 185 | "I:165 Test-Acc:0.8753 Train-Acc:0.798\n", 186 | "I:166 Test-Acc:0.8767 Train-Acc:0.794\n", 187 | "I:167 Test-Acc:0.8746 Train-Acc:0.784\n", 188 | "I:168 Test-Acc:0.8769 Train-Acc:0.796\n", 189 | "I:169 Test-Acc:0.8758 Train-Acc:0.789\n", 190 | "I:170 Test-Acc:0.8764 Train-Acc:0.79\n", 191 | "I:171 Test-Acc:0.873 Train-Acc:0.791\n", 192 | "I:172 Test-Acc:0.8765 Train-Acc:0.797\n", 193 | "I:173 Test-Acc:0.8772 Train-Acc:0.789\n", 194 | "I:174 Test-Acc:0.8778 Train-Acc:0.781\n", 195 | "I:175 Test-Acc:0.8758 Train-Acc:0.799\n", 196 | "I:176 Test-Acc:0.8773 Train-Acc:0.785\n", 197 | "I:177 Test-Acc:0.8766 Train-Acc:0.796\n", 198 | "I:178 Test-Acc:0.8782 Train-Acc:0.803\n", 199 | "I:179 Test-Acc:0.8789 Train-Acc:0.794\n", 200 | "I:180 Test-Acc:0.8778 Train-Acc:0.794\n", 201 | "I:181 Test-Acc:0.8778 Train-Acc:0.8\n", 202 | "I:182 Test-Acc:0.8785 Train-Acc:0.791\n", 203 | "I:183 Test-Acc:0.8777 Train-Acc:0.787\n", 204 | "I:184 Test-Acc:0.8769 Train-Acc:0.781\n", 205 | "I:185 Test-Acc:0.8765 Train-Acc:0.786\n", 206 | "I:186 Test-Acc:0.8765 Train-Acc:0.793\n", 207 | "I:187 Test-Acc:0.8785 Train-Acc:0.796\n", 208 | "I:188 Test-Acc:0.879 Train-Acc:0.789\n", 209 | 
"I:189 Test-Acc:0.8763 Train-Acc:0.79\n", 210 | "I:190 Test-Acc:0.8774 Train-Acc:0.787\n", 211 | "I:191 Test-Acc:0.8766 Train-Acc:0.782\n", 212 | "I:192 Test-Acc:0.8803 Train-Acc:0.798\n", 213 | "I:193 Test-Acc:0.8781 Train-Acc:0.789\n", 214 | "I:194 Test-Acc:0.8795 Train-Acc:0.785\n", 215 | "I:195 Test-Acc:0.8791 Train-Acc:0.807\n", 216 | "I:196 Test-Acc:0.8778 Train-Acc:0.796\n", 217 | "I:197 Test-Acc:0.8783 Train-Acc:0.801\n", 218 | "I:198 Test-Acc:0.8778 Train-Acc:0.81\n", 219 | "I:199 Test-Acc:0.8771 Train-Acc:0.784\n", 220 | "I:200 Test-Acc:0.8776 Train-Acc:0.792\n", 221 | "I:201 Test-Acc:0.8784 Train-Acc:0.794\n", 222 | "I:202 Test-Acc:0.8787 Train-Acc:0.795\n", 223 | "I:203 Test-Acc:0.8803 Train-Acc:0.781\n", 224 | "I:204 Test-Acc:0.8798 Train-Acc:0.804\n", 225 | "I:205 Test-Acc:0.8779 Train-Acc:0.779\n", 226 | "I:206 Test-Acc:0.8788 Train-Acc:0.792\n", 227 | "I:207 Test-Acc:0.8764 Train-Acc:0.793\n", 228 | "I:208 Test-Acc:0.8792 Train-Acc:0.792\n", 229 | "I:209 Test-Acc:0.8798 Train-Acc:0.803\n", 230 | "I:210 Test-Acc:0.8788 Train-Acc:0.804\n", 231 | "I:211 Test-Acc:0.8793 Train-Acc:0.797\n", 232 | "I:212 Test-Acc:0.8764 Train-Acc:0.791\n", 233 | "I:213 Test-Acc:0.8801 Train-Acc:0.801\n", 234 | "I:214 Test-Acc:0.8814 Train-Acc:0.799\n", 235 | "I:215 Test-Acc:0.8806 Train-Acc:0.79\n", 236 | "I:216 Test-Acc:0.8799 Train-Acc:0.8\n", 237 | "I:217 Test-Acc:0.8803 Train-Acc:0.802\n", 238 | "I:218 Test-Acc:0.8782 Train-Acc:0.807\n", 239 | "I:219 Test-Acc:0.8818 Train-Acc:0.797\n", 240 | "I:220 Test-Acc:0.8793 Train-Acc:0.799\n", 241 | "I:221 Test-Acc:0.8789 Train-Acc:0.815\n", 242 | "I:222 Test-Acc:0.8791 Train-Acc:0.816\n", 243 | "I:223 Test-Acc:0.8793 Train-Acc:0.809\n", 244 | "I:224 Test-Acc:0.8814 Train-Acc:0.795\n", 245 | "I:225 Test-Acc:0.8798 Train-Acc:0.799\n", 246 | "I:226 Test-Acc:0.8805 Train-Acc:0.806\n", 247 | "I:227 Test-Acc:0.88 Train-Acc:0.808\n", 248 | "I:228 Test-Acc:0.8782 Train-Acc:0.801\n", 249 | "I:229 Test-Acc:0.8802 Train-Acc:0.814\n", 250 | "I:230 Test-Acc:0.8807 Train-Acc:0.8\n", 251 | "I:231 Test-Acc:0.8809 Train-Acc:0.798\n", 252 | "I:232 Test-Acc:0.8805 Train-Acc:0.82\n", 253 | "I:233 Test-Acc:0.8795 Train-Acc:0.794\n", 254 | "I:234 Test-Acc:0.8807 Train-Acc:0.806\n", 255 | "I:235 Test-Acc:0.8806 Train-Acc:0.808\n", 256 | "I:236 Test-Acc:0.8787 Train-Acc:0.802\n", 257 | "I:237 Test-Acc:0.8796 Train-Acc:0.81\n", 258 | "I:238 Test-Acc:0.8766 Train-Acc:0.805\n", 259 | "I:239 Test-Acc:0.8781 Train-Acc:0.792\n", 260 | "I:240 Test-Acc:0.8787 Train-Acc:0.809\n", 261 | "I:241 Test-Acc:0.8762 Train-Acc:0.802\n", 262 | "I:242 Test-Acc:0.8775 Train-Acc:0.811\n", 263 | "I:243 Test-Acc:0.8804 Train-Acc:0.814\n", 264 | "I:244 Test-Acc:0.8794 Train-Acc:0.804\n", 265 | "I:245 Test-Acc:0.8788 Train-Acc:0.801\n", 266 | "I:246 Test-Acc:0.8777 Train-Acc:0.795\n", 267 | "I:247 Test-Acc:0.8785 Train-Acc:0.808\n", 268 | "I:248 Test-Acc:0.8788 Train-Acc:0.803\n", 269 | "I:249 Test-Acc:0.8773 Train-Acc:0.813\n", 270 | "I:250 Test-Acc:0.8786 Train-Acc:0.808\n", 271 | "I:251 Test-Acc:0.8787 Train-Acc:0.803\n", 272 | "I:252 Test-Acc:0.8789 Train-Acc:0.812\n", 273 | "I:253 Test-Acc:0.8792 Train-Acc:0.804\n", 274 | "I:254 Test-Acc:0.8779 Train-Acc:0.815\n", 275 | "I:255 Test-Acc:0.8796 Train-Acc:0.811\n", 276 | "I:256 Test-Acc:0.8798 Train-Acc:0.806\n", 277 | "I:257 Test-Acc:0.88 Train-Acc:0.803\n", 278 | "I:258 Test-Acc:0.8776 Train-Acc:0.795\n", 279 | "I:259 Test-Acc:0.8798 Train-Acc:0.803\n", 280 | "I:260 Test-Acc:0.8799 Train-Acc:0.805\n", 281 | "I:261 Test-Acc:0.8789 
Train-Acc:0.807\n", 282 | "I:262 Test-Acc:0.8784 Train-Acc:0.804\n", 283 | "I:263 Test-Acc:0.8792 Train-Acc:0.806\n", 284 | "I:264 Test-Acc:0.8777 Train-Acc:0.796\n", 285 | "I:265 Test-Acc:0.8785 Train-Acc:0.821\n", 286 | "I:266 Test-Acc:0.8794 Train-Acc:0.81\n", 287 | "I:267 Test-Acc:0.8783 Train-Acc:0.816\n", 288 | "I:268 Test-Acc:0.8777 Train-Acc:0.812\n", 289 | "I:269 Test-Acc:0.8791 Train-Acc:0.812\n", 290 | "I:270 Test-Acc:0.878 Train-Acc:0.813\n", 291 | "I:271 Test-Acc:0.8784 Train-Acc:0.82\n", 292 | "I:272 Test-Acc:0.8792 Train-Acc:0.821\n", 293 | "I:273 Test-Acc:0.8781 Train-Acc:0.823\n", 294 | "I:274 Test-Acc:0.8788 Train-Acc:0.816\n", 295 | "I:275 Test-Acc:0.8793 Train-Acc:0.82\n", 296 | "I:276 Test-Acc:0.8781 Train-Acc:0.829\n", 297 | "I:277 Test-Acc:0.8795 Train-Acc:0.809\n", 298 | "I:278 Test-Acc:0.875 Train-Acc:0.806\n", 299 | "I:279 Test-Acc:0.8795 Train-Acc:0.813\n", 300 | "I:280 Test-Acc:0.88 Train-Acc:0.816\n", 301 | "I:281 Test-Acc:0.8796 Train-Acc:0.819\n", 302 | "I:282 Test-Acc:0.8802 Train-Acc:0.809\n", 303 | "I:283 Test-Acc:0.8804 Train-Acc:0.811\n", 304 | "I:284 Test-Acc:0.8779 Train-Acc:0.808\n", 305 | "I:285 Test-Acc:0.8816 Train-Acc:0.82\n", 306 | "I:286 Test-Acc:0.8792 Train-Acc:0.822\n", 307 | "I:287 Test-Acc:0.8791 Train-Acc:0.817\n", 308 | "I:288 Test-Acc:0.8769 Train-Acc:0.814\n", 309 | "I:289 Test-Acc:0.8785 Train-Acc:0.807\n", 310 | "I:290 Test-Acc:0.8778 Train-Acc:0.817\n", 311 | "I:291 Test-Acc:0.8794 Train-Acc:0.82\n", 312 | "I:292 Test-Acc:0.8804 Train-Acc:0.824\n", 313 | "I:293 Test-Acc:0.8779 Train-Acc:0.812\n", 314 | "I:294 Test-Acc:0.8784 Train-Acc:0.816\n", 315 | "I:295 Test-Acc:0.877 Train-Acc:0.817\n", 316 | "I:296 Test-Acc:0.8767 Train-Acc:0.826\n", 317 | "I:297 Test-Acc:0.8774 Train-Acc:0.816\n", 318 | "I:298 Test-Acc:0.8774 Train-Acc:0.804\n", 319 | "I:299 Test-Acc:0.8774 Train-Acc:0.814" 320 | ] 321 | } 322 | ], 323 | "source": [ 324 | "import numpy as np, sys\n", 325 | "np.random.seed(1)\n", 326 | "\n", 327 | "from keras.datasets import mnist\n", 328 | "\n", 329 | "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n", 330 | "\n", 331 | "images, labels = (x_train[0:1000].reshape(1000,28*28) / 255,\n", 332 | " y_train[0:1000])\n", 333 | "\n", 334 | "\n", 335 | "one_hot_labels = np.zeros((len(labels),10))\n", 336 | "for i,l in enumerate(labels):\n", 337 | " one_hot_labels[i][l] = 1\n", 338 | "labels = one_hot_labels\n", 339 | "\n", 340 | "test_images = x_test.reshape(len(x_test),28*28) / 255\n", 341 | "test_labels = np.zeros((len(y_test),10))\n", 342 | "for i,l in enumerate(y_test):\n", 343 | " test_labels[i][l] = 1\n", 344 | "\n", 345 | "def tanh(x):\n", 346 | " return np.tanh(x)\n", 347 | "\n", 348 | "def tanh2deriv(output):\n", 349 | " return 1 - (output ** 2)\n", 350 | "\n", 351 | "def softmax(x):\n", 352 | " temp = np.exp(x)\n", 353 | " return temp / np.sum(temp, axis=1, keepdims=True)\n", 354 | "\n", 355 | "alpha, iterations = (2, 300)\n", 356 | "pixels_per_image, num_labels = (784, 10)\n", 357 | "batch_size = 128\n", 358 | "\n", 359 | "input_rows = 28\n", 360 | "input_cols = 28\n", 361 | "\n", 362 | "kernel_rows = 3\n", 363 | "kernel_cols = 3\n", 364 | "num_kernels = 16\n", 365 | "\n", 366 | "hidden_size = ((input_rows - kernel_rows) * \n", 367 | " (input_cols - kernel_cols)) * num_kernels\n", 368 | "\n", 369 | "# weights_0_1 = 0.02*np.random.random((pixels_per_image,hidden_size))-0.01\n", 370 | "kernels = 0.02*np.random.random((kernel_rows*kernel_cols,\n", 371 | " num_kernels))-0.01\n", 372 | "\n", 373 | "weights_1_2 = 
0.2*np.random.random((hidden_size,\n", 374 | " num_labels)) - 0.1\n", 375 | "\n", 376 | "\n", 377 | "\n", 378 | "def get_image_section(layer,row_from, row_to, col_from, col_to):\n", 379 | " section = layer[:,row_from:row_to,col_from:col_to]\n", 380 | " return section.reshape(-1,1,row_to-row_from, col_to-col_from)\n", 381 | "\n", 382 | "for j in range(iterations):\n", 383 | " correct_cnt = 0\n", 384 | " for i in range(int(len(images) / batch_size)):\n", 385 | " batch_start, batch_end=((i * batch_size),((i+1)*batch_size))\n", 386 | " layer_0 = images[batch_start:batch_end]\n", 387 | " layer_0 = layer_0.reshape(layer_0.shape[0],28,28)\n", 388 | " layer_0.shape\n", 389 | "\n", 390 | " sects = list()\n", 391 | " for row_start in range(layer_0.shape[1]-kernel_rows):\n", 392 | " for col_start in range(layer_0.shape[2] - kernel_cols):\n", 393 | " sect = get_image_section(layer_0,\n", 394 | " row_start,\n", 395 | " row_start+kernel_rows,\n", 396 | " col_start,\n", 397 | " col_start+kernel_cols)\n", 398 | " sects.append(sect)\n", 399 | "\n", 400 | " expanded_input = np.concatenate(sects,axis=1)\n", 401 | " es = expanded_input.shape\n", 402 | " flattened_input = expanded_input.reshape(es[0]*es[1],-1)\n", 403 | "\n", 404 | " kernel_output = flattened_input.dot(kernels)\n", 405 | " layer_1 = tanh(kernel_output.reshape(es[0],-1))\n", 406 | " dropout_mask = np.random.randint(2,size=layer_1.shape)\n", 407 | " layer_1 *= dropout_mask * 2\n", 408 | " layer_2 = softmax(np.dot(layer_1,weights_1_2))\n", 409 | "\n", 410 | " for k in range(batch_size):\n", 411 | " labelset = labels[batch_start+k:batch_start+k+1]\n", 412 | " _inc = int(np.argmax(layer_2[k:k+1]) == \n", 413 | " np.argmax(labelset))\n", 414 | " correct_cnt += _inc\n", 415 | "\n", 416 | " layer_2_delta = (labels[batch_start:batch_end]-layer_2)\\\n", 417 | " / (batch_size * layer_2.shape[0])\n", 418 | " layer_1_delta = layer_2_delta.dot(weights_1_2.T) * \\\n", 419 | " tanh2deriv(layer_1)\n", 420 | " layer_1_delta *= dropout_mask\n", 421 | " weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)\n", 422 | " l1d_reshape = layer_1_delta.reshape(kernel_output.shape)\n", 423 | " k_update = flattened_input.T.dot(l1d_reshape)\n", 424 | " kernels -= alpha * k_update\n", 425 | " \n", 426 | " test_correct_cnt = 0\n", 427 | "\n", 428 | " for i in range(len(test_images)):\n", 429 | "\n", 430 | " layer_0 = test_images[i:i+1]\n", 431 | "# layer_1 = tanh(np.dot(layer_0,weights_0_1))\n", 432 | " layer_0 = layer_0.reshape(layer_0.shape[0],28,28)\n", 433 | " layer_0.shape\n", 434 | "\n", 435 | " sects = list()\n", 436 | " for row_start in range(layer_0.shape[1]-kernel_rows):\n", 437 | " for col_start in range(layer_0.shape[2] - kernel_cols):\n", 438 | " sect = get_image_section(layer_0,\n", 439 | " row_start,\n", 440 | " row_start+kernel_rows,\n", 441 | " col_start,\n", 442 | " col_start+kernel_cols)\n", 443 | " sects.append(sect)\n", 444 | "\n", 445 | " expanded_input = np.concatenate(sects,axis=1)\n", 446 | " es = expanded_input.shape\n", 447 | " flattened_input = expanded_input.reshape(es[0]*es[1],-1)\n", 448 | "\n", 449 | " kernel_output = flattened_input.dot(kernels)\n", 450 | " layer_1 = tanh(kernel_output.reshape(es[0],-1))\n", 451 | " layer_2 = np.dot(layer_1,weights_1_2)\n", 452 | "\n", 453 | " test_correct_cnt += int(np.argmax(layer_2) == \n", 454 | " np.argmax(test_labels[i:i+1]))\n", 455 | " if(j % 1 == 0):\n", 456 | " sys.stdout.write(\"\\n\"+ \\\n", 457 | " \"I:\" + str(j) + \\\n", 458 | " \" Test-Acc:\"+str(test_correct_cnt/float(len(test_images)))+\\\n", 
459 | " \" Train-Acc:\" + str(correct_cnt/float(len(images))))" 460 | ] 461 | }, 462 | { 463 | "cell_type": "code", 464 | "execution_count": null, 465 | "metadata": {}, 466 | "outputs": [], 467 | "source": [] 468 | } 469 | ], 470 | "metadata": { 471 | "kernelspec": { 472 | "display_name": "Python 3", 473 | "language": "python", 474 | "name": "python3" 475 | }, 476 | "language_info": { 477 | "codemirror_mode": { 478 | "name": "ipython", 479 | "version": 3 480 | }, 481 | "file_extension": ".py", 482 | "mimetype": "text/x-python", 483 | "name": "python", 484 | "nbconvert_exporter": "python", 485 | "pygments_lexer": "ipython3", 486 | "version": "3.6.1" 487 | } 488 | }, 489 | "nbformat": 4, 490 | "nbformat_minor": 2 491 | } 492 | -------------------------------------------------------------------------------- /Chapter11 - Intro to Word Embeddings - Neural Networks that Understand Language.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Download the IMDB Dataset" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 30, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# Download reviews.txt and labels.txt from here: https://github.com/udacity/deep-learning/tree/master/sentiment-network\n", 17 | "\n", 18 | "def pretty_print_review_and_label(i):\n", 19 | " print(labels[i] + \"\\t:\\t\" + reviews[i][:80] + \"...\")\n", 20 | "\n", 21 | "g = open('reviews.txt','r') # What we know!\n", 22 | "reviews = list(map(lambda x:x[:-1],g.readlines()))\n", 23 | "g.close()\n", 24 | "\n", 25 | "g = open('labels.txt','r') # What we WANT to know!\n", 26 | "labels = list(map(lambda x:x[:-1].upper(),g.readlines()))\n", 27 | "g.close()" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": {}, 33 | "source": [ 34 | "# Capturing Word Correlation in Input Data" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 31, 40 | "metadata": {}, 41 | "outputs": [ 42 | { 43 | "name": "stdout", 44 | "output_type": "stream", 45 | "text": [ 46 | "Sent Encoding:[1 1 0 1]\n" 47 | ] 48 | } 49 | ], 50 | "source": [ 51 | "import numpy as np\n", 52 | "\n", 53 | "onehots = {}\n", 54 | "onehots['cat'] = np.array([1,0,0,0])\n", 55 | "onehots['the'] = np.array([0,1,0,0])\n", 56 | "onehots['dog'] = np.array([0,0,1,0])\n", 57 | "onehots['sat'] = np.array([0,0,0,1])\n", 58 | "\n", 59 | "sentence = ['the','cat','sat']\n", 60 | "x = onehots[sentence[0]] + \\\n", 61 | " onehots[sentence[1]] + \\\n", 62 | " onehots[sentence[2]]\n", 63 | "\n", 64 | "print(\"Sent Encoding:\" + str(x))" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "# Predicting Movie Reviews" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 56, 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [ 80 | "import sys\n", 81 | "\n", 82 | "f = open('reviews.txt')\n", 83 | "raw_reviews = f.readlines()\n", 84 | "f.close()\n", 85 | "\n", 86 | "f = open('labels.txt')\n", 87 | "raw_labels = f.readlines()\n", 88 | "f.close()\n", 89 | "\n", 90 | "tokens = list(map(lambda x:set(x.split(\" \")),raw_reviews))\n", 91 | "\n", 92 | "vocab = set()\n", 93 | "for sent in tokens:\n", 94 | " for word in sent:\n", 95 | " if(len(word)>0):\n", 96 | " vocab.add(word)\n", 97 | "vocab = list(vocab)\n", 98 | "\n", 99 | "word2index = {}\n", 100 | "for i,word in enumerate(vocab):\n", 101 | " word2index[word]=i\n", 102 | "\n", 103 | 
"input_dataset = list()\n", 104 | "for sent in tokens:\n", 105 | " sent_indices = list()\n", 106 | " for word in sent:\n", 107 | " try:\n", 108 | " sent_indices.append(word2index[word])\n", 109 | " except:\n", 110 | " \"\"\n", 111 | " input_dataset.append(list(set(sent_indices)))\n", 112 | "\n", 113 | "target_dataset = list()\n", 114 | "for label in raw_labels:\n", 115 | " if label == 'positive\\n':\n", 116 | " target_dataset.append(1)\n", 117 | " else:\n", 118 | " target_dataset.append(0)" 119 | ] 120 | }, 121 | { 122 | "cell_type": "raw", 123 | "metadata": {}, 124 | "source": [ 125 | "import numpy as np\n", 126 | "np.random.seed(1)\n", 127 | "\n", 128 | "def sigmoid(x):\n", 129 | " return 1/(1 + np.exp(-x))\n", 130 | "\n", 131 | "alpha, iterations = (0.01, 2)\n", 132 | "hidden_size = 100\n", 133 | "\n", 134 | "weights_0_1 = 0.2*np.random.random((len(vocab),hidden_size)) - 0.1\n", 135 | "weights_1_2 = 0.2*np.random.random((hidden_size,1)) - 0.1\n", 136 | "\n", 137 | "correct,total = (0,0)\n", 138 | "for iter in range(iterations):\n", 139 | " \n", 140 | " # train on first 24,000\n", 141 | " for i in range(len(input_dataset)-1000):\n", 142 | "\n", 143 | " x,y = (input_dataset[i],target_dataset[i])\n", 144 | " layer_1 = sigmoid(np.sum(weights_0_1[x],axis=0)) #embed + sigmoid\n", 145 | " layer_2 = sigmoid(np.dot(layer_1,weights_1_2)) # linear + softmax\n", 146 | "\n", 147 | " layer_2_delta = layer_2 - y # compare pred with truth\n", 148 | " layer_1_delta = layer_2_delta.dot(weights_1_2.T) #backprop\n", 149 | "\n", 150 | " weights_0_1[x] -= layer_1_delta * alpha\n", 151 | " weights_1_2 -= np.outer(layer_1,layer_2_delta) * alpha\n", 152 | "\n", 153 | " if(np.abs(layer_2_delta) < 0.5):\n", 154 | " correct += 1\n", 155 | " total += 1\n", 156 | " if(i % 10 == 9):\n", 157 | " progress = str(i/float(len(input_dataset)))\n", 158 | " sys.stdout.write('\\rIter:'+str(iter)\\\n", 159 | " +' Progress:'+progress[2:4]\\\n", 160 | " +'.'+progress[4:6]\\\n", 161 | " +'% Training Accuracy:'\\\n", 162 | " + str(correct/float(total)) + '%')\n", 163 | " print()\n", 164 | "correct,total = (0,0)\n", 165 | "for i in range(len(input_dataset)-1000,len(input_dataset)):\n", 166 | "\n", 167 | " x = input_dataset[i]\n", 168 | " y = target_dataset[i]\n", 169 | "\n", 170 | " layer_1 = sigmoid(np.sum(weights_0_1[x],axis=0))\n", 171 | " layer_2 = sigmoid(np.dot(layer_1,weights_1_2))\n", 172 | " \n", 173 | " if(np.abs(layer_2 - y) < 0.5):\n", 174 | " correct += 1\n", 175 | " total += 1\n", 176 | "print(\"Test Accuracy:\" + str(correct / float(total)))" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 31, 182 | "metadata": {}, 183 | "outputs": [ 184 | { 185 | "data": { 186 | "text/plain": [ 187 | "{'',\n", 188 | " '\\n',\n", 189 | " '.',\n", 190 | " 'a',\n", 191 | " 'about',\n", 192 | " 'adults',\n", 193 | " 'age',\n", 194 | " 'all',\n", 195 | " 'and',\n", 196 | " 'as',\n", 197 | " 'at',\n", 198 | " 'believe',\n", 199 | " 'bromwell',\n", 200 | " 'burn',\n", 201 | " 'can',\n", 202 | " 'cartoon',\n", 203 | " 'classic',\n", 204 | " 'closer',\n", 205 | " 'comedy',\n", 206 | " 'down',\n", 207 | " 'episode',\n", 208 | " 'expect',\n", 209 | " 'far',\n", 210 | " 'fetched',\n", 211 | " 'financially',\n", 212 | " 'here',\n", 213 | " 'high',\n", 214 | " 'i',\n", 215 | " 'immediately',\n", 216 | " 'in',\n", 217 | " 'insightful',\n", 218 | " 'inspector',\n", 219 | " 'is',\n", 220 | " 'isn',\n", 221 | " 'it',\n", 222 | " 'knew',\n", 223 | " 'lead',\n", 224 | " 'life',\n", 225 | " 'line',\n", 226 | " 'm',\n", 227 | 
" 'many',\n", 228 | " 'me',\n", 229 | " 'much',\n", 230 | " 'my',\n", 231 | " 'of',\n", 232 | " 'one',\n", 233 | " 'other',\n", 234 | " 'pathetic',\n", 235 | " 'pettiness',\n", 236 | " 'pity',\n", 237 | " 'pomp',\n", 238 | " 'profession',\n", 239 | " 'programs',\n", 240 | " 'ran',\n", 241 | " 'reality',\n", 242 | " 'recalled',\n", 243 | " 'remind',\n", 244 | " 'repeatedly',\n", 245 | " 'right',\n", 246 | " 's',\n", 247 | " 'sack',\n", 248 | " 'same',\n", 249 | " 'satire',\n", 250 | " 'saw',\n", 251 | " 'school',\n", 252 | " 'schools',\n", 253 | " 'scramble',\n", 254 | " 'see',\n", 255 | " 'situation',\n", 256 | " 'some',\n", 257 | " 'student',\n", 258 | " 'students',\n", 259 | " 'such',\n", 260 | " 'survive',\n", 261 | " 't',\n", 262 | " 'teachers',\n", 263 | " 'teaching',\n", 264 | " 'than',\n", 265 | " 'that',\n", 266 | " 'the',\n", 267 | " 'their',\n", 268 | " 'think',\n", 269 | " 'through',\n", 270 | " 'time',\n", 271 | " 'to',\n", 272 | " 'tried',\n", 273 | " 'welcome',\n", 274 | " 'what',\n", 275 | " 'when',\n", 276 | " 'which',\n", 277 | " 'who',\n", 278 | " 'whole',\n", 279 | " 'years',\n", 280 | " 'your'}" 281 | ] 282 | }, 283 | "execution_count": 31, 284 | "metadata": {}, 285 | "output_type": "execute_result" 286 | } 287 | ], 288 | "source": [ 289 | "tokens[0]" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "# Comparing Word Embeddings" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": 61, 302 | "metadata": {}, 303 | "outputs": [], 304 | "source": [ 305 | "from collections import Counter\n", 306 | "import math \n", 307 | "\n", 308 | "def similar(target='beautiful'):\n", 309 | " target_index = word2index[target]\n", 310 | " scores = Counter()\n", 311 | " for word,index in word2index.items():\n", 312 | " raw_difference = weights_0_1[index] - (weights_0_1[target_index])\n", 313 | " squared_difference = raw_difference * raw_difference\n", 314 | " scores[word] = -math.sqrt(sum(squared_difference))\n", 315 | "\n", 316 | " return scores.most_common(10)" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": 64, 322 | "metadata": {}, 323 | "outputs": [ 324 | { 325 | "name": "stdout", 326 | "output_type": "stream", 327 | "text": [ 328 | "[('beautiful', -0.0), ('heart', -0.7461901055360456), ('captures', -0.7767713774499612), ('impact', -0.7851006592549541), ('unexpected', -0.8024296074764704), ('bit', -0.8041029062033365), ('touching', -0.8041105203290175), ('true', -0.8092335336931215), ('worth', -0.8095649927927353), ('strong', -0.8095814455120289)]\n" 329 | ] 330 | } 331 | ], 332 | "source": [ 333 | "print(similar('beautiful'))" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 65, 339 | "metadata": {}, 340 | "outputs": [ 341 | { 342 | "name": "stdout", 343 | "output_type": "stream", 344 | "text": [ 345 | "[('terrible', -0.0), ('boring', -0.7591663900380615), ('lame', -0.7732283645546325), ('horrible', -0.788081854105546), ('disappointing', -0.7893120726668719), ('avoid', -0.7939105009456955), ('badly', -0.8054784389155504), ('annoying', -0.8067172753479477), ('dull', -0.8072650189634973), ('mess', -0.8139036459320503)]\n" 346 | ] 347 | } 348 | ], 349 | "source": [ 350 | "print(similar('terrible'))" 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": {}, 356 | "source": [ 357 | "# Filling in the Blank" 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "execution_count": 66, 363 | "metadata": {}, 364 | "outputs": [], 365 | 
"source": [ 366 | "import sys,random,math\n", 367 | "from collections import Counter\n", 368 | "import numpy as np\n", 369 | "\n", 370 | "np.random.seed(1)\n", 371 | "random.seed(1)\n", 372 | "f = open('reviews.txt')\n", 373 | "raw_reviews = f.readlines()\n", 374 | "f.close()\n", 375 | "\n", 376 | "tokens = list(map(lambda x:(x.split(\" \")),raw_reviews))\n", 377 | "wordcnt = Counter()\n", 378 | "for sent in tokens:\n", 379 | " for word in sent:\n", 380 | " wordcnt[word] -= 1\n", 381 | "vocab = list(set(map(lambda x:x[0],wordcnt.most_common())))\n", 382 | "\n", 383 | "word2index = {}\n", 384 | "for i,word in enumerate(vocab):\n", 385 | " word2index[word]=i\n", 386 | "\n", 387 | "concatenated = list()\n", 388 | "input_dataset = list()\n", 389 | "for sent in tokens:\n", 390 | " sent_indices = list()\n", 391 | " for word in sent:\n", 392 | " try:\n", 393 | " sent_indices.append(word2index[word])\n", 394 | " concatenated.append(word2index[word])\n", 395 | " except:\n", 396 | " \"\"\n", 397 | " input_dataset.append(sent_indices)\n", 398 | "concatenated = np.array(concatenated)\n", 399 | "random.shuffle(input_dataset)" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": 69, 405 | "metadata": {}, 406 | "outputs": [ 407 | { 408 | "name": "stdout", 409 | "output_type": "stream", 410 | "text": [ 411 | "Progress:0.99998[('terrible', -0.0), ('horrible', -3.488841411481131), ('bad', -4.0636425093941595), ('brilliant', -4.211247495138625), ('pathetic', -4.304645745396163), ('fantastic', -4.341998952418319), ('fabulous', -4.356925869405997), ('phenomenal', -4.361301237074382), ('marvelous', -4.3856957968039145), ('spectacular', -4.413156799233535)]\n" 412 | ] 413 | } 414 | ], 415 | "source": [ 416 | "alpha, iterations = (0.05, 2)\n", 417 | "hidden_size,window,negative = (50,2,5)\n", 418 | "\n", 419 | "weights_0_1 = (np.random.rand(len(vocab),hidden_size) - 0.5) * 0.2\n", 420 | "weights_1_2 = np.random.rand(len(vocab),hidden_size)*0\n", 421 | "\n", 422 | "layer_2_target = np.zeros(negative+1)\n", 423 | "layer_2_target[0] = 1\n", 424 | "\n", 425 | "def similar(target='beautiful'):\n", 426 | " target_index = word2index[target]\n", 427 | "\n", 428 | " scores = Counter()\n", 429 | " for word,index in word2index.items():\n", 430 | " raw_difference = weights_0_1[index] - (weights_0_1[target_index])\n", 431 | " squared_difference = raw_difference * raw_difference\n", 432 | " scores[word] = -math.sqrt(sum(squared_difference))\n", 433 | " return scores.most_common(10)\n", 434 | "\n", 435 | "def sigmoid(x):\n", 436 | " return 1/(1 + np.exp(-x))\n", 437 | "\n", 438 | "for rev_i,review in enumerate(input_dataset * iterations):\n", 439 | " for target_i in range(len(review)):\n", 440 | " \n", 441 | " # since it's really expensive to predict every vocabulary\n", 442 | " # we're only going to predict a random subset\n", 443 | " target_samples = [review[target_i]]+list(concatenated\\\n", 444 | " [(np.random.rand(negative)*len(concatenated)).astype('int').tolist()])\n", 445 | "\n", 446 | " left_context = review[max(0,target_i-window):target_i]\n", 447 | " right_context = review[target_i+1:min(len(review),target_i+window)]\n", 448 | "\n", 449 | " layer_1 = np.mean(weights_0_1[left_context+right_context],axis=0)\n", 450 | " layer_2 = sigmoid(layer_1.dot(weights_1_2[target_samples].T))\n", 451 | " layer_2_delta = layer_2 - layer_2_target\n", 452 | " layer_1_delta = layer_2_delta.dot(weights_1_2[target_samples])\n", 453 | "\n", 454 | " weights_0_1[left_context+right_context] -= layer_1_delta * 
alpha\n", 455 | " weights_1_2[target_samples] -= np.outer(layer_2_delta,layer_1)*alpha\n", 456 | "\n", 457 | " if(rev_i % 250 == 0):\n", 458 | " sys.stdout.write('\\rProgress:'+str(rev_i/float(len(input_dataset)\n", 459 | " *iterations)) + \" \" + str(similar('terrible')))\n", 460 | " sys.stdout.write('\\rProgress:'+str(rev_i/float(len(input_dataset)\n", 461 | " *iterations)))\n", 462 | "print(similar('terrible'))" 463 | ] 464 | }, 465 | { 466 | "cell_type": "markdown", 467 | "metadata": {}, 468 | "source": [ 469 | "# King - Man + Woman ~= Queen" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": 70, 475 | "metadata": {}, 476 | "outputs": [], 477 | "source": [ 478 | "def analogy(positive=['terrible','good'],negative=['bad']):\n", 479 | " \n", 480 | " norms = np.sum(weights_0_1 * weights_0_1,axis=1)\n", 481 | " norms.resize(norms.shape[0],1)\n", 482 | " \n", 483 | " normed_weights = weights_0_1 * norms\n", 484 | " \n", 485 | " query_vect = np.zeros(len(weights_0_1[0]))\n", 486 | " for word in positive:\n", 487 | " query_vect += normed_weights[word2index[word]]\n", 488 | " for word in negative:\n", 489 | " query_vect -= normed_weights[word2index[word]]\n", 490 | " \n", 491 | " scores = Counter()\n", 492 | " for word,index in word2index.items():\n", 493 | " raw_difference = weights_0_1[index] - query_vect\n", 494 | " squared_difference = raw_difference * raw_difference\n", 495 | " scores[word] = -math.sqrt(sum(squared_difference))\n", 496 | " \n", 497 | " return scores.most_common(10)[1:]" 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 71, 503 | "metadata": {}, 504 | "outputs": [ 505 | { 506 | "data": { 507 | "text/plain": [ 508 | "[('terrific', -210.46593317724228),\n", 509 | " ('perfect', -210.52652806032205),\n", 510 | " ('worth', -210.53162266358495),\n", 511 | " ('good', -210.55072184482773),\n", 512 | " ('terrible', -210.58429046605724),\n", 513 | " ('decent', -210.87945442008805),\n", 514 | " ('superb', -211.01143515971094),\n", 515 | " ('great', -211.1327058081335),\n", 516 | " ('worthy', -211.13577238103477)]" 517 | ] 518 | }, 519 | "execution_count": 71, 520 | "metadata": {}, 521 | "output_type": "execute_result" 522 | } 523 | ], 524 | "source": [ 525 | "analogy(['terrible','good'],['bad'])" 526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": 72, 531 | "metadata": {}, 532 | "outputs": [ 533 | { 534 | "data": { 535 | "text/plain": [ 536 | "[('simon', -193.82490698964878),\n", 537 | " ('obsessed', -193.91805919583555),\n", 538 | " ('stanwyck', -194.22311983847902),\n", 539 | " ('sandler', -194.22846640800597),\n", 540 | " ('branagh', -194.24551334589853),\n", 541 | " ('daniel', -194.24631020485714),\n", 542 | " ('peter', -194.29908544092078),\n", 543 | " ('tony', -194.31388897167716),\n", 544 | " ('aged', -194.35115773165094)]" 545 | ] 546 | }, 547 | "execution_count": 72, 548 | "metadata": {}, 549 | "output_type": "execute_result" 550 | } 551 | ], 552 | "source": [ 553 | "analogy(['elizabeth','he'],['she'])" 554 | ] 555 | }, 556 | { 557 | "cell_type": "code", 558 | "execution_count": null, 559 | "metadata": {}, 560 | "outputs": [], 561 | "source": [] 562 | } 563 | ], 564 | "metadata": { 565 | "kernelspec": { 566 | "display_name": "Python 3", 567 | "language": "python", 568 | "name": "python3" 569 | }, 570 | "language_info": { 571 | "codemirror_mode": { 572 | "name": "ipython", 573 | "version": 3 574 | }, 575 | "file_extension": ".py", 576 | "mimetype": "text/x-python", 577 | "name": "python", 578 | 
"nbconvert_exporter": "python", 579 | "pygments_lexer": "ipython3", 580 | "version": "3.6.1" 581 | } 582 | }, 583 | "nbformat": 4, 584 | "nbformat_minor": 2 585 | } 586 | -------------------------------------------------------------------------------- /Chapter12 - Intro to Recurrence - Predicting the Next Word.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Download & Preprocess the IMDB Dataset" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 34, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "# Download reviews.txt and labels.txt from here: https://github.com/udacity/deep-learning/tree/master/sentiment-network\n", 17 | "\n", 18 | "def pretty_print_review_and_label(i):\n", 19 | " print(labels[i] + \"\\t:\\t\" + reviews[i][:80] + \"...\")\n", 20 | "\n", 21 | "g = open('reviews.txt','r') # What we know!\n", 22 | "reviews = list(map(lambda x:x[:-1],g.readlines()))\n", 23 | "g.close()\n", 24 | "\n", 25 | "g = open('labels.txt','r') # What we WANT to know!\n", 26 | "labels = list(map(lambda x:x[:-1].upper(),g.readlines()))\n", 27 | "g.close()\n", 28 | "\n", 29 | "\n", 30 | "# Preprocess dataset:\n", 31 | "\n", 32 | "import sys\n", 33 | "\n", 34 | "f = open('reviews.txt')\n", 35 | "raw_reviews = f.readlines()\n", 36 | "f.close()\n", 37 | "\n", 38 | "f = open('labels.txt')\n", 39 | "raw_labels = f.readlines()\n", 40 | "f.close()\n", 41 | "\n", 42 | "tokens = list(map(lambda x:set(x.split(\" \")),raw_reviews))\n", 43 | "\n", 44 | "vocab = set()\n", 45 | "for sent in tokens:\n", 46 | " for word in sent:\n", 47 | " if(len(word)>0):\n", 48 | " vocab.add(word)\n", 49 | "vocab = list(vocab)\n", 50 | "\n", 51 | "word2index = {}\n", 52 | "for i,word in enumerate(vocab):\n", 53 | " word2index[word]=i\n", 54 | "\n", 55 | "input_dataset = list()\n", 56 | "for sent in tokens:\n", 57 | " sent_indices = list()\n", 58 | " for word in sent:\n", 59 | " try:\n", 60 | " sent_indices.append(word2index[word])\n", 61 | " except:\n", 62 | " \"\"\n", 63 | " input_dataset.append(list(set(sent_indices)))\n", 64 | "\n", 65 | "target_dataset = list()\n", 66 | "for label in raw_labels:\n", 67 | " if label == 'positive\\n':\n", 68 | " target_dataset.append(1)\n", 69 | " else:\n", 70 | " target_dataset.append(0)" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "# The Surprising Power of Averaged Word Vectors" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 35, 83 | "metadata": {}, 84 | "outputs": [ 85 | { 86 | "data": { 87 | "text/plain": [ 88 | "['this tim burton remake of the original ',\n", 89 | " 'certainly one of the dozen or so worst m',\n", 90 | " 'boring and appallingly acted summer phe']" 91 | ] 92 | }, 93 | "execution_count": 35, 94 | "metadata": {}, 95 | "output_type": "execute_result" 96 | } 97 | ], 98 | "source": [ 99 | "import numpy as np\n", 100 | "norms = np.sum(weights_0_1 * weights_0_1,axis=1)\n", 101 | "norms.resize(norms.shape[0],1)\n", 102 | "normed_weights = weights_0_1 * norms\n", 103 | "\n", 104 | "def make_sent_vect(words):\n", 105 | " indices = list(map(lambda x:word2index[x],filter(lambda x:x in word2index,words)))\n", 106 | " return np.mean(normed_weights[indices],axis=0)\n", 107 | "\n", 108 | "reviews2vectors = list()\n", 109 | "for review in tokens: # tokenized reviews\n", 110 | " reviews2vectors.append(make_sent_vect(review))\n", 111 | "reviews2vectors 
= np.array(reviews2vectors)\n", 112 | "\n", 113 | "def most_similar_reviews(review):\n", 114 | " v = make_sent_vect(review)\n", 115 | " scores = Counter()\n", 116 | " for i,val in enumerate(reviews2vectors.dot(v)):\n", 117 | " scores[i] = val\n", 118 | " most_similar = list()\n", 119 | " \n", 120 | " for idx,score in scores.most_common(3):\n", 121 | " most_similar.append(raw_reviews[idx][0:40])\n", 122 | " return most_similar\n", 123 | "\n", 124 | "most_similar_reviews(['boring','awful'])" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "# Matrices that Change Absolutely Nothing" 132 | ] 133 | }, 134 | { 135 | "cell_type": "code", 136 | "execution_count": 37, 137 | "metadata": {}, 138 | "outputs": [ 139 | { 140 | "name": "stdout", 141 | "output_type": "stream", 142 | "text": [ 143 | "[[1. 0. 0.]\n", 144 | " [0. 1. 0.]\n", 145 | " [0. 0. 1.]]\n" 146 | ] 147 | } 148 | ], 149 | "source": [ 150 | "import numpy as np\n", 151 | "\n", 152 | "a = np.array([1,2,3])\n", 153 | "b = np.array([0.1,0.2,0.3])\n", 154 | "c = np.array([-1,-0.5,0])\n", 155 | "d = np.array([0,0,0])\n", 156 | "\n", 157 | "identity = np.eye(3)\n", 158 | "print(identity)" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 38, 164 | "metadata": {}, 165 | "outputs": [ 166 | { 167 | "name": "stdout", 168 | "output_type": "stream", 169 | "text": [ 170 | "[1. 2. 3.]\n", 171 | "[0.1 0.2 0.3]\n", 172 | "[-1. -0.5 0. ]\n", 173 | "[0. 0. 0.]\n" 174 | ] 175 | } 176 | ], 177 | "source": [ 178 | "print(a.dot(identity))\n", 179 | "print(b.dot(identity))\n", 180 | "print(c.dot(identity))\n", 181 | "print(d.dot(identity))" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 39, 187 | "metadata": {}, 188 | "outputs": [ 189 | { 190 | "name": "stdout", 191 | "output_type": "stream", 192 | "text": [ 193 | "[13 15 17]\n", 194 | "[13. 15. 
17.]\n" 195 | ] 196 | } 197 | ], 198 | "source": [ 199 | "this = np.array([2,4,6])\n", 200 | "movie = np.array([10,10,10])\n", 201 | "rocks = np.array([1,1,1])\n", 202 | "\n", 203 | "print(this + movie + rocks)\n", 204 | "print((this.dot(identity) + movie).dot(identity) + rocks)" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "# Forward Propagation in Python" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 40, 217 | "metadata": {}, 218 | "outputs": [], 219 | "source": [ 220 | "import numpy as np\n", 221 | "\n", 222 | "def softmax(x_):\n", 223 | " x = np.atleast_2d(x_)\n", 224 | " temp = np.exp(x)\n", 225 | " return temp / np.sum(temp, axis=1, keepdims=True)\n", 226 | "\n", 227 | "word_vects = {}\n", 228 | "word_vects['yankees'] = np.array([[0.,0.,0.]])\n", 229 | "word_vects['bears'] = np.array([[0.,0.,0.]])\n", 230 | "word_vects['braves'] = np.array([[0.,0.,0.]])\n", 231 | "word_vects['red'] = np.array([[0.,0.,0.]])\n", 232 | "word_vects['socks'] = np.array([[0.,0.,0.]])\n", 233 | "word_vects['lose'] = np.array([[0.,0.,0.]])\n", 234 | "word_vects['defeat'] = np.array([[0.,0.,0.]])\n", 235 | "word_vects['beat'] = np.array([[0.,0.,0.]])\n", 236 | "word_vects['tie'] = np.array([[0.,0.,0.]])\n", 237 | "\n", 238 | "sent2output = np.random.rand(3,len(word_vects))\n", 239 | "\n", 240 | "identity = np.eye(3)" 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 41, 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "name": "stdout", 250 | "output_type": "stream", 251 | "text": [ 252 | "[[0.11111111 0.11111111 0.11111111 0.11111111 0.11111111 0.11111111\n", 253 | " 0.11111111 0.11111111 0.11111111]]\n" 254 | ] 255 | } 256 | ], 257 | "source": [ 258 | "layer_0 = word_vects['red']\n", 259 | "layer_1 = layer_0.dot(identity) + word_vects['socks']\n", 260 | "layer_2 = layer_1.dot(identity) + word_vects['defeat']\n", 261 | "\n", 262 | "pred = softmax(layer_2.dot(sent2output))\n", 263 | "print(pred)" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "# How do we Backpropagate into this?" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": 46, 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [ 279 | "y = np.array([1,0,0,0,0,0,0,0,0]) # target one-hot vector for \"yankees\"\n", 280 | "\n", 281 | "pred_delta = pred - y\n", 282 | "layer_2_delta = pred_delta.dot(sent2output.T)\n", 283 | "defeat_delta = layer_2_delta * 1 # can ignore the \"1\" like prev. chapter\n", 284 | "layer_1_delta = layer_2_delta.dot(identity.T)\n", 285 | "socks_delta = layer_1_delta * 1 # again... can ignore the \"1\"\n", 286 | "layer_0_delta = layer_1_delta.dot(identity.T)\n", 287 | "alpha = 0.01\n", 288 | "word_vects['red'] -= layer_0_delta * alpha\n", 289 | "word_vects['socks'] -= socks_delta * alpha\n", 290 | "word_vects['defeat'] -= defeat_delta * alpha\n", 291 | "identity -= np.outer(layer_0,layer_1_delta) * alpha\n", 292 | "identity -= np.outer(layer_1,layer_2_delta) * alpha\n", 293 | "sent2output -= np.outer(layer_2,pred_delta) * alpha" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "# Let's Train it!" 
301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 49, 306 | "metadata": {}, 307 | "outputs": [ 308 | { 309 | "name": "stdout", 310 | "output_type": "stream", 311 | "text": [ 312 | "[['mary', 'moved', 'to', 'the', 'bathroom.'], ['john', 'went', 'to', 'the', 'hallway.'], ['where', 'is', 'mary?', '\\tbathroom\\t1']]\n" 313 | ] 314 | } 315 | ], 316 | "source": [ 317 | "import sys,random,math\n", 318 | "from collections import Counter\n", 319 | "import numpy as np\n", 320 | "\n", 321 | "f = open('tasksv11/en/qa1_single-supporting-fact_train.txt','r')\n", 322 | "raw = f.readlines()\n", 323 | "f.close()\n", 324 | "\n", 325 | "tokens = list()\n", 326 | "for line in raw[0:1000]:\n", 327 | " tokens.append(line.lower().replace(\"\\n\",\"\").split(\" \")[1:])\n", 328 | "\n", 329 | "print(tokens[0:3])" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": 87, 335 | "metadata": {}, 336 | "outputs": [], 337 | "source": [ 338 | "vocab = set()\n", 339 | "for sent in tokens:\n", 340 | " for word in sent:\n", 341 | " vocab.add(word)\n", 342 | "\n", 343 | "vocab = list(vocab)\n", 344 | "\n", 345 | "word2index = {}\n", 346 | "for i,word in enumerate(vocab):\n", 347 | " word2index[word]=i\n", 348 | " \n", 349 | "def words2indices(sentence):\n", 350 | " idx = list()\n", 351 | " for word in sentence:\n", 352 | " idx.append(word2index[word])\n", 353 | " return idx\n", 354 | "\n", 355 | "def softmax(x):\n", 356 | " e_x = np.exp(x - np.max(x))\n", 357 | " return e_x / e_x.sum(axis=0)" 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "execution_count": 88, 363 | "metadata": {}, 364 | "outputs": [], 365 | "source": [ 366 | "np.random.seed(1)\n", 367 | "embed_size = 10\n", 368 | "\n", 369 | "# word embeddings\n", 370 | "embed = (np.random.rand(len(vocab),embed_size) - 0.5) * 0.1\n", 371 | "\n", 372 | "# embedding -> embedding (initially the identity matrix)\n", 373 | "recurrent = np.eye(embed_size)\n", 374 | "\n", 375 | "# sentence embedding for empty sentence\n", 376 | "start = np.zeros(embed_size)\n", 377 | "\n", 378 | "# embedding -> output weights\n", 379 | "decoder = (np.random.rand(embed_size, len(vocab)) - 0.5) * 0.1\n", 380 | "\n", 381 | "# one hot lookups (for loss function)\n", 382 | "one_hot = np.eye(len(vocab))" 383 | ] 384 | }, 385 | { 386 | "cell_type": "markdown", 387 | "metadata": {}, 388 | "source": [ 389 | "# Forward Propagation with Arbitrary Length" 390 | ] 391 | }, 392 | { 393 | "cell_type": "code", 394 | "execution_count": 89, 395 | "metadata": {}, 396 | "outputs": [], 397 | "source": [ 398 | "def predict(sent):\n", 399 | " \n", 400 | " layers = list()\n", 401 | " layer = {}\n", 402 | " layer['hidden'] = start\n", 403 | " layers.append(layer)\n", 404 | "\n", 405 | " loss = 0\n", 406 | "\n", 407 | " # forward propagate\n", 408 | " preds = list()\n", 409 | " for target_i in range(len(sent)):\n", 410 | "\n", 411 | " layer = {}\n", 412 | "\n", 413 | " # try to predict the next term\n", 414 | " layer['pred'] = softmax(layers[-1]['hidden'].dot(decoder))\n", 415 | "\n", 416 | " loss += -np.log(layer['pred'][sent[target_i]])\n", 417 | "\n", 418 | " # generate the next hidden state\n", 419 | " layer['hidden'] = layers[-1]['hidden'].dot(recurrent) + embed[sent[target_i]]\n", 420 | " layers.append(layer)\n", 421 | "\n", 422 | " return layers, loss" 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": {}, 428 | "source": [ 429 | "# Backpropagation with Arbitrary Length" 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | 
"execution_count": 90, 435 | "metadata": {}, 436 | "outputs": [], 437 | "source": [ 438 | "# forward\n", 439 | "for iter in range(30000):\n", 440 | " alpha = 0.001\n", 441 | " sent = words2indices(tokens[iter%len(tokens)][1:])\n", 442 | " layers,loss = predict(sent) \n", 443 | "\n", 444 | " # back propagate\n", 445 | " for layer_idx in reversed(range(len(layers))):\n", 446 | " layer = layers[layer_idx]\n", 447 | " target = sent[layer_idx-1]\n", 448 | "\n", 449 | " if(layer_idx > 0): # if not the first layer\n", 450 | " layer['output_delta'] = layer['pred'] - one_hot[target]\n", 451 | " new_hidden_delta = layer['output_delta'].dot(decoder.transpose())\n", 452 | "\n", 453 | " # if the last layer - don't pull from a later one becasue it doesn't exist\n", 454 | " if(layer_idx == len(layers)-1):\n", 455 | " layer['hidden_delta'] = new_hidden_delta\n", 456 | " else:\n", 457 | " layer['hidden_delta'] = new_hidden_delta + layers[layer_idx+1]['hidden_delta'].dot(recurrent.transpose())\n", 458 | " else: # if the first layer\n", 459 | " layer['hidden_delta'] = layers[layer_idx+1]['hidden_delta'].dot(recurrent.transpose())" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "# Weight Update with Arbitrary Length" 467 | ] 468 | }, 469 | { 470 | "cell_type": "code", 471 | "execution_count": 91, 472 | "metadata": {}, 473 | "outputs": [ 474 | { 475 | "name": "stdout", 476 | "output_type": "stream", 477 | "text": [ 478 | "Perplexity:82.09227500075585\n", 479 | "Perplexity:81.87615610433569\n", 480 | "Perplexity:81.53705034457951\n", 481 | "Perplexity:80.88879456876245\n", 482 | "Perplexity:79.50015694256045\n", 483 | "Perplexity:76.04440447063566\n", 484 | "Perplexity:63.76523100870378\n", 485 | "Perplexity:34.69262611144399\n", 486 | "Perplexity:21.77439314730968\n", 487 | "Perplexity:19.74440305631078\n", 488 | "Perplexity:18.813349002926333\n", 489 | "Perplexity:17.920571868736154\n", 490 | "Perplexity:16.84823833832929\n", 491 | "Perplexity:15.302868260393344\n", 492 | "Perplexity:12.898616378336536\n", 493 | "Perplexity:9.781678937443305\n", 494 | "Perplexity:7.546724222346714\n", 495 | "Perplexity:6.4277474041777305\n", 496 | "Perplexity:5.685698933881173\n", 497 | "Perplexity:5.240514920446924\n", 498 | "Perplexity:4.916476504398705\n", 499 | "Perplexity:4.674677629541541\n", 500 | "Perplexity:4.494159385603734\n", 501 | "Perplexity:4.365041755388302\n", 502 | "Perplexity:4.289971726173599\n", 503 | "Perplexity:4.243384558378477\n", 504 | "Perplexity:4.192001080475404\n", 505 | "Perplexity:4.132556753967558\n", 506 | "Perplexity:4.071667181580819\n", 507 | "Perplexity:4.0167814473718435\n" 508 | ] 509 | } 510 | ], 511 | "source": [ 512 | "# forward\n", 513 | "for iter in range(30000):\n", 514 | " alpha = 0.001\n", 515 | " sent = words2indices(tokens[iter%len(tokens)][1:])\n", 516 | "\n", 517 | " layers,loss = predict(sent) \n", 518 | "\n", 519 | " # back propagate\n", 520 | " for layer_idx in reversed(range(len(layers))):\n", 521 | " layer = layers[layer_idx]\n", 522 | " target = sent[layer_idx-1]\n", 523 | "\n", 524 | " if(layer_idx > 0):\n", 525 | " layer['output_delta'] = layer['pred'] - one_hot[target]\n", 526 | " new_hidden_delta = layer['output_delta'].dot(decoder.transpose())\n", 527 | "\n", 528 | " # if the last layer - don't pull from a \n", 529 | " # later one becasue it doesn't exist\n", 530 | " if(layer_idx == len(layers)-1):\n", 531 | " layer['hidden_delta'] = new_hidden_delta\n", 532 | " else:\n", 533 | " layer['hidden_delta'] = 
new_hidden_delta + layers[layer_idx+1]['hidden_delta'].dot(recurrent.transpose())\n", 534 | " else:\n", 535 | " layer['hidden_delta'] = layers[layer_idx+1]['hidden_delta'].dot(recurrent.transpose())\n", 536 | "\n", 537 | " # update weights\n", 538 | " start -= layers[0]['hidden_delta'] * alpha / float(len(sent))\n", 539 | " for layer_idx,layer in enumerate(layers[1:]):\n", 540 | " \n", 541 | " decoder -= np.outer(layers[layer_idx]['hidden'], layer['output_delta']) * alpha / float(len(sent))\n", 542 | " \n", 543 | " embed_idx = sent[layer_idx]\n", 544 | " embed[embed_idx] -= layers[layer_idx]['hidden_delta'] * alpha / float(len(sent))\n", 545 | " recurrent -= np.outer(layers[layer_idx]['hidden'], layer['hidden_delta']) * alpha / float(len(sent))\n", 546 | " \n", 547 | " if(iter % 1000 == 0):\n", 548 | " print(\"Perplexity:\" + str(np.exp(loss/len(sent))))" 549 | ] 550 | }, 551 | { 552 | "cell_type": "markdown", 553 | "metadata": {}, 554 | "source": [ 555 | "# Execution and Output Analysis" 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "execution_count": 93, 561 | "metadata": {}, 562 | "outputs": [ 563 | { 564 | "name": "stdout", 565 | "output_type": "stream", 566 | "text": [ 567 | "['sandra', 'moved', 'to', 'the', 'garden.']\n", 568 | "Prev Input:sandra True:moved Pred:is\n", 569 | "Prev Input:moved True:to Pred:to\n", 570 | "Prev Input:to True:the Pred:the\n", 571 | "Prev Input:the True:garden. Pred:bedroom.\n" 572 | ] 573 | } 574 | ], 575 | "source": [ 576 | "sent_index = 4\n", 577 | "\n", 578 | "l,_ = predict(words2indices(tokens[sent_index]))\n", 579 | "\n", 580 | "print(tokens[sent_index])\n", 581 | "\n", 582 | "for i,each_layer in enumerate(l[1:-1]):\n", 583 | " input = tokens[sent_index][i]\n", 584 | " true = tokens[sent_index][i+1]\n", 585 | " pred = vocab[each_layer['pred'].argmax()]\n", 586 | " print(\"Prev Input:\" + input + (' ' * (12 - len(input))) +\\\n", 587 | " \"True:\" + true + (\" \" * (15 - len(true))) + \"Pred:\" + pred)" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": null, 593 | "metadata": {}, 594 | "outputs": [], 595 | "source": [] 596 | } 597 | ], 598 | "metadata": { 599 | "kernelspec": { 600 | "display_name": "Python 3", 601 | "language": "python", 602 | "name": "python3" 603 | }, 604 | "language_info": { 605 | "codemirror_mode": { 606 | "name": "ipython", 607 | "version": 3 608 | }, 609 | "file_extension": ".py", 610 | "mimetype": "text/x-python", 611 | "name": "python", 612 | "nbconvert_exporter": "python", 613 | "pygments_lexer": "ipython3", 614 | "version": "3.6.1" 615 | } 616 | }, 617 | "nbformat": 4, 618 | "nbformat_minor": 2 619 | } 620 | -------------------------------------------------------------------------------- /Chapter14 - Exploding Gradients Examples.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 158, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stdout", 10 | "output_type": "stream", 11 | "text": [ 12 | "Activations\n", 13 | "[0.93940638 0.96852968]\n", 14 | "[0.9919462 0.99121735]\n", 15 | "[0.99301385 0.99302901]\n", 16 | "[0.9930713 0.99307098]\n", 17 | "[0.99307285 0.99307285]\n", 18 | "[0.99307291 0.99307291]\n", 19 | "[0.99307291 0.99307291]\n", 20 | "[0.99307291 0.99307291]\n", 21 | "[0.99307291 0.99307291]\n", 22 | "[0.99307291 0.99307291]\n", 23 | "\n", 24 | "Gradients\n", 25 | "[0.03439552 0.03439552]\n", 26 | "[0.00118305 0.00118305]\n", 27 | "[4.06916726e-05 
4.06916726e-05]\n", 28 | "[1.39961115e-06 1.39961115e-06]\n", 29 | "[4.81403643e-08 4.81403637e-08]\n", 30 | "[1.65582672e-09 1.65582765e-09]\n", 31 | "[5.69682675e-11 5.69667160e-11]\n", 32 | "[1.97259346e-12 1.97517920e-12]\n", 33 | "[8.45387597e-14 8.02306381e-14]\n", 34 | "[1.45938177e-14 2.16938983e-14]\n" 35 | ] 36 | } 37 | ], 38 | "source": [ 39 | "import numpy as np\n", 40 | "\n", 41 | "sigmoid = lambda x:1/(1 + np.exp(-x))\n", 42 | "relu = lambda x:(x>0).astype(float)*x\n", 43 | "\n", 44 | "weights = np.array([[1,4],[4,1]])\n", 45 | "activation = sigmoid(np.array([1,0.01]))\n", 46 | "\n", 47 | "print(\"Activations\")\n", 48 | "activations = list()\n", 49 | "for iter in range(10):\n", 50 | " activation = sigmoid(activation.dot(weights))\n", 51 | " activations.append(activation)\n", 52 | " print(activation)\n", 53 | "print(\"\\nGradients\")\n", 54 | "gradient = np.ones_like(activation)\n", 55 | "for activation in reversed(activations):\n", 56 | " gradient = (activation * (1 - activation) * gradient)\n", 57 | " gradient = gradient.dot(weights.transpose())\n", 58 | " print(gradient)" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 160, 64 | "metadata": {}, 65 | "outputs": [ 66 | { 67 | "name": "stdout", 68 | "output_type": "stream", 69 | "text": [ 70 | "Relu Activations\n", 71 | "[23.71814585 23.98025559]\n", 72 | "[119.63916823 118.852839 ]\n", 73 | "[595.05052421 597.40951192]\n", 74 | "[2984.68857188 2977.61160877]\n", 75 | "[14895.13500696 14916.36589628]\n", 76 | "[74560.59859209 74496.90592414]\n", 77 | "[372548.22228863 372739.30029248]\n", 78 | "[1863505.42345854 1862932.18944699]\n", 79 | "[9315234.18124649 9316953.88328115]\n", 80 | "[46583049.71437107 46577890.60826711]\n", 81 | "\n", 82 | "Relu Gradients\n", 83 | "[5. 5.]\n", 84 | "[25. 25.]\n", 85 | "[125. 125.]\n", 86 | "[625. 625.]\n", 87 | "[3125. 3125.]\n", 88 | "[15625. 15625.]\n", 89 | "[78125. 78125.]\n", 90 | "[390625. 390625.]\n", 91 | "[1953125. 1953125.]\n", 92 | "[9765625. 
9765625.]\n" 93 | ] 94 | } 95 | ], 96 | "source": [ 97 | "print(\"Relu Activations\")\n", 98 | "activations = list()\n", 99 | "for iter in range(10):\n", 100 | " activation = relu(activation.dot(weights))\n", 101 | " activations.append(activation)\n", 102 | " print(activation)\n", 103 | "\n", 104 | "print(\"\\nRelu Gradients\")\n", 105 | "gradient = np.ones_like(activation)\n", 106 | "for activation in reversed(activations):\n", 107 | " gradient = ((activation > 0) * gradient).dot(weights.transpose())\n", 108 | " print(gradient)" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": {}, 115 | "outputs": [], 116 | "source": [] 117 | } 118 | ], 119 | "metadata": { 120 | "kernelspec": { 121 | "display_name": "Python 3", 122 | "language": "python", 123 | "name": "python3" 124 | }, 125 | "language_info": { 126 | "codemirror_mode": { 127 | "name": "ipython", 128 | "version": 3 129 | }, 130 | "file_extension": ".py", 131 | "mimetype": "text/x-python", 132 | "name": "python", 133 | "nbconvert_exporter": "python", 134 | "pygments_lexer": "ipython3", 135 | "version": "3.6.1" 136 | } 137 | }, 138 | "nbformat": 4, 139 | "nbformat_minor": 2 140 | } 141 | -------------------------------------------------------------------------------- /Chapter3 - Forward Propagation - Intro to Neural Prediction.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# A Simple Neural Network Making a Prediction" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "### What is a Neural Network?" 15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 1, 20 | "metadata": {}, 21 | "outputs": [ 22 | { 23 | "name": "stdout", 24 | "output_type": "stream", 25 | "text": [ 26 | "0.8500000000000001\n" 27 | ] 28 | } 29 | ], 30 | "source": [ 31 | "# The network:\n", 32 | "\n", 33 | "weight = 0.1 \n", 34 | "def neural_network(input, weight):\n", 35 | " prediction = input * weight\n", 36 | " return prediction\n", 37 | "\n", 38 | "# How we use the network to predict something:\n", 39 | "\n", 40 | "number_of_toes = [8.5, 9.5, 10, 9]\n", 41 | "input = number_of_toes[0]\n", 42 | "pred = neural_network(input,weight)\n", 43 | "print(pred)" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "# Making a Prediction with Multiple Inputs" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "### Complete Runnable Code" 58 | ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 2, 63 | "metadata": {}, 64 | "outputs": [ 65 | { 66 | "name": "stdout", 67 | "output_type": "stream", 68 | "text": [ 69 | "0.9800000000000001\n" 70 | ] 71 | } 72 | ], 73 | "source": [ 74 | "def w_sum(a,b):\n", 75 | " assert(len(a) == len(b))\n", 76 | " output = 0\n", 77 | " for i in range(len(a)):\n", 78 | " output += (a[i] * b[i])\n", 79 | " return output\n", 80 | "\n", 81 | "weights = [0.1, 0.2, 0] \n", 82 | " \n", 83 | "def neural_network(input, weights):\n", 84 | " pred = w_sum(input,weights)\n", 85 | " return pred\n", 86 | "\n", 87 | "# This dataset is the current\n", 88 | "# status at the beginning of\n", 89 | "# each game for the first 4 games\n", 90 | "# in a season.\n", 91 | "\n", 92 | "# toes = current number of toes\n", 93 | "# wlrec = current games won (percent)\n", 94 | "# nfans = fan count (in millions)\n", 95 | "\n", 96 | "toes = 
[8.5, 9.5, 9.9, 9.0]\n", 97 | "wlrec = [0.65, 0.8, 0.8, 0.9]\n", 98 | "nfans = [1.2, 1.3, 0.5, 1.0]\n", 99 | "\n", 100 | "# Input corresponds to every entry\n", 101 | "# for the first game of the season.\n", 102 | "\n", 103 | "input = [toes[0],wlrec[0],nfans[0]]\n", 104 | "pred = neural_network(input,weights)\n", 105 | "\n", 106 | "print(pred)" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "### NumPy Code" 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": 3, 119 | "metadata": {}, 120 | "outputs": [ 121 | { 122 | "name": "stdout", 123 | "output_type": "stream", 124 | "text": [ 125 | "0.9800000000000001\n" 126 | ] 127 | } 128 | ], 129 | "source": [ 130 | "import numpy as np\n", 131 | "weights = np.array([0.1, 0.2, 0])\n", 132 | "def neural_network(input, weights):\n", 133 | " pred = input.dot(weights)\n", 134 | " return pred\n", 135 | " \n", 136 | "toes = np.array([8.5, 9.5, 9.9, 9.0])\n", 137 | "wlrec = np.array([0.65, 0.8, 0.8, 0.9])\n", 138 | "nfans = np.array([1.2, 1.3, 0.5, 1.0])\n", 139 | "\n", 140 | "# Input corresponds to every entry\n", 141 | "# for the first game of the season.\n", 142 | "\n", 143 | "input = np.array([toes[0],wlrec[0],nfans[0]])\n", 144 | "pred = neural_network(input,weights)\n", 145 | "\n", 146 | "print(pred)" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "# Making a Prediction with Multiple Outputs" 154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 4, 159 | "metadata": {}, 160 | "outputs": [ 161 | { 162 | "name": "stdout", 163 | "output_type": "stream", 164 | "text": [ 165 | "[0.195, 0.13, 0.5850000000000001]\n" 166 | ] 167 | } 168 | ], 169 | "source": [ 170 | "# Instead of predicting just \n", 171 | "# whether the team won or lost, \n", 172 | "# now we're also predicting whether\n", 173 | "# they are happy/sad AND the percentage\n", 174 | "# of the team that is hurt. 
We are\n", 175 | "# making this prediction using only\n", 176 | "# the current win/loss record.\n", 177 | "\n", 178 | "def ele_mul(number,vector):\n", 179 | " output = [0,0,0]\n", 180 | " assert(len(output) == len(vector))\n", 181 | " for i in range(len(vector)):\n", 182 | " output[i] = number * vector[i]\n", 183 | " return output\n", 184 | "\n", 185 | "weights = [0.3, 0.2, 0.9] \n", 186 | "\n", 187 | "def neural_network(input, weights):\n", 188 | " pred = ele_mul(input,weights)\n", 189 | " return pred\n", 190 | " \n", 191 | "wlrec = [0.65, 0.8, 0.8, 0.9]\n", 192 | "input = wlrec[0]\n", 193 | "pred = neural_network(input,weights)\n", 194 | "\n", 195 | "print(pred)" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "# Predicting with Multiple Inputs & Outputs" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 5, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "name": "stdout", 212 | "output_type": "stream", 213 | "text": [ 214 | "[0.555, 0.9800000000000001, 0.9650000000000001]\n" 215 | ] 216 | } 217 | ], 218 | "source": [ 219 | " #toes %win #fans\n", 220 | "weights = [ [0.1, 0.1, -0.3], #hurt?\n", 221 | " [0.1, 0.2, 0.0], #win?\n", 222 | " [0.0, 1.3, 0.1] ] #sad?\n", 223 | "\n", 224 | "def w_sum(a,b):\n", 225 | " assert(len(a) == len(b))\n", 226 | " output = 0\n", 227 | " for i in range(len(a)):\n", 228 | " output += (a[i] * b[i])\n", 229 | " return output\n", 230 | "\n", 231 | "def vect_mat_mul(vect,matrix):\n", 232 | " assert(len(vect) == len(matrix))\n", 233 | " output = [0,0,0]\n", 234 | " for i in range(len(vect)):\n", 235 | " output[i] = w_sum(vect,matrix[i])\n", 236 | " return output\n", 237 | "\n", 238 | "def neural_network(input, weights):\n", 239 | " pred = vect_mat_mul(input,weights)\n", 240 | " return pred\n", 241 | "\n", 242 | "# This dataset is the current\n", 243 | "# status at the beginning of\n", 244 | "# each game for the first 4 games\n", 245 | "# in a season.\n", 246 | "\n", 247 | "# toes = current number of toes\n", 248 | "# wlrec = current games won (percent)\n", 249 | "# nfans = fan count (in millions)\n", 250 | "\n", 251 | "toes = [8.5, 9.5, 9.9, 9.0]\n", 252 | "wlrec = [0.65,0.8, 0.8, 0.9]\n", 253 | "nfans = [1.2, 1.3, 0.5, 1.0]\n", 254 | "\n", 255 | "# Input corresponds to every entry\n", 256 | "# for the first game of the season.\n", 257 | "\n", 258 | "input = [toes[0],wlrec[0],nfans[0]]\n", 259 | "pred = neural_network(input,weights)\n", 260 | "\n", 261 | "print(pred)" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "# Predicting on Predictions" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": 6, 274 | "metadata": {}, 275 | "outputs": [ 276 | { 277 | "name": "stdout", 278 | "output_type": "stream", 279 | "text": [ 280 | "[0.21350000000000002, 0.14500000000000002, 0.5065]\n" 281 | ] 282 | } 283 | ], 284 | "source": [ 285 | " #toes %win #fans\n", 286 | "ih_wgt = [ [0.1, 0.2, -0.1], #hid[0]\n", 287 | " [-0.1,0.1, 0.9], #hid[1]\n", 288 | " [0.1, 0.4, 0.1] ] #hid[2]\n", 289 | "\n", 290 | " #hid[0] hid[1] hid[2]\n", 291 | "hp_wgt = [ [0.3, 1.1, -0.3], #hurt?\n", 292 | " [0.1, 0.2, 0.0], #win?\n", 293 | " [0.0, 1.3, 0.1] ] #sad?\n", 294 | "\n", 295 | "weights = [ih_wgt, hp_wgt]\n", 296 | "\n", 297 | "def neural_network(input, weights):\n", 298 | " hid = vect_mat_mul(input,weights[0])\n", 299 | " pred = vect_mat_mul(hid,weights[1])\n", 300 | " return pred\n", 301 | "\n", 302 | "toes = [8.5, 9.5, 9.9, 
9.0]\n", 303 | "wlrec = [0.65,0.8, 0.8, 0.9]\n", 304 | "nfans = [1.2, 1.3, 0.5, 1.0]\n", 305 | "\n", 306 | "# Input corresponds to every entry\n", 307 | "# for the first game of the season.\n", 308 | "\n", 309 | "input = [toes[0],wlrec[0],nfans[0]]\n", 310 | "pred = neural_network(input,weights)\n", 311 | "\n", 312 | "print(pred)" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "# NumPy Version" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 7, 325 | "metadata": {}, 326 | "outputs": [ 327 | { 328 | "name": "stdout", 329 | "output_type": "stream", 330 | "text": [ 331 | "[0.2135 0.145 0.5065]\n" 332 | ] 333 | } 334 | ], 335 | "source": [ 336 | "import numpy as np\n", 337 | "\n", 338 | "#toes %win #fans\n", 339 | "ih_wgt = np.array([ \n", 340 | " [0.1, 0.2, -0.1], #hid[0]\n", 341 | " [-0.1,0.1, 0.9], #hid[1]\n", 342 | " [0.1, 0.4, 0.1]]).T #hid[2]\n", 343 | "\n", 344 | "\n", 345 | "# hid[0] hid[1] hid[2]\n", 346 | "hp_wgt = np.array([ \n", 347 | " [0.3, 1.1, -0.3], #hurt?\n", 348 | " [0.1, 0.2, 0.0], #win?\n", 349 | " [0.0, 1.3, 0.1] ]).T #sad?\n", 350 | "\n", 351 | "weights = [ih_wgt, hp_wgt]\n", 352 | "\n", 353 | "def neural_network(input, weights):\n", 354 | "\n", 355 | " hid = input.dot(weights[0])\n", 356 | " pred = hid.dot(weights[1])\n", 357 | " return pred\n", 358 | "\n", 359 | "\n", 360 | "toes = np.array([8.5, 9.5, 9.9, 9.0])\n", 361 | "wlrec = np.array([0.65,0.8, 0.8, 0.9])\n", 362 | "nfans = np.array([1.2, 1.3, 0.5, 1.0])\n", 363 | "\n", 364 | "input = np.array([toes[0],wlrec[0],nfans[0]])\n", 365 | "\n", 366 | "pred = neural_network(input,weights)\n", 367 | "print(pred)" 368 | ] 369 | }, 370 | { 371 | "cell_type": "markdown", 372 | "metadata": {}, 373 | "source": [ 374 | "# A Quick Primer on NumPy" 375 | ] 376 | }, 377 | { 378 | "cell_type": "code", 379 | "execution_count": 8, 380 | "metadata": {}, 381 | "outputs": [ 382 | { 383 | "name": "stdout", 384 | "output_type": "stream", 385 | "text": [ 386 | "[0 1 2 3]\n", 387 | "[4 5 6 7]\n", 388 | "[[0 1 2 3]\n", 389 | " [4 5 6 7]]\n", 390 | "[[0. 0. 0. 0.]\n", 391 | " [0. 0. 0. 0.]]\n", 392 | "[[0.40221396 0.5714968 0.68579318 0.73326444 0.42793703]\n", 393 | " [0.19555759 0.20401945 0.21708259 0.95738529 0.42907317]]\n" 394 | ] 395 | } 396 | ], 397 | "source": [ 398 | "import numpy as np\n", 399 | "\n", 400 | "a = np.array([0,1,2,3]) # a vector\n", 401 | "b = np.array([4,5,6,7]) # another vector\n", 402 | "c = np.array([[0,1,2,3], # a matrix\n", 403 | " [4,5,6,7]])\n", 404 | "\n", 405 | "d = np.zeros((2,4)) # (2x4 matrix of zeros)\n", 406 | "e = np.random.rand(2,5) # random 2x5\n", 407 | "# matrix with all numbers between 0 and 1\n", 408 | "\n", 409 | "print(a)\n", 410 | "print(b)\n", 411 | "print(c)\n", 412 | "print(d)\n", 413 | "print(e)" 414 | ] 415 | }, 416 | { 417 | "cell_type": "code", 418 | "execution_count": 11, 419 | "metadata": {}, 420 | "outputs": [ 421 | { 422 | "name": "stdout", 423 | "output_type": "stream", 424 | "text": [ 425 | "[[0. 0. 0. 0.]]\n", 426 | "[[0. 0. 
0.]]\n" 427 | ] 428 | }, 429 | { 430 | "ename": "ValueError", 431 | "evalue": "operands could not be broadcast together with shapes (1,4) (4,3) ", 432 | "output_type": "error", 433 | "traceback": [ 434 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 435 | "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", 436 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mc\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0;36m0.2\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# multiplies every number in matrix \"c\" by 0.2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# multiplies elementwise between a and b (columns paired up)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mb\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0;36m0.2\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# elementwise multiplication then multiplied by 0.2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 437 | "\u001b[0;31mValueError\u001b[0m: operands could not be broadcast together with shapes (1,4) (4,3) " 438 | ] 439 | } 440 | ], 441 | "source": [ 442 | "print(a * 0.1) # multiplies every number in vector \"a\" by 0.1\n", 443 | " \n", 444 | "print(c * 0.2) # multiplies every number in matrix \"c\" by 0.2\n", 445 | " \n", 446 | "print(a * b) # multiplies elementwise between a and b (columns paired up)\n", 447 | " \n", 448 | "print(a * b * 0.2) # elementwise multiplication then multiplied by 0.2\n", 449 | " \n", 450 | "print(a * c) # since c has the same number of columns as a, this performs\n", 451 | "# elementwise multiplication on every row of the matrix \"c\"\n", 452 | "\n", 453 | "print(a * e) # since a and e don't have the same number of columns, this\n", 454 | "# throws a \"Value Error: operands could not be broadcast together with..\"" 455 | ] 456 | }, 457 | { 458 | "cell_type": "code", 459 | "execution_count": 12, 460 | "metadata": {}, 461 | "outputs": [ 462 | { 463 | "name": "stdout", 464 | "output_type": "stream", 465 | "text": [ 466 | "(1, 3)\n" 467 | ] 468 | } 469 | ], 470 | "source": [ 471 | "a = np.zeros((1,4)) # vector of length 4\n", 472 | "b = np.zeros((4,3)) # matrix with 4 rows & 3 columns\n", 473 | "\n", 474 | "c = a.dot(b)\n", 475 | "print(c.shape)" 476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": 13, 481 | "metadata": {}, 482 | "outputs": [ 483 | { 484 | "name": "stdout", 485 | "output_type": "stream", 486 | "text": [ 487 | "(2, 3)\n", 488 | "(2, 3)\n", 489 | "(4, 6)\n" 490 | ] 491 | }, 492 | { 493 | "ename": "ValueError", 494 | "evalue": "shapes (5,4) and (5,6) not aligned: 4 (dim 1) != 5 (dim 0)", 495 | "output_type": "error", 496 | "traceback": [ 497 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 498 | "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", 499 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 19\u001b[0m \u001b[0mh\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mzeros\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# matrix with 5 rows and 4 columns\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 20\u001b[0m \u001b[0mi\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mzeros\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m6\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# matrix with 5 rows & 6 columns\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 21\u001b[0;31m \u001b[0mj\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mh\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mi\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 22\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# throws an error\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 500 | "\u001b[0;31mValueError\u001b[0m: shapes (5,4) and (5,6) not aligned: 4 (dim 1) != 5 (dim 0)" 501 | ] 502 | } 503 | ], 504 | "source": [ 505 | "a = np.zeros((2,4)) # matrix with 2 rows and 4 columns\n", 506 | "b = np.zeros((4,3)) # matrix with 4 rows & 3 columns\n", 507 | "\n", 508 | "c = a.dot(b)\n", 509 | "print(c.shape) # outputs (2,3)\n", 510 | "\n", 511 | "e = np.zeros((2,1)) # matrix with 2 rows and 1 columns\n", 512 | "f = np.zeros((1,3)) # matrix with 1 row & 3 columns\n", 513 | "\n", 514 | "g = e.dot(f)\n", 515 | "print(g.shape) # outputs (2,3)\n", 516 | "\n", 517 | "h = np.zeros((5,4)).T # matrix with 4 rows and 5 columns\n", 518 | "i = np.zeros((5,6)) # matrix with 6 rows & 5 columns\n", 519 | "\n", 520 | "j = h.dot(i)\n", 521 | "print(j.shape) # outputs (4,6)\n", 522 | "\n", 523 | "h = np.zeros((5,4)) # matrix with 5 rows and 4 columns\n", 524 | "i = np.zeros((5,6)) # matrix with 5 rows & 6 columns\n", 525 | "j = h.dot(i)\n", 526 | "print(j.shape) # throws an error" 527 | ] 528 | }, 529 | { 530 | "cell_type": "code", 531 | "execution_count": null, 532 | "metadata": {}, 533 | "outputs": [], 534 | "source": [] 535 | }, 536 | { 537 | "cell_type": "code", 538 | "execution_count": null, 539 | "metadata": {}, 540 | "outputs": [], 541 | "source": [] 542 | }, 543 | { 544 | "cell_type": "code", 545 | "execution_count": null, 546 | "metadata": {}, 547 | "outputs": [], 548 | "source": [] 549 | } 550 | ], 551 | "metadata": { 552 | "kernelspec": { 553 | "display_name": "Python 3", 554 | "language": "python", 555 | "name": "python3" 556 | }, 557 | "language_info": { 558 | "codemirror_mode": { 559 | "name": "ipython", 560 | "version": 3 561 | }, 562 | "file_extension": ".py", 563 | "mimetype": "text/x-python", 564 | "name": "python", 565 | "nbconvert_exporter": "python", 566 | "pygments_lexer": "ipython3", 567 | "version": "3.6.1" 568 | } 569 | }, 570 | "nbformat": 4, 571 | "nbformat_minor": 2 572 | } 573 | -------------------------------------------------------------------------------- /Chapter5 - Generalizing Gradient Descent - Learning Multiple Weights at a Time.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Gradient Descent Learning with Multiple Inputs" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | 
"execution_count": 129, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "name": "stdout", 17 | "output_type": "stream", 18 | "text": [ 19 | "Weights:[0.1119, 0.20091, -0.09832]\n", 20 | "Weight Deltas:[-1.189999999999999, -0.09099999999999994, -0.16799999999999987]\n" 21 | ] 22 | } 23 | ], 24 | "source": [ 25 | "def w_sum(a,b):\n", 26 | " assert(len(a) == len(b))\n", 27 | " output = 0\n", 28 | "\n", 29 | " for i in range(len(a)):\n", 30 | " output += (a[i] * b[i])\n", 31 | "\n", 32 | " return output\n", 33 | "\n", 34 | "weights = [0.1, 0.2, -.1] \n", 35 | "\n", 36 | "def neural_network(input,weights):\n", 37 | " pred = w_sum(input,weights)\n", 38 | " return pred\n", 39 | "\n", 40 | "toes = [8.5, 9.5, 9.9, 9.0]\n", 41 | "wlrec = [0.65, 0.8, 0.8, 0.9]\n", 42 | "nfans = [1.2, 1.3, 0.5, 1.0]\n", 43 | "\n", 44 | "win_or_lose_binary = [1, 1, 0, 1]\n", 45 | "\n", 46 | "true = win_or_lose_binary[0]\n", 47 | "\n", 48 | "# Input corresponds to every entry\n", 49 | "# for the first game of the season.\n", 50 | "\n", 51 | "input = [toes[0],wlrec[0],nfans[0]]\n", 52 | "\n", 53 | "pred = neural_network(input,weights)\n", 54 | "error = (pred - true) ** 2\n", 55 | "delta = pred - true\n", 56 | "\n", 57 | "def ele_mul(number,vector):\n", 58 | " output = [0,0,0]\n", 59 | "\n", 60 | " assert(len(output) == len(vector))\n", 61 | "\n", 62 | " for i in range(len(vector)):\n", 63 | " output[i] = number * vector[i]\n", 64 | "\n", 65 | " return output\n", 66 | "\n", 67 | " \n", 68 | "\n", 69 | "alpha = 0.01\n", 70 | "\n", 71 | "for i in range(len(weights)):\n", 72 | " weights[i] -= alpha * weight_deltas[i]\n", 73 | " \n", 74 | "print(\"Weights:\" + str(weights))\n", 75 | "print(\"Weight Deltas:\" + str(weight_deltas))" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "# Let's Watch Several Steps of Learning" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 21, 88 | "metadata": {}, 89 | "outputs": [ 90 | { 91 | "name": "stdout", 92 | "output_type": "stream", 93 | "text": [ 94 | "Iteration:1\n", 95 | "Pred:0.8600000000000001\n", 96 | "Error:0.01959999999999997\n", 97 | "Delta:-0.1399999999999999\n", 98 | "Weights:[0.1, 0.2, -0.1]\n", 99 | "Weight_Deltas:\n", 100 | "[-1.189999999999999, -0.09099999999999994, -0.16799999999999987]\n", 101 | "\n", 102 | "Iteration:2\n", 103 | "Pred:0.9637574999999999\n", 104 | "Error:0.0013135188062500048\n", 105 | "Delta:-0.036242500000000066\n", 106 | "Weights:[0.1119, 0.20091, -0.09832]\n", 107 | "Weight_Deltas:\n", 108 | "[-0.30806125000000056, -0.023557625000000044, -0.04349100000000008]\n", 109 | "\n", 110 | "Iteration:3\n", 111 | "Pred:0.9906177228125002\n", 112 | "Error:8.802712522307997e-05\n", 113 | "Delta:-0.009382277187499843\n", 114 | "Weights:[0.11498061250000001, 0.20114557625, -0.09788509000000001]\n", 115 | "Weight_Deltas:\n", 116 | "[-0.07974935609374867, -0.006098480171874899, -0.011258732624999811]\n", 117 | "\n" 118 | ] 119 | } 120 | ], 121 | "source": [ 122 | "def neural_network(input, weights):\n", 123 | " out = 0\n", 124 | " for i in range(len(input)):\n", 125 | " out += (input[i] * weights[i])\n", 126 | " return out\n", 127 | "\n", 128 | "def ele_mul(scalar, vector):\n", 129 | " out = [0,0,0]\n", 130 | " for i in range(len(out)):\n", 131 | " out[i] = vector[i] * scalar\n", 132 | " return out\n", 133 | "\n", 134 | "toes = [8.5, 9.5, 9.9, 9.0]\n", 135 | "wlrec = [0.65, 0.8, 0.8, 0.9]\n", 136 | "nfans = [1.2, 1.3, 0.5, 1.0]\n", 137 | "\n", 138 | "win_or_lose_binary = [1, 1, 0, 1]\n", 139 | 
"true = win_or_lose_binary[0]\n", 140 | "\n", 141 | "alpha = 0.01\n", 142 | "weights = [0.1, 0.2, -.1]\n", 143 | "input = [toes[0],wlrec[0],nfans[0]]\n", 144 | "\n", 145 | "for iter in range(3):\n", 146 | "\n", 147 | " pred = neural_network(input,weights)\n", 148 | "\n", 149 | " error = (pred - true) ** 2\n", 150 | " delta = pred - true\n", 151 | "\n", 152 | " weight_deltas=ele_mul(delta,input)\n", 153 | "\n", 154 | " print(\"Iteration:\" + str(iter+1))\n", 155 | " print(\"Pred:\" + str(pred))\n", 156 | " print(\"Error:\" + str(error))\n", 157 | " print(\"Delta:\" + str(delta))\n", 158 | " print(\"Weights:\" + str(weights))\n", 159 | " print(\"Weight_Deltas:\")\n", 160 | " print(str(weight_deltas))\n", 161 | " print(\n", 162 | " )\n", 163 | "\n", 164 | " for i in range(len(weights)):\n", 165 | " weights[i]-=alpha*weight_deltas[i]" 166 | ] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "# Freezing One Weight - What Does It Do?" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": 157, 178 | "metadata": {}, 179 | "outputs": [ 180 | { 181 | "name": "stdout", 182 | "output_type": "stream", 183 | "text": [ 184 | "Iteration:1\n", 185 | "Pred:0.8600000000000001\n", 186 | "Error:0.01959999999999997\n", 187 | "Delta:-0.1399999999999999\n", 188 | "Weights:[0.1, 0.2, -0.1]\n", 189 | "Weight_Deltas:\n", 190 | "[0, -0.09099999999999994, -0.16799999999999987]\n", 191 | "\n", 192 | "Iteration:2\n", 193 | "Pred:0.9382250000000001\n", 194 | "Error:0.003816150624999989\n", 195 | "Delta:-0.06177499999999991\n", 196 | "Weights:[0.1, 0.2273, -0.04960000000000005]\n", 197 | "Weight_Deltas:\n", 198 | "[0, -0.040153749999999946, -0.07412999999999989]\n", 199 | "\n", 200 | "Iteration:3\n", 201 | "Pred:0.97274178125\n", 202 | "Error:0.000743010489422852\n", 203 | "Delta:-0.027258218750000007\n", 204 | "Weights:[0.1, 0.239346125, -0.02736100000000008]\n", 205 | "Weight_Deltas:\n", 206 | "[0, -0.017717842187500006, -0.032709862500000006]\n", 207 | "\n" 208 | ] 209 | } 210 | ], 211 | "source": [ 212 | "def neural_network(input, weights):\n", 213 | " out = 0\n", 214 | " for i in range(len(input)):\n", 215 | " out += (input[i] * weights[i])\n", 216 | " return out\n", 217 | "\n", 218 | "def ele_mul(scalar, vector):\n", 219 | " out = [0,0,0]\n", 220 | " for i in range(len(out)):\n", 221 | " out[i] = vector[i] * scalar\n", 222 | " return out\n", 223 | "\n", 224 | "toes = [8.5, 9.5, 9.9, 9.0]\n", 225 | "wlrec = [0.65, 0.8, 0.8, 0.9]\n", 226 | "nfans = [1.2, 1.3, 0.5, 1.0]\n", 227 | "\n", 228 | "win_or_lose_binary = [1, 1, 0, 1]\n", 229 | "true = win_or_lose_binary[0]\n", 230 | "\n", 231 | "alpha = 0.3\n", 232 | "weights = [0.1, 0.2, -.1]\n", 233 | "input = [toes[0],wlrec[0],nfans[0]]\n", 234 | "\n", 235 | "for iter in range(3):\n", 236 | "\n", 237 | " pred = neural_network(input,weights)\n", 238 | "\n", 239 | " error = (pred - true) ** 2\n", 240 | " delta = pred - true\n", 241 | "\n", 242 | " weight_deltas=ele_mul(delta,input)\n", 243 | " weight_deltas[0] = 0\n", 244 | "\n", 245 | " print(\"Iteration:\" + str(iter+1))\n", 246 | " print(\"Pred:\" + str(pred))\n", 247 | " print(\"Error:\" + str(error))\n", 248 | " print(\"Delta:\" + str(delta))\n", 249 | " print(\"Weights:\" + str(weights))\n", 250 | " print(\"Weight_Deltas:\")\n", 251 | " print(str(weight_deltas))\n", 252 | " print(\n", 253 | " )\n", 254 | "\n", 255 | " for i in range(len(weights)):\n", 256 | " weights[i]-=alpha*weight_deltas[i]" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 
| "metadata": {}, 262 | "source": [ 263 | "# Gradient Descent Learning with Multiple Outputs" 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": 174, 269 | "metadata": {}, 270 | "outputs": [ 271 | { 272 | "name": "stdout", 273 | "output_type": "stream", 274 | "text": [ 275 | "Weights:[0.293825, 0.25655, 0.868475]\n", 276 | "Weight Deltas:[0.061750000000000006, -0.5655, 0.3152500000000001]\n" 277 | ] 278 | } 279 | ], 280 | "source": [ 281 | "# Instead of predicting just \n", 282 | "# whether the team won or lost, \n", 283 | "# now we're also predicting whether\n", 284 | "# they are happy/sad AND the\n", 285 | "# percentage of the team that is\n", 286 | "# hurt. We are making this\n", 287 | "# prediction using only\n", 288 | "# the current win/loss record.\n", 289 | "\n", 290 | "weights = [0.3, 0.2, 0.9] \n", 291 | "\n", 292 | "def neural_network(input, weights):\n", 293 | " pred = ele_mul(input,weights)\n", 294 | " return pred\n", 295 | "\n", 296 | "wlrec = [0.65, 1.0, 1.0, 0.9]\n", 297 | "\n", 298 | "hurt = [0.1, 0.0, 0.0, 0.1]\n", 299 | "win = [ 1, 1, 0, 1]\n", 300 | "sad = [0.1, 0.0, 0.1, 0.2]\n", 301 | "\n", 302 | "input = wlrec[0]\n", 303 | "true = [hurt[0], win[0], sad[0]]\n", 304 | "\n", 305 | "pred = neural_network(input,weights)\n", 306 | "\n", 307 | "error = [0, 0, 0] \n", 308 | "delta = [0, 0, 0]\n", 309 | "\n", 310 | "for i in range(len(true)):\n", 311 | " error[i] = (pred[i] - true[i]) ** 2\n", 312 | " delta[i] = pred[i] - true[i]\n", 313 | " \n", 314 | "def scalar_ele_mul(number,vector):\n", 315 | " output = [0,0,0]\n", 316 | "\n", 317 | " assert(len(output) == len(vector))\n", 318 | "\n", 319 | " for i in range(len(vector)):\n", 320 | " output[i] = number * vector[i]\n", 321 | "\n", 322 | " return output\n", 323 | "\n", 324 | "weight_deltas = scalar_ele_mul(input,delta)\n", 325 | "\n", 326 | "alpha = 0.1\n", 327 | "\n", 328 | "for i in range(len(weights)):\n", 329 | " weights[i] -= (weight_deltas[i] * alpha)\n", 330 | " \n", 331 | "print(\"Weights:\" + str(weights))\n", 332 | "print(\"Weight Deltas:\" + str(weight_deltas))" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "metadata": {}, 338 | "source": [ 339 | "# Gradient Descent with Multiple Inputs & Outputs" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": 7, 345 | "metadata": {}, 346 | "outputs": [], 347 | "source": [ 348 | " #toes %win #fans\n", 349 | "weights = [ [0.1, 0.1, -0.3],#hurt?\n", 350 | " [0.1, 0.2, 0.0], #win?\n", 351 | " [0.0, 1.3, 0.1] ]#sad?\n", 352 | "\n", 353 | "def w_sum(a,b):\n", 354 | " assert(len(a) == len(b))\n", 355 | " output = 0\n", 356 | "\n", 357 | " for i in range(len(a)):\n", 358 | " output += (a[i] * b[i])\n", 359 | "\n", 360 | " return output\n", 361 | "\n", 362 | "def vect_mat_mul(vect,matrix):\n", 363 | " assert(len(vect) == len(matrix))\n", 364 | " output = [0,0,0]\n", 365 | " for i in range(len(vect)):\n", 366 | " output[i] = w_sum(vect,matrix[i])\n", 367 | " return output\n", 368 | "\n", 369 | "def neural_network(input, weights):\n", 370 | " pred = vect_mat_mul(input,weights)\n", 371 | " return pred\n", 372 | "\n", 373 | "toes = [8.5, 9.5, 9.9, 9.0]\n", 374 | "wlrec = [0.65,0.8, 0.8, 0.9]\n", 375 | "nfans = [1.2, 1.3, 0.5, 1.0]\n", 376 | "\n", 377 | "hurt = [0.1, 0.0, 0.0, 0.1]\n", 378 | "win = [ 1, 1, 0, 1]\n", 379 | "sad = [0.1, 0.0, 0.1, 0.2]\n", 380 | "\n", 381 | "alpha = 0.01\n", 382 | "\n", 383 | "input = [toes[0],wlrec[0],nfans[0]]\n", 384 | "true = [hurt[0], win[0], sad[0]]\n", 385 | "\n", 386 | "pred = 
neural_network(input,weights)\n", 387 | "\n", 388 | "error = [0, 0, 0] \n", 389 | "delta = [0, 0, 0]\n", 390 | "\n", 391 | "for i in range(len(true)):\n", 392 | " error[i] = (pred[i] - true[i]) ** 2\n", 393 | " delta[i] = pred[i] - true[i]" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": 10, 399 | "metadata": {}, 400 | "outputs": [], 401 | "source": [ 402 | "import numpy as np\n", 403 | "def outer_prod(a, b):\n", 404 | " \n", 405 | " # just a matrix of zeros\n", 406 | " out = np.zeros((len(a), len(b)))\n", 407 | "\n", 408 | " for i in range(len(a)):\n", 409 | " for j in range(len(b)):\n", 410 | " out[i][j] = a[i] * b[j]\n", 411 | " return out\n", 412 | "\n", 413 | "weight_deltas = outer_prod(delta,input)\n", 414 | "\n", 415 | "for i in range(len(weights)):\n", 416 | " for j in range(len(weights[0])):\n", 417 | " weights[i][j] -= alpha * weight_deltas[i][j]" 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": 16, 423 | "metadata": {}, 424 | "outputs": [ 425 | { 426 | "data": { 427 | "text/plain": [ 428 | "[[0.061324999999999998, 0.097042500000000004, -0.30546000000000001],\n", 429 | " [0.1017, 0.20013, 0.00023999999999999887],\n", 430 | " [-0.073525000000000007, 1.2943775, 0.089620000000000005]]" 431 | ] 432 | }, 433 | "execution_count": 16, 434 | "metadata": {}, 435 | "output_type": "execute_result" 436 | } 437 | ], 438 | "source": [ 439 | "weights" 440 | ] 441 | } 442 | ], 443 | "metadata": { 444 | "kernelspec": { 445 | "display_name": "Python 3", 446 | "language": "python", 447 | "name": "python3" 448 | }, 449 | "language_info": { 450 | "codemirror_mode": { 451 | "name": "ipython", 452 | "version": 3 453 | }, 454 | "file_extension": ".py", 455 | "mimetype": "text/x-python", 456 | "name": "python", 457 | "nbconvert_exporter": "python", 458 | "pygments_lexer": "ipython3", 459 | "version": "3.6.1" 460 | } 461 | }, 462 | "nbformat": 4, 463 | "nbformat_minor": 2 464 | } 465 | -------------------------------------------------------------------------------- /Chapter6 - Intro to Backpropagation - Building Your First DEEP Neural Network.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Creating a Matrix or Two in Python" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 10, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "name": "stdout", 17 | "output_type": "stream", 18 | "text": [ 19 | "Error:0.03999999999999998 Prediction:-0.19999999999999996\n", 20 | "Error:0.025599999999999973 Prediction:-0.15999999999999992\n", 21 | "Error:0.01638399999999997 Prediction:-0.1279999999999999\n", 22 | "Error:0.010485759999999964 Prediction:-0.10239999999999982\n", 23 | "Error:0.006710886399999962 Prediction:-0.08191999999999977\n", 24 | "Error:0.004294967295999976 Prediction:-0.06553599999999982\n", 25 | "Error:0.002748779069439994 Prediction:-0.05242879999999994\n", 26 | "Error:0.0017592186044416036 Prediction:-0.04194304000000004\n", 27 | "Error:0.0011258999068426293 Prediction:-0.03355443200000008\n", 28 | "Error:0.0007205759403792803 Prediction:-0.02684354560000002\n", 29 | "Error:0.0004611686018427356 Prediction:-0.021474836479999926\n", 30 | "Error:0.0002951479051793508 Prediction:-0.01717986918399994\n", 31 | "Error:0.00018889465931478573 Prediction:-0.013743895347199997\n", 32 | "Error:0.00012089258196146188 Prediction:-0.010995116277759953\n", 33 | "Error:7.737125245533561e-05 
Prediction:-0.008796093022207963\n", 34 | "Error:4.951760157141604e-05 Prediction:-0.007036874417766459\n", 35 | "Error:3.169126500570676e-05 Prediction:-0.0056294995342132115\n", 36 | "Error:2.028240960365233e-05 Prediction:-0.004503599627370569\n", 37 | "Error:1.298074214633813e-05 Prediction:-0.003602879701896544\n", 38 | "Error:8.307674973656916e-06 Prediction:-0.002882303761517324\n" 39 | ] 40 | } 41 | ], 42 | "source": [ 43 | "import numpy as np\n", 44 | "weights = np.array([0.5,0.48,-0.7])\n", 45 | "alpha = 0.1\n", 46 | "\n", 47 | "streetlights = np.array( [ [ 1, 0, 1 ],\n", 48 | " [ 0, 1, 1 ],\n", 49 | " [ 0, 0, 1 ],\n", 50 | " [ 1, 1, 1 ],\n", 51 | " [ 0, 1, 1 ],\n", 52 | " [ 1, 0, 1 ] ] )\n", 53 | "\n", 54 | "walk_vs_stop = np.array( [ 0, 1, 0, 1, 1, 0 ] )\n", 55 | "\n", 56 | "input = streetlights[0] # [1,0,1]\n", 57 | "goal_prediction = walk_vs_stop[0] # equals 0... i.e. \"stop\"\n", 58 | "\n", 59 | "for iteration in range(20):\n", 60 | " prediction = input.dot(weights)\n", 61 | " error = (goal_prediction - prediction) ** 2\n", 62 | " delta = prediction - goal_prediction\n", 63 | " weights = weights - (alpha * (input * delta))\t\n", 64 | "\n", 65 | " print(\"Error:\" + str(error) + \" Prediction:\" + str(prediction))" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "## Building Our Neural Network" 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": 12, 78 | "metadata": {}, 79 | "outputs": [ 80 | { 81 | "name": "stdout", 82 | "output_type": "stream", 83 | "text": [ 84 | "[0 2 4 3]\n", 85 | "[2 3 4 4]\n", 86 | "[0. 0.5 1. 0.5]\n", 87 | "[0.5 1.5 2.5 1.5]\n" 88 | ] 89 | } 90 | ], 91 | "source": [ 92 | "import numpy as np\n", 93 | "\n", 94 | "a = np.array([0,1,2,1])\n", 95 | "b = np.array([2,2,2,3])\n", 96 | "\n", 97 | "print(a*b) #elementwise multiplication\n", 98 | "print(a+b) #elementwise addition\n", 99 | "print(a * 0.5) # vector-scalar multiplication\n", 100 | "print(a + 0.5) # vector-scalar addition" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "# Learning the whole dataset!" 
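,
 "\n",
 "\n",
 "The next cell trains on all six streetlight rows: each pass over the data it predicts one row at a time, forms `delta = prediction - goal_prediction`, and subtracts `alpha * (input * delta)` from the weights before printing the summed squared error for the pass. A minimal sketch of that per-row update as a standalone function (the names `sgd_step`, `lights_row` and `goal` are illustrative, not from the notebook):\n",
 "\n",
 "```python\n",
 "import numpy as np\n",
 "\n",
 "def sgd_step(weights, lights_row, goal, alpha=0.1):\n",
 "    pred = lights_row.dot(weights)                # forward pass for one row\n",
 "    delta = pred - goal                           # signed prediction error\n",
 "    return weights - alpha * lights_row * delta   # delta-rule weight update\n",
 "\n",
 "# example: one update on the first streetlight row (goal is 0, i.e. stop)\n",
 "w = sgd_step(np.array([0.5, 0.48, -0.7]), np.array([1, 0, 1]), 0)\n",
 "```"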
108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 13, 113 | "metadata": {}, 114 | "outputs": [ 115 | { 116 | "name": "stdout", 117 | "output_type": "stream", 118 | "text": [ 119 | "Prediction:-0.19999999999999996\n", 120 | "Prediction:-0.19999999999999996\n", 121 | "Prediction:-0.5599999999999999\n", 122 | "Prediction:0.6160000000000001\n", 123 | "Prediction:0.17279999999999995\n", 124 | "Prediction:0.17552\n", 125 | "Error:2.6561231104\n", 126 | "\n", 127 | "Prediction:0.14041599999999999\n", 128 | "Prediction:0.3066464\n", 129 | "Prediction:-0.34513824\n", 130 | "Prediction:1.006637344\n", 131 | "Prediction:0.4785034751999999\n", 132 | "Prediction:0.26700416768\n", 133 | "Error:0.9628701776715985\n", 134 | "\n", 135 | "Prediction:0.213603334144\n", 136 | "Prediction:0.5347420299776\n", 137 | "Prediction:-0.26067345110016\n", 138 | "Prediction:1.1319428845096962\n", 139 | "Prediction:0.6274723921901568\n", 140 | "Prediction:0.25433999330650114\n", 141 | "Error:0.5509165866836797\n", 142 | "\n", 143 | "Prediction:0.20347199464520088\n", 144 | "Prediction:0.6561967149569552\n", 145 | "Prediction:-0.221948503950995\n", 146 | "Prediction:1.166258650532124\n", 147 | "Prediction:0.7139004922542389\n", 148 | "Prediction:0.21471099528371604\n", 149 | "Error:0.36445836852222424\n", 150 | "\n", 151 | "Prediction:0.17176879622697283\n", 152 | "Prediction:0.7324724146523222\n", 153 | "Prediction:-0.19966478845083285\n", 154 | "Prediction:1.1697769945341199\n", 155 | "Prediction:0.7719890116601171\n", 156 | "Prediction:0.17297997428859369\n", 157 | "Error:0.2516768662079895\n", 158 | "\n", 159 | "Prediction:0.13838397943087496\n", 160 | "Prediction:0.7864548139561468\n", 161 | "Prediction:-0.1836567869927348\n", 162 | "Prediction:1.163248019006011\n", 163 | "Prediction:0.8148799260629888\n", 164 | "Prediction:0.1362897844408577\n", 165 | "Error:0.17797575048089034\n", 166 | "\n", 167 | "Prediction:0.10903182755268614\n", 168 | "Prediction:0.8273717796510367\n", 169 | "Prediction:-0.17037324196481937\n", 170 | "Prediction:1.1537962739591756\n", 171 | "Prediction:0.8481754931254761\n", 172 | "Prediction:0.1059488041691444\n", 173 | "Error:0.12864460733422164\n", 174 | "\n", 175 | "Prediction:0.0847590433353155\n", 176 | "Prediction:0.859469609749935\n", 177 | "Prediction:-0.1585508402022421\n", 178 | "Prediction:1.1438418857156731\n", 179 | "Prediction:0.8746623946770374\n", 180 | "Prediction:0.08148074110264475\n", 181 | "Error:0.09511036950476208\n", 182 | "\n", 183 | "Prediction:0.06518459288211581\n", 184 | "Prediction:0.8850633823431538\n", 185 | "Prediction:-0.14771905585408038\n", 186 | "Prediction:1.1341830033853888\n", 187 | "Prediction:0.8959860107828534\n", 188 | "Prediction:0.0619780399014222\n", 189 | "Error:0.07194564247043436\n", 190 | "\n", 191 | "Prediction:0.04958243192113776\n", 192 | "Prediction:0.9056327614440267\n", 193 | "Prediction:-0.13768337501215525\n", 194 | "Prediction:1.1250605910610996\n", 195 | "Prediction:0.9132624284442169\n", 196 | "Prediction:0.04653264583708144\n", 197 | "Error:0.05564914990717743\n", 198 | "\n", 199 | "Prediction:0.03722611666966513\n", 200 | "Prediction:0.922234066504699\n", 201 | "Prediction:-0.12834662236261596\n", 202 | "Prediction:1.116526024487899\n", 203 | "Prediction:0.9273167105424409\n", 204 | "Prediction:0.03435527296969987\n", 205 | "Error:0.04394763937673939\n", 206 | "\n", 207 | "Prediction:0.027484218375759886\n", 208 | "Prediction:0.9356694192994068\n", 209 | "Prediction:-0.11964712469387503\n", 210 | 
"Prediction:1.1085678053734553\n", 211 | "Prediction:0.9387866868342218\n", 212 | "Prediction:0.024792915481941458\n", 213 | "Error:0.035357967050948465\n", 214 | "\n", 215 | "Prediction:0.019834332385553155\n", 216 | "Prediction:0.946566624680628\n", 217 | "Prediction:-0.11153724870006754\n", 218 | "Prediction:1.1011550767549563\n", 219 | "Prediction:0.948176009263518\n", 220 | "Prediction:0.017315912033043404\n", 221 | "Error:0.02890700056547436\n", 222 | "\n", 223 | "Prediction:0.013852729626434732\n", 224 | "Prediction:0.9554239432448665\n", 225 | "Prediction:-0.10397589092234266\n", 226 | "Prediction:1.0942524239871314\n", 227 | "Prediction:0.9558862588907013\n", 228 | "Prediction:0.011498267782398985\n", 229 | "Error:0.023951660591138853\n", 230 | "\n", 231 | "Prediction:0.009198614225919194\n", 232 | "Prediction:0.9626393189117293\n", 233 | "Prediction:-0.09692579020989642\n", 234 | "Prediction:1.087824783849832\n", 235 | "Prediction:0.9622390773804066\n", 236 | "Prediction:0.006998674002545002\n", 237 | "Error:0.020063105176016144\n", 238 | "\n", 239 | "Prediction:0.005598939202035996\n", 240 | "Prediction:0.9685315005838672\n", 241 | "Prediction:-0.09035250869077546\n", 242 | "Prediction:1.0818389613301889\n", 243 | "Prediction:0.9674926590701334\n", 244 | "Prediction:0.003544193999268516\n", 245 | "Error:0.016952094519447087\n", 246 | "\n", 247 | "Prediction:0.0028353551994148157\n", 248 | "Prediction:0.9733561723362383\n", 249 | "Prediction:-0.0842239920152223\n", 250 | "Prediction:1.0762639960116431\n", 251 | "Prediction:0.9718545378681842\n", 252 | "Prediction:0.0009168131382832068\n", 253 | "Error:0.014420818295271236\n", 254 | "\n", 255 | "Prediction:0.0007334505106265654\n", 256 | "Prediction:0.9773186039296565\n", 257 | "Prediction:-0.07851033295953944\n", 258 | "Prediction:1.0710711494147542\n", 259 | "Prediction:0.9754916865567282\n", 260 | "Prediction:-0.0010574652271341245\n", 261 | "Error:0.012331739998443648\n", 262 | "\n", 263 | "Prediction:-0.0008459721817072885\n", 264 | "Prediction:0.9805836929862668\n", 265 | "Prediction:-0.07318360881847627\n", 266 | "Prediction:1.066233777045345\n", 267 | "Prediction:0.9785385598617921\n", 268 | "Prediction:-0.0025173975573930946\n", 269 | "Error:0.010587393171639842\n", 270 | "\n", 271 | "Prediction:-0.002013918045914484\n", 272 | "Prediction:0.9832839794497644\n", 273 | "Prediction:-0.06821774801198803\n", 274 | "Prediction:1.0617271739912904\n", 275 | "Prediction:0.9811035235627523\n", 276 | "Prediction:-0.0035735447350425317\n", 277 | "Error:0.009117233405426495\n", 278 | "\n", 279 | "Prediction:-0.002858835788034024\n", 280 | "Prediction:0.9855260569025094\n", 281 | "Prediction:-0.06358841060413677\n", 282 | "Prediction:1.05752842286588\n", 283 | "Prediction:0.9832740020092452\n", 284 | "Prediction:-0.004313918034364962\n", 285 | "Error:0.00786904226904208\n", 286 | "\n", 287 | "Prediction:-0.003451134427491974\n", 288 | "Prediction:0.9873957068535818\n", 289 | "Prediction:-0.059272877470408075\n", 290 | "Prediction:1.0536162524729626\n", 291 | "Prediction:0.9851206027353137\n", 292 | "Prediction:-0.004808501248434842\n", 293 | "Error:0.006803273214640502\n", 294 | "\n", 295 | "Prediction:-0.0038468009987478735\n", 296 | "Prediction:0.9889620124129692\n", 297 | "Prediction:-0.05524994626077355\n", 298 | "Prediction:1.049970908776931\n", 299 | "Prediction:0.9867004228010665\n", 300 | "Prediction:-0.005112871449710697\n", 301 | "Error:0.005889303541837786\n", 302 | "\n", 303 | "Prediction:-0.004090297159768559\n", 304 | 
"Prediction:0.9902806551018011\n", 305 | "Prediction:-0.051499833441728114\n", 306 | "Prediction:1.0465740376293469\n", 307 | "Prediction:0.9880596998997442\n", 308 | "Prediction:-0.0052710974096659285\n", 309 | "Error:0.0051029252561172675\n", 310 | "\n", 311 | "Prediction:-0.004216877927732746\n", 312 | "Prediction:0.9913965574535352\n", 313 | "Prediction:-0.048004082062078055\n", 314 | "Prediction:1.043408578143574\n", 315 | "Prediction:0.9892359385403211\n", 316 | "Prediction:-0.005318059364078823\n", 317 | "Error:0.004424644608684828\n", 318 | "\n", 319 | "Prediction:-0.0042544474912630525\n", 320 | "Prediction:0.992346001517791\n", 321 | "Prediction:-0.044745474990504665\n", 322 | "Prediction:1.0404586655589985\n", 323 | "Prediction:0.9902596156014837\n", 324 | "Prediction:-0.005281305317687134\n", 325 | "Error:0.0038385124412518303\n", 326 | "\n", 327 | "Prediction:-0.0042250442541497055\n", 328 | "Prediction:0.9931583274383705\n", 329 | "Prediction:-0.041707953394155776\n", 330 | "Prediction:1.0377095425371112\n", 331 | "Prediction:0.9911555487826897\n", 332 | "Prediction:-0.005182536193432452\n", 333 | "Error:0.0033313054558089675\n", 334 | "\n", 335 | "Prediction:-0.004146028954745959\n", 336 | "Prediction:0.9938572955409696\n", 337 | "Prediction:-0.03887654022599941\n", 338 | "Prediction:1.0351474779634813\n", 339 | "Prediction:0.9919439948626794\n", 340 | "Prediction:-0.00503879377425797\n", 341 | "Error:0.0028919416227737734\n", 342 | "\n", 343 | "Prediction:-0.004031035019406375\n", 344 | "Prediction:0.9944621787695098\n", 345 | "Prediction:-0.03623726848360008\n", 346 | "Prediction:1.032759692455092\n", 347 | "Prediction:0.9926415313729495\n", 348 | "Prediction:-0.004863410672429416\n", 349 | "Error:0.002511053608117256\n", 350 | "\n", 351 | "Prediction:-0.003890728537943533\n", 352 | "Prediction:0.9949886390193969\n", 353 | "Prediction:-0.03377711399894662\n", 354 | "Prediction:1.0305342898820642\n", 355 | "Prediction:0.9932617646389992\n", 356 | "Prediction:-0.004666769772712614\n", 357 | "Error:0.0021806703520253884\n", 358 | "\n", 359 | "Prediction:-0.003733415818170091\n", 360 | "Prediction:0.9954494302702878\n", 361 | "Prediction:-0.03148393251909879\n", 362 | "Prediction:1.0284601943056741\n", 363 | "Prediction:0.9938158986070053\n", 364 | "Prediction:-0.004456911151490314\n", 365 | "Error:0.0018939739123713475\n", 366 | "\n", 367 | "Prediction:-0.003565528921192253\n", 368 | "Prediction:0.9958549628928723\n", 369 | "Prediction:-0.029346400840475826\n", 370 | "Prediction:1.0265270918125804\n", 371 | "Prediction:0.9943131920358295\n", 372 | "Prediction:-0.004240016908292479\n", 373 | "Error:0.0016451096996342332\n", 374 | "\n", 375 | "Prediction:-0.0033920135266339822\n", 376 | "Prediction:0.9962137566721563\n", 377 | "Prediction:-0.02735396176499221\n", 378 | "Prediction:1.0247253767906936\n", 379 | "Prediction:0.9947613261560856\n", 380 | "Prediction:-0.004020798285770878\n", 381 | "Error:0.0014290353984827077\n", 382 | "\n", 383 | "Prediction:-0.003216638628616701\n", 384 | "Prediction:0.9965328046163073\n", 385 | "Prediction:-0.025496772653362886\n", 386 | "Prediction:1.0230461022472208\n", 387 | "Prediction:0.9951667005089379\n", 388 | "Prediction:-0.0038028045995257484\n", 389 | "Error:0.0012413985592149145\n", 390 | "\n", 391 | "Prediction:-0.003042243679620596\n", 392 | "Prediction:0.996817865235065\n", 393 | "Prediction:-0.023765657359234325\n", 394 | "Prediction:1.0214809338160067\n", 395 | "Prediction:0.995534671160774\n", 396 | 
"Prediction:-0.0035886696105582767\n", 397 | "Error:0.0010784359268087556\n", 398 | "\n", 399 | "Prediction:-0.0028709356884466207\n", 400 | "Prediction:0.9970736974585198\n", 401 | "Prediction:-0.022152061336940452\n", 402 | "Prediction:1.0200221071408409\n", 403 | "Prediction:0.9958697426723416\n", 404 | "Prediction:-0.0033803078583175654\n", 405 | "Error:0.0009368896209360312\n", 406 | "\n", 407 | "Prediction:-0.0027042462866540516\n", 408 | "Prediction:0.9973042495523706\n", 409 | "Prediction:-0.02064800972530455\n", 410 | "Prediction:1.018662388355171\n", 411 | "Prediction:0.9961757229433927\n", 412 | "Prediction:-0.0031790709774033414\n", 413 | "Error:0.0008139366504753339\n", 414 | "\n", 415 | "Prediction:-0.002543256781922673\n", 416 | "Prediction:0.9975128111306469\n", 417 | "Prediction:-0.019246068219762574\n", 418 | "Prediction:1.0173950374076535\n", 419 | "Prediction:0.9964558482449631\n", 420 | "Prediction:-0.0029858720226535913\n", 421 | "Error:0.0007071291752624441\n", 422 | "\n", 423 | "Prediction:-0.002388697618122871\n", 424 | "Prediction:0.9977021355600483\n", 425 | "Prediction:-0.01793930655497516\n", 426 | "Prediction:1.0162137740080082\n", 427 | "Prediction:0.9967128843019345\n", 428 | "Prediction:-0.0028012842268006904\n", 429 | "Error:0.0006143435674831474\n", 430 | "\n", 431 | "Prediction:-0.0022410273814405524\n", 432 | "Prediction:0.9978745386023716\n", 433 | "Prediction:-0.016721264429884947\n", 434 | "Prediction:1.0151127459893812\n", 435 | "Prediction:0.9969492081270097\n", 436 | "Prediction:-0.0026256193329783125\n", 437 | "Error:0.00053373677328488\n", 438 | "\n" 439 | ] 440 | } 441 | ], 442 | "source": [ 443 | "import numpy as np\n", 444 | "\n", 445 | "weights = np.array([0.5,0.48,-0.7])\n", 446 | "alpha = 0.1\n", 447 | "\n", 448 | "streetlights = np.array( [[ 1, 0, 1 ],\n", 449 | " [ 0, 1, 1 ],\n", 450 | " [ 0, 0, 1 ],\n", 451 | " [ 1, 1, 1 ],\n", 452 | " [ 0, 1, 1 ],\n", 453 | " [ 1, 0, 1 ] ] )\n", 454 | "\n", 455 | "walk_vs_stop = np.array( [ 0, 1, 0, 1, 1, 0 ] )\n", 456 | "\n", 457 | "input = streetlights[0] # [1,0,1]\n", 458 | "goal_prediction = walk_vs_stop[0] # equals 0... i.e. 
\"stop\"\n", 459 | "\n", 460 | "for iteration in range(40):\n", 461 | " error_for_all_lights = 0\n", 462 | " for row_index in range(len(walk_vs_stop)):\n", 463 | " input = streetlights[row_index]\n", 464 | " goal_prediction = walk_vs_stop[row_index]\n", 465 | " \n", 466 | " prediction = input.dot(weights)\n", 467 | " \n", 468 | " error = (goal_prediction - prediction) ** 2\n", 469 | " error_for_all_lights += error\n", 470 | " \n", 471 | " delta = prediction - goal_prediction\n", 472 | " weights = weights - (alpha * (input * delta))\t\n", 473 | " print(\"Prediction:\" + str(prediction))\n", 474 | " print(\"Error:\" + str(error_for_all_lights) + \"\\n\")" 475 | ] 476 | }, 477 | { 478 | "cell_type": "markdown", 479 | "metadata": {}, 480 | "source": [ 481 | "# Our First \"Deep\" Neural Network" 482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": 15, 487 | "metadata": {}, 488 | "outputs": [], 489 | "source": [ 490 | "import numpy as np\n", 491 | "\n", 492 | "np.random.seed(1)\n", 493 | "\n", 494 | "def relu(x):\n", 495 | " return (x > 0) * x \n", 496 | "\n", 497 | "alpha = 0.2\n", 498 | "hidden_size = 4\n", 499 | "\n", 500 | "streetlights = np.array( [[ 1, 0, 1 ],\n", 501 | " [ 0, 1, 1 ],\n", 502 | " [ 0, 0, 1 ],\n", 503 | " [ 1, 1, 1 ] ] )\n", 504 | "\n", 505 | "walk_vs_stop = np.array([[ 1, 1, 0, 0]]).T\n", 506 | "\n", 507 | "weights_0_1 = 2*np.random.random((3,hidden_size)) - 1\n", 508 | "weights_1_2 = 2*np.random.random((hidden_size,1)) - 1\n", 509 | "\n", 510 | "layer_0 = streetlights[0]\n", 511 | "layer_1 = relu(np.dot(layer_0,weights_0_1))\n", 512 | "layer_2 = np.dot(layer_1,weights_1_2)" 513 | ] 514 | }, 515 | { 516 | "cell_type": "markdown", 517 | "metadata": {}, 518 | "source": [ 519 | "# Backpropagation in Code" 520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": 18, 525 | "metadata": {}, 526 | "outputs": [ 527 | { 528 | "name": "stdout", 529 | "output_type": "stream", 530 | "text": [ 531 | "Error:0.6342311598444467\n", 532 | "Error:0.35838407676317513\n", 533 | "Error:0.0830183113303298\n", 534 | "Error:0.006467054957103705\n", 535 | "Error:0.0003292669000750734\n", 536 | "Error:1.5055622665134859e-05\n" 537 | ] 538 | } 539 | ], 540 | "source": [ 541 | "import numpy as np\n", 542 | "\n", 543 | "np.random.seed(1)\n", 544 | "\n", 545 | "def relu(x):\n", 546 | " return (x > 0) * x # returns x if x > 0\n", 547 | " # return 0 otherwise\n", 548 | "\n", 549 | "def relu2deriv(output):\n", 550 | " return output>0 # returns 1 for input > 0\n", 551 | " # return 0 otherwise\n", 552 | "alpha = 0.2\n", 553 | "hidden_size = 4\n", 554 | "\n", 555 | "weights_0_1 = 2*np.random.random((3,hidden_size)) - 1\n", 556 | "weights_1_2 = 2*np.random.random((hidden_size,1)) - 1\n", 557 | "\n", 558 | "for iteration in range(60):\n", 559 | " layer_2_error = 0\n", 560 | " for i in range(len(streetlights)):\n", 561 | " layer_0 = streetlights[i:i+1]\n", 562 | " layer_1 = relu(np.dot(layer_0,weights_0_1))\n", 563 | " layer_2 = np.dot(layer_1,weights_1_2)\n", 564 | "\n", 565 | " layer_2_error += np.sum((layer_2 - walk_vs_stop[i:i+1]) ** 2)\n", 566 | "\n", 567 | " layer_2_delta = (walk_vs_stop[i:i+1] - layer_2)\n", 568 | " layer_1_delta=layer_2_delta.dot(weights_1_2.T)*relu2deriv(layer_1)\n", 569 | "\n", 570 | " weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)\n", 571 | " weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)\n", 572 | "\n", 573 | " if(iteration % 10 == 9):\n", 574 | " print(\"Error:\" + str(layer_2_error))" 575 | ] 576 | }, 577 | { 578 | "cell_type": 
"markdown", 579 | "metadata": {}, 580 | "source": [ 581 | "# One Iteration of Backpropagation" 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "execution_count": 22, 587 | "metadata": {}, 588 | "outputs": [], 589 | "source": [ 590 | "import numpy as np\n", 591 | "\n", 592 | "np.random.seed(1)\n", 593 | "\n", 594 | "def relu(x):\n", 595 | " return (x > 0) * x \n", 596 | "\n", 597 | "def relu2deriv(output):\n", 598 | " return output>0 \n", 599 | "\n", 600 | "lights = np.array( [[ 1, 0, 1 ],\n", 601 | " [ 0, 1, 1 ],\n", 602 | " [ 0, 0, 1 ],\n", 603 | " [ 1, 1, 1 ] ] )\n", 604 | "\n", 605 | "walk_stop = np.array([[ 1, 1, 0, 0]]).T\n", 606 | "\n", 607 | "alpha = 0.2\n", 608 | "hidden_size = 3\n", 609 | "\n", 610 | "weights_0_1 = 2*np.random.random((3,hidden_size)) - 1\n", 611 | "weights_1_2 = 2*np.random.random((hidden_size,1)) - 1\n", 612 | "\n", 613 | "layer_0 = lights[0:1]\n", 614 | "layer_1 = np.dot(layer_0,weights_0_1)\n", 615 | "layer_1 = relu(layer_1)\n", 616 | "layer_2 = np.dot(layer_1,weights_1_2)\n", 617 | "\n", 618 | "error = (layer_2-walk_stop[0:1])**2\n", 619 | "\n", 620 | "layer_2_delta=(layer_2-walk_stop[0:1])\n", 621 | "\n", 622 | "layer_1_delta=layer_2_delta.dot(weights_1_2.T)\n", 623 | "layer_1_delta *= relu2deriv(layer_1)\n", 624 | "\n", 625 | "weight_delta_1_2 = layer_1.T.dot(layer_2_delta)\n", 626 | "weight_delta_0_1 = layer_0.T.dot(layer_1_delta)\n", 627 | "\n", 628 | "weights_1_2 -= alpha * weight_delta_1_2\n", 629 | "weights_0_1 -= alpha * weight_delta_0_1" 630 | ] 631 | }, 632 | { 633 | "cell_type": "markdown", 634 | "metadata": {}, 635 | "source": [ 636 | "## Putting it all Together" 637 | ] 638 | }, 639 | { 640 | "cell_type": "code", 641 | "execution_count": 23, 642 | "metadata": {}, 643 | "outputs": [ 644 | { 645 | "name": "stdout", 646 | "output_type": "stream", 647 | "text": [ 648 | "Error:0.6342311598444467\n", 649 | "Error:0.35838407676317513\n", 650 | "Error:0.0830183113303298\n", 651 | "Error:0.006467054957103705\n", 652 | "Error:0.0003292669000750734\n", 653 | "Error:1.5055622665134859e-05\n" 654 | ] 655 | } 656 | ], 657 | "source": [ 658 | "import numpy as np\n", 659 | "\n", 660 | "np.random.seed(1)\n", 661 | "\n", 662 | "def relu(x):\n", 663 | " return (x > 0) * x # returns x if x > 0\n", 664 | " # return 0 otherwise\n", 665 | "\n", 666 | "def relu2deriv(output):\n", 667 | " return output>0 # returns 1 for input > 0\n", 668 | " # return 0 otherwise\n", 669 | "\n", 670 | "streetlights = np.array( [[ 1, 0, 1 ],\n", 671 | " [ 0, 1, 1 ],\n", 672 | " [ 0, 0, 1 ],\n", 673 | " [ 1, 1, 1 ] ] )\n", 674 | "\n", 675 | "walk_vs_stop = np.array([[ 1, 1, 0, 0]]).T\n", 676 | " \n", 677 | "alpha = 0.2\n", 678 | "hidden_size = 4\n", 679 | "\n", 680 | "weights_0_1 = 2*np.random.random((3,hidden_size)) - 1\n", 681 | "weights_1_2 = 2*np.random.random((hidden_size,1)) - 1\n", 682 | "\n", 683 | "for iteration in range(60):\n", 684 | " layer_2_error = 0\n", 685 | " for i in range(len(streetlights)):\n", 686 | " layer_0 = streetlights[i:i+1]\n", 687 | " layer_1 = relu(np.dot(layer_0,weights_0_1))\n", 688 | " layer_2 = np.dot(layer_1,weights_1_2)\n", 689 | "\n", 690 | " layer_2_error += np.sum((layer_2 - walk_vs_stop[i:i+1]) ** 2)\n", 691 | "\n", 692 | " layer_2_delta = (layer_2 - walk_vs_stop[i:i+1])\n", 693 | " layer_1_delta=layer_2_delta.dot(weights_1_2.T)*relu2deriv(layer_1)\n", 694 | "\n", 695 | " weights_1_2 -= alpha * layer_1.T.dot(layer_2_delta)\n", 696 | " weights_0_1 -= alpha * layer_0.T.dot(layer_1_delta)\n", 697 | "\n", 698 | " if(iteration % 10 == 9):\n", 699 
| " print(\"Error:\" + str(layer_2_error))" 700 | ] 701 | }, 702 | { 703 | "cell_type": "code", 704 | "execution_count": null, 705 | "metadata": {}, 706 | "outputs": [], 707 | "source": [] 708 | } 709 | ], 710 | "metadata": { 711 | "kernelspec": { 712 | "display_name": "Python 3", 713 | "language": "python", 714 | "name": "python3" 715 | }, 716 | "language_info": { 717 | "codemirror_mode": { 718 | "name": "ipython", 719 | "version": 3 720 | }, 721 | "file_extension": ".py", 722 | "mimetype": "text/x-python", 723 | "name": "python", 724 | "nbconvert_exporter": "python", 725 | "pygments_lexer": "ipython3", 726 | "version": "3.6.1" 727 | } 728 | }, 729 | "nbformat": 4, 730 | "nbformat_minor": 2 731 | } 732 | -------------------------------------------------------------------------------- /Chapter8 - Intro to Regularization - Learning Signal and Ignoring Noise.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# 3 Layer Network on MNIST" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 29, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "name": "stdout", 17 | "output_type": "stream", 18 | "text": [ 19 | " I:349 Train-Err:0.108 Train-Acc:1.0" 20 | ] 21 | } 22 | ], 23 | "source": [ 24 | "import sys, numpy as np\n", 25 | "from keras.datasets import mnist\n", 26 | "\n", 27 | "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n", 28 | "\n", 29 | "images, labels = (x_train[0:1000].reshape(1000,28*28) / 255, y_train[0:1000])\n", 30 | "\n", 31 | "one_hot_labels = np.zeros((len(labels),10))\n", 32 | "for i,l in enumerate(labels):\n", 33 | " one_hot_labels[i][l] = 1\n", 34 | "labels = one_hot_labels\n", 35 | "\n", 36 | "test_images = x_test.reshape(len(x_test),28*28) / 255\n", 37 | "test_labels = np.zeros((len(y_test),10))\n", 38 | "for i,l in enumerate(y_test):\n", 39 | " test_labels[i][l] = 1\n", 40 | " \n", 41 | "np.random.seed(1)\n", 42 | "relu = lambda x:(x>=0) * x # returns x if x > 0, return 0 otherwise\n", 43 | "relu2deriv = lambda x: x>=0 # returns 1 for input > 0, return 0 otherwise\n", 44 | "alpha, iterations, hidden_size, pixels_per_image, num_labels = (0.005, 350, 40, 784, 10)\n", 45 | "\n", 46 | "weights_0_1 = 0.2*np.random.random((pixels_per_image,hidden_size)) - 0.1\n", 47 | "weights_1_2 = 0.2*np.random.random((hidden_size,num_labels)) - 0.1\n", 48 | "\n", 49 | "for j in range(iterations):\n", 50 | " error, correct_cnt = (0.0, 0)\n", 51 | " \n", 52 | " for i in range(len(images)):\n", 53 | " layer_0 = images[i:i+1]\n", 54 | " layer_1 = relu(np.dot(layer_0,weights_0_1))\n", 55 | " layer_2 = np.dot(layer_1,weights_1_2)\n", 56 | "\n", 57 | " error += np.sum((labels[i:i+1] - layer_2) ** 2)\n", 58 | " correct_cnt += int(np.argmax(layer_2) == \\\n", 59 | " np.argmax(labels[i:i+1]))\n", 60 | "\n", 61 | " layer_2_delta = (labels[i:i+1] - layer_2)\n", 62 | " layer_1_delta = layer_2_delta.dot(weights_1_2.T)\\\n", 63 | " * relu2deriv(layer_1)\n", 64 | " weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)\n", 65 | " weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)\n", 66 | "\n", 67 | " sys.stdout.write(\"\\r I:\"+str(j)+ \\\n", 68 | " \" Train-Err:\" + str(error/float(len(images)))[0:5] +\\\n", 69 | " \" Train-Acc:\" + str(correct_cnt/float(len(images))))" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 23, 75 | "metadata": {}, 76 | "outputs": [ 77 | { 78 | "name": "stdout", 79 | "output_type": "stream", 80 | 
"text": [ 81 | " Test-Err:0.653 Test-Acc:0.7073\n", 82 | "\n" 83 | ] 84 | } 85 | ], 86 | "source": [ 87 | "if(j % 10 == 0 or j == iterations-1):\n", 88 | " error, correct_cnt = (0.0, 0)\n", 89 | "\n", 90 | " for i in range(len(test_images)):\n", 91 | "\n", 92 | " layer_0 = test_images[i:i+1]\n", 93 | " layer_1 = relu(np.dot(layer_0,weights_0_1))\n", 94 | " layer_2 = np.dot(layer_1,weights_1_2)\n", 95 | "\n", 96 | " error += np.sum((test_labels[i:i+1] - layer_2) ** 2)\n", 97 | " correct_cnt += int(np.argmax(layer_2) == \\\n", 98 | " np.argmax(test_labels[i:i+1]))\n", 99 | " sys.stdout.write(\" Test-Err:\" + str(error/float(len(test_images)))[0:5] +\\\n", 100 | " \" Test-Acc:\" + str(correct_cnt/float(len(test_images))) + \"\\n\")\n", 101 | " print()" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": 28, 107 | "metadata": {}, 108 | "outputs": [ 109 | { 110 | "name": "stdout", 111 | "output_type": "stream", 112 | "text": [ 113 | " I:0 Train-Err:0.722 Train-Acc:0.537 Test-Err:0.601 Test-Acc:0.6488\n", 114 | " I:10 Train-Err:0.312 Train-Acc:0.901 Test-Err:0.420 Test-Acc:0.8114\n", 115 | " I:20 Train-Err:0.260 Train-Acc:0.93 Test-Err:0.414 Test-Acc:0.8111\n", 116 | " I:30 Train-Err:0.232 Train-Acc:0.946 Test-Err:0.417 Test-Acc:0.8066\n", 117 | " I:40 Train-Err:0.215 Train-Acc:0.956 Test-Err:0.426 Test-Acc:0.8019\n", 118 | " I:50 Train-Err:0.204 Train-Acc:0.966 Test-Err:0.437 Test-Acc:0.7982\n", 119 | " I:60 Train-Err:0.194 Train-Acc:0.967 Test-Err:0.448 Test-Acc:0.7921\n", 120 | " I:70 Train-Err:0.186 Train-Acc:0.975 Test-Err:0.458 Test-Acc:0.7864\n", 121 | " I:80 Train-Err:0.179 Train-Acc:0.979 Test-Err:0.466 Test-Acc:0.7817\n", 122 | " I:90 Train-Err:0.172 Train-Acc:0.981 Test-Err:0.474 Test-Acc:0.7758\n", 123 | " I:100 Train-Err:0.166 Train-Acc:0.984 Test-Err:0.482 Test-Acc:0.7706\n", 124 | " I:110 Train-Err:0.161 Train-Acc:0.984 Test-Err:0.489 Test-Acc:0.7686\n", 125 | " I:120 Train-Err:0.157 Train-Acc:0.986 Test-Err:0.496 Test-Acc:0.766\n", 126 | " I:130 Train-Err:0.153 Train-Acc:0.99 Test-Err:0.502 Test-Acc:0.7622\n", 127 | " I:140 Train-Err:0.149 Train-Acc:0.991 Test-Err:0.508 Test-Acc:0.758\n", 128 | " I:150 Train-Err:0.145 Train-Acc:0.991 Test-Err:0.513 Test-Acc:0.7558\n", 129 | " I:160 Train-Err:0.141 Train-Acc:0.992 Test-Err:0.518 Test-Acc:0.7553\n", 130 | " I:170 Train-Err:0.138 Train-Acc:0.992 Test-Err:0.524 Test-Acc:0.751\n", 131 | " I:180 Train-Err:0.135 Train-Acc:0.995 Test-Err:0.528 Test-Acc:0.7505\n", 132 | " I:190 Train-Err:0.132 Train-Acc:0.995 Test-Err:0.533 Test-Acc:0.7482\n", 133 | " I:200 Train-Err:0.130 Train-Acc:0.998 Test-Err:0.538 Test-Acc:0.7464\n", 134 | " I:210 Train-Err:0.127 Train-Acc:0.998 Test-Err:0.544 Test-Acc:0.7446\n", 135 | " I:220 Train-Err:0.125 Train-Acc:0.998 Test-Err:0.552 Test-Acc:0.7416\n", 136 | " I:230 Train-Err:0.123 Train-Acc:0.998 Test-Err:0.560 Test-Acc:0.7372\n", 137 | " I:240 Train-Err:0.121 Train-Acc:0.998 Test-Err:0.569 Test-Acc:0.7344\n", 138 | " I:250 Train-Err:0.120 Train-Acc:0.999 Test-Err:0.577 Test-Acc:0.7316\n", 139 | " I:260 Train-Err:0.118 Train-Acc:0.999 Test-Err:0.585 Test-Acc:0.729\n", 140 | " I:270 Train-Err:0.117 Train-Acc:0.999 Test-Err:0.593 Test-Acc:0.7259\n", 141 | " I:280 Train-Err:0.115 Train-Acc:0.999 Test-Err:0.600 Test-Acc:0.723\n", 142 | " I:290 Train-Err:0.114 Train-Acc:0.999 Test-Err:0.607 Test-Acc:0.7196\n", 143 | " I:300 Train-Err:0.113 Train-Acc:0.999 Test-Err:0.614 Test-Acc:0.7183\n", 144 | " I:310 Train-Err:0.112 Train-Acc:0.999 Test-Err:0.622 Test-Acc:0.7165\n", 145 | " I:320 
Train-Err:0.111 Train-Acc:0.999 Test-Err:0.629 Test-Acc:0.7133\n", 146 | " I:330 Train-Err:0.110 Train-Acc:0.999 Test-Err:0.637 Test-Acc:0.7125\n", 147 | " I:340 Train-Err:0.109 Train-Acc:1.0 Test-Err:0.645 Test-Acc:0.71\n", 148 | " I:349 Train-Err:0.108 Train-Acc:1.0 Test-Err:0.653 Test-Acc:0.7073\n" 149 | ] 150 | } 151 | ], 152 | "source": [ 153 | "import sys, numpy as np\n", 154 | "from keras.datasets import mnist\n", 155 | "\n", 156 | "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n", 157 | "\n", 158 | "images, labels = (x_train[0:1000].reshape(1000,28*28) / 255, y_train[0:1000])\n", 159 | "\n", 160 | "one_hot_labels = np.zeros((len(labels),10))\n", 161 | "for i,l in enumerate(labels):\n", 162 | " one_hot_labels[i][l] = 1\n", 163 | "labels = one_hot_labels\n", 164 | "\n", 165 | "test_images = x_test.reshape(len(x_test),28*28) / 255\n", 166 | "test_labels = np.zeros((len(y_test),10))\n", 167 | "for i,l in enumerate(y_test):\n", 168 | " test_labels[i][l] = 1\n", 169 | "\n", 170 | "np.random.seed(1)\n", 171 | "relu = lambda x:(x>=0) * x # returns x if x > 0, return 0 otherwise\n", 172 | "relu2deriv = lambda x: x>=0 # returns 1 for input > 0, return 0 otherwise\n", 173 | "alpha, iterations, hidden_size, pixels_per_image, num_labels = (0.005, 350, 40, 784, 10)\n", 174 | "\n", 175 | "weights_0_1 = 0.2*np.random.random((pixels_per_image,hidden_size)) - 0.1\n", 176 | "weights_1_2 = 0.2*np.random.random((hidden_size,num_labels)) - 0.1\n", 177 | "\n", 178 | "for j in range(iterations):\n", 179 | " error, correct_cnt = (0.0, 0)\n", 180 | " \n", 181 | " for i in range(len(images)):\n", 182 | " layer_0 = images[i:i+1]\n", 183 | " layer_1 = relu(np.dot(layer_0,weights_0_1))\n", 184 | " layer_2 = np.dot(layer_1,weights_1_2)\n", 185 | "\n", 186 | " error += np.sum((labels[i:i+1] - layer_2) ** 2)\n", 187 | " correct_cnt += int(np.argmax(layer_2) == \\\n", 188 | " np.argmax(labels[i:i+1]))\n", 189 | "\n", 190 | " layer_2_delta = (labels[i:i+1] - layer_2)\n", 191 | " layer_1_delta = layer_2_delta.dot(weights_1_2.T)\\\n", 192 | " * relu2deriv(layer_1)\n", 193 | " weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)\n", 194 | " weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)\n", 195 | "\n", 196 | " sys.stdout.write(\"\\r I:\"+str(j)+ \\\n", 197 | " \" Train-Err:\" + str(error/float(len(images)))[0:5] +\\\n", 198 | " \" Train-Acc:\" + str(correct_cnt/float(len(images))))\n", 199 | " \n", 200 | " if(j % 10 == 0 or j == iterations-1):\n", 201 | " error, correct_cnt = (0.0, 0)\n", 202 | "\n", 203 | " for i in range(len(test_images)):\n", 204 | "\n", 205 | " layer_0 = test_images[i:i+1]\n", 206 | " layer_1 = relu(np.dot(layer_0,weights_0_1))\n", 207 | " layer_2 = np.dot(layer_1,weights_1_2)\n", 208 | "\n", 209 | " error += np.sum((test_labels[i:i+1] - layer_2) ** 2)\n", 210 | " correct_cnt += int(np.argmax(layer_2) == \\\n", 211 | " np.argmax(test_labels[i:i+1]))\n", 212 | " sys.stdout.write(\" Test-Err:\" + str(error/float(len(test_images)))[0:5] +\\\n", 213 | " \" Test-Acc:\" + str(correct_cnt/float(len(test_images))))\n", 214 | " print()" 215 | ] 216 | }, 217 | { 218 | "cell_type": "markdown", 219 | "metadata": {}, 220 | "source": [ 221 | "# Dropout In Code" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 55, 227 | "metadata": {}, 228 | "outputs": [], 229 | "source": [ 230 | "i = 0\n", 231 | "layer_0 = images[i:i+1]\n", 232 | "dropout_mask = np.random.randint(2,size=layer_1.shape)\n", 233 | "\n", 234 | "layer_1 *= dropout_mask * 2\n", 235 | "layer_2 = 
np.dot(layer_1, weights_1_2)\n", 236 | "\n", 237 | "error += np.sum((labels[i:i+1] - layer_2) ** 2)\n", 238 | "\n", 239 | "correct_cnt += int(np.argmax(layer_2) == np.argmax(labels[i:i+1]))\n", 240 | "\n", 241 | "layer_2_delta = (labels[i:i+1] - layer_2)\n", 242 | "layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)\n", 243 | "\n", 244 | "layer_1_delta *= dropout_mask\n", 245 | "\n", 246 | "weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)\n", 247 | "weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 57, 253 | "metadata": {}, 254 | "outputs": [ 255 | { 256 | "name": "stdout", 257 | "output_type": "stream", 258 | "text": [ 259 | "\n", 260 | "I:0 Test-Err:0.641 Test-Acc:0.6333 Train-Err:0.891 Train-Acc:0.413\n", 261 | "I:10 Test-Err:0.458 Test-Acc:0.787 Train-Err:0.472 Train-Acc:0.764\n", 262 | "I:20 Test-Err:0.415 Test-Acc:0.8133 Train-Err:0.430 Train-Acc:0.809\n", 263 | "I:30 Test-Err:0.421 Test-Acc:0.8114 Train-Err:0.415 Train-Acc:0.811\n", 264 | "I:40 Test-Err:0.419 Test-Acc:0.8112 Train-Err:0.413 Train-Acc:0.827\n", 265 | "I:50 Test-Err:0.409 Test-Acc:0.8133 Train-Err:0.392 Train-Acc:0.836\n", 266 | "I:60 Test-Err:0.412 Test-Acc:0.8236 Train-Err:0.402 Train-Acc:0.836\n", 267 | "I:70 Test-Err:0.412 Test-Acc:0.8033 Train-Err:0.383 Train-Acc:0.857\n", 268 | "I:80 Test-Err:0.410 Test-Acc:0.8054 Train-Err:0.386 Train-Acc:0.854\n", 269 | "I:90 Test-Err:0.411 Test-Acc:0.8144 Train-Err:0.376 Train-Acc:0.868\n", 270 | "I:100 Test-Err:0.411 Test-Acc:0.7903 Train-Err:0.369 Train-Acc:0.864\n", 271 | "I:110 Test-Err:0.411 Test-Acc:0.8003 Train-Err:0.371 Train-Acc:0.868\n", 272 | "I:120 Test-Err:0.402 Test-Acc:0.8046 Train-Err:0.353 Train-Acc:0.857\n", 273 | "I:130 Test-Err:0.408 Test-Acc:0.8091 Train-Err:0.352 Train-Acc:0.867\n", 274 | "I:140 Test-Err:0.405 Test-Acc:0.8083 Train-Err:0.355 Train-Acc:0.885\n", 275 | "I:150 Test-Err:0.404 Test-Acc:0.8107 Train-Err:0.342 Train-Acc:0.883\n", 276 | "I:160 Test-Err:0.399 Test-Acc:0.8146 Train-Err:0.361 Train-Acc:0.876\n", 277 | "I:170 Test-Err:0.404 Test-Acc:0.8074 Train-Err:0.344 Train-Acc:0.889\n", 278 | "I:180 Test-Err:0.399 Test-Acc:0.807 Train-Err:0.333 Train-Acc:0.892\n", 279 | "I:190 Test-Err:0.407 Test-Acc:0.8066 Train-Err:0.335 Train-Acc:0.898\n", 280 | "I:200 Test-Err:0.405 Test-Acc:0.8036 Train-Err:0.347 Train-Acc:0.893\n", 281 | "I:210 Test-Err:0.405 Test-Acc:0.8034 Train-Err:0.336 Train-Acc:0.894\n", 282 | "I:220 Test-Err:0.402 Test-Acc:0.8067 Train-Err:0.325 Train-Acc:0.896\n", 283 | "I:230 Test-Err:0.404 Test-Acc:0.8091 Train-Err:0.321 Train-Acc:0.894\n", 284 | "I:240 Test-Err:0.415 Test-Acc:0.8091 Train-Err:0.332 Train-Acc:0.898\n", 285 | "I:250 Test-Err:0.395 Test-Acc:0.8182 Train-Err:0.320 Train-Acc:0.899\n", 286 | "I:260 Test-Err:0.390 Test-Acc:0.8204 Train-Err:0.321 Train-Acc:0.899\n", 287 | "I:270 Test-Err:0.382 Test-Acc:0.8194 Train-Err:0.312 Train-Acc:0.906\n", 288 | "I:280 Test-Err:0.396 Test-Acc:0.8208 Train-Err:0.317 Train-Acc:0.9\n", 289 | "I:290 Test-Err:0.399 Test-Acc:0.8181 Train-Err:0.301 Train-Acc:0.908" 290 | ] 291 | } 292 | ], 293 | "source": [ 294 | "import numpy as np, sys\n", 295 | "np.random.seed(1)\n", 296 | "def relu(x):\n", 297 | " return (x >= 0) * x # returns x if x > 0\n", 298 | " # returns 0 otherwise\n", 299 | "\n", 300 | "def relu2deriv(output):\n", 301 | " return output >= 0 #returns 1 for input > 0\n", 302 | "\n", 303 | "alpha, iterations, hidden_size = (0.005, 300, 100)\n", 304 | "pixels_per_image, num_labels =
(784, 10)\n", 305 | "\n", 306 | "weights_0_1 = 0.2*np.random.random((pixels_per_image,hidden_size)) - 0.1\n", 307 | "weights_1_2 = 0.2*np.random.random((hidden_size,num_labels)) - 0.1\n", 308 | "\n", 309 | "for j in range(iterations):\n", 310 | " error, correct_cnt = (0.0,0)\n", 311 | " for i in range(len(images)):\n", 312 | " layer_0 = images[i:i+1]\n", 313 | " layer_1 = relu(np.dot(layer_0,weights_0_1))\n", 314 | " dropout_mask = np.random.randint(2, size=layer_1.shape)\n", 315 | " layer_1 *= dropout_mask * 2\n", 316 | " layer_2 = np.dot(layer_1,weights_1_2)\n", 317 | "\n", 318 | " error += np.sum((labels[i:i+1] - layer_2) ** 2)\n", 319 | " correct_cnt += int(np.argmax(layer_2) == np.argmax(labels[i:i+1]))\n", 320 | " layer_2_delta = (labels[i:i+1] - layer_2)\n", 321 | " layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)\n", 322 | " layer_1_delta *= dropout_mask\n", 323 | "\n", 324 | " weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)\n", 325 | " weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)\n", 326 | "\n", 327 | " if(j%10 == 0):\n", 328 | " test_error = 0.0\n", 329 | " test_correct_cnt = 0\n", 330 | "\n", 331 | " for i in range(len(test_images)):\n", 332 | " layer_0 = test_images[i:i+1]\n", 333 | " layer_1 = relu(np.dot(layer_0,weights_0_1))\n", 334 | " layer_2 = np.dot(layer_1, weights_1_2)\n", 335 | "\n", 336 | " test_error += np.sum((test_labels[i:i+1] - layer_2) ** 2)\n", 337 | " test_correct_cnt += int(np.argmax(layer_2) == np.argmax(test_labels[i:i+1]))\n", 338 | "\n", 339 | " sys.stdout.write(\"\\n\" + \\\n", 340 | " \"I:\" + str(j) + \\\n", 341 | " \" Test-Err:\" + str(test_error/ float(len(test_images)))[0:5] +\\\n", 342 | " \" Test-Acc:\" + str(test_correct_cnt/ float(len(test_images)))+\\\n", 343 | " \" Train-Err:\" + str(error/ float(len(images)))[0:5] +\\\n", 344 | " \" Train-Acc:\" + str(correct_cnt/ float(len(images))))" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": {}, 350 | "source": [ 351 | "# Batch Gradient Descent" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 38, 357 | "metadata": {}, 358 | "outputs": [ 359 | { 360 | "name": "stdout", 361 | "output_type": "stream", 362 | "text": [ 363 | "\n", 364 | "I:0 Test-Err:0.815 Test-Acc:0.3832 Train-Err:1.284 Train-Acc:0.165\n", 365 | "I:10 Test-Err:0.568 Test-Acc:0.7173 Train-Err:0.591 Train-Acc:0.672\n", 366 | "I:20 Test-Err:0.510 Test-Acc:0.7571 Train-Err:0.532 Train-Acc:0.729\n", 367 | "I:30 Test-Err:0.485 Test-Acc:0.7793 Train-Err:0.498 Train-Acc:0.754\n", 368 | "I:40 Test-Err:0.468 Test-Acc:0.7877 Train-Err:0.489 Train-Acc:0.749\n", 369 | "I:50 Test-Err:0.458 Test-Acc:0.793 Train-Err:0.468 Train-Acc:0.775\n", 370 | "I:60 Test-Err:0.452 Test-Acc:0.7995 Train-Err:0.452 Train-Acc:0.799\n", 371 | "I:70 Test-Err:0.446 Test-Acc:0.803 Train-Err:0.453 Train-Acc:0.792\n", 372 | "I:80 Test-Err:0.451 Test-Acc:0.7968 Train-Err:0.457 Train-Acc:0.786\n", 373 | "I:90 Test-Err:0.447 Test-Acc:0.795 Train-Err:0.454 Train-Acc:0.799\n", 374 | "I:100 Test-Err:0.448 Test-Acc:0.793 Train-Err:0.447 Train-Acc:0.796\n", 375 | "I:110 Test-Err:0.441 Test-Acc:0.7943 Train-Err:0.426 Train-Acc:0.816\n", 376 | "I:120 Test-Err:0.442 Test-Acc:0.7966 Train-Err:0.431 Train-Acc:0.813\n", 377 | "I:130 Test-Err:0.441 Test-Acc:0.7906 Train-Err:0.434 Train-Acc:0.816\n", 378 | "I:140 Test-Err:0.447 Test-Acc:0.7874 Train-Err:0.437 Train-Acc:0.822\n", 379 | "I:150 Test-Err:0.443 Test-Acc:0.7899 Train-Err:0.414 Train-Acc:0.823\n", 380 | "I:160 Test-Err:0.438 Test-Acc:0.797 
Train-Err:0.427 Train-Acc:0.811\n", 381 | "I:170 Test-Err:0.440 Test-Acc:0.7884 Train-Err:0.418 Train-Acc:0.828\n", 382 | "I:180 Test-Err:0.436 Test-Acc:0.7935 Train-Err:0.407 Train-Acc:0.834\n", 383 | "I:190 Test-Err:0.434 Test-Acc:0.7935 Train-Err:0.410 Train-Acc:0.831\n", 384 | "I:200 Test-Err:0.435 Test-Acc:0.7972 Train-Err:0.416 Train-Acc:0.829\n", 385 | "I:210 Test-Err:0.434 Test-Acc:0.7923 Train-Err:0.409 Train-Acc:0.83\n", 386 | "I:220 Test-Err:0.433 Test-Acc:0.8032 Train-Err:0.396 Train-Acc:0.832\n", 387 | "I:230 Test-Err:0.431 Test-Acc:0.8036 Train-Err:0.393 Train-Acc:0.853\n", 388 | "I:240 Test-Err:0.430 Test-Acc:0.8047 Train-Err:0.397 Train-Acc:0.844\n", 389 | "I:250 Test-Err:0.429 Test-Acc:0.8028 Train-Err:0.386 Train-Acc:0.843\n", 390 | "I:260 Test-Err:0.431 Test-Acc:0.8038 Train-Err:0.394 Train-Acc:0.843\n", 391 | "I:270 Test-Err:0.428 Test-Acc:0.8014 Train-Err:0.384 Train-Acc:0.845\n", 392 | "I:280 Test-Err:0.430 Test-Acc:0.8067 Train-Err:0.401 Train-Acc:0.846\n", 393 | "I:290 Test-Err:0.428 Test-Acc:0.7975 Train-Err:0.383 Train-Acc:0.851" 394 | ] 395 | } 396 | ], 397 | "source": [ 398 | "import numpy as np\n", 399 | "np.random.seed(1)\n", 400 | "\n", 401 | "def relu(x):\n", 402 | " return (x >= 0) * x # returns x if x > 0\n", 403 | "\n", 404 | "def relu2deriv(output):\n", 405 | " return output >= 0 # returns 1 for input > 0\n", 406 | "\n", 407 | "batch_size = 100\n", 408 | "alpha, iterations = (0.001, 300)\n", 409 | "pixels_per_image, num_labels, hidden_size = (784, 10, 100)\n", 410 | "\n", 411 | "weights_0_1 = 0.2*np.random.random((pixels_per_image,hidden_size)) - 0.1\n", 412 | "weights_1_2 = 0.2*np.random.random((hidden_size,num_labels)) - 0.1\n", 413 | "\n", 414 | "for j in range(iterations):\n", 415 | " error, correct_cnt = (0.0, 0)\n", 416 | " for i in range(int(len(images) / batch_size)):\n", 417 | " batch_start, batch_end = ((i * batch_size),((i+1)*batch_size))\n", 418 | "\n", 419 | " layer_0 = images[batch_start:batch_end]\n", 420 | " layer_1 = relu(np.dot(layer_0,weights_0_1))\n", 421 | " dropout_mask = np.random.randint(2,size=layer_1.shape)\n", 422 | " layer_1 *= dropout_mask * 2\n", 423 | " layer_2 = np.dot(layer_1,weights_1_2)\n", 424 | "\n", 425 | " error += np.sum((labels[batch_start:batch_end] - layer_2) ** 2)\n", 426 | " for k in range(batch_size):\n", 427 | " correct_cnt += int(np.argmax(layer_2[k:k+1]) == np.argmax(labels[batch_start+k:batch_start+k+1]))\n", 428 | "\n", 429 | " layer_2_delta = (labels[batch_start:batch_end]-layer_2)/batch_size\n", 430 | " layer_1_delta = layer_2_delta.dot(weights_1_2.T)* relu2deriv(layer_1)\n", 431 | " layer_1_delta *= dropout_mask\n", 432 | "\n", 433 | " weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)\n", 434 | " weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)\n", 435 | " \n", 436 | " if(j%10 == 0):\n", 437 | " test_error = 0.0\n", 438 | " test_correct_cnt = 0\n", 439 | "\n", 440 | " for i in range(len(test_images)):\n", 441 | " layer_0 = test_images[i:i+1]\n", 442 | " layer_1 = relu(np.dot(layer_0,weights_0_1))\n", 443 | " layer_2 = np.dot(layer_1, weights_1_2)\n", 444 | "\n", 445 | " test_error += np.sum((test_labels[i:i+1] - layer_2) ** 2)\n", 446 | " test_correct_cnt += int(np.argmax(layer_2) == np.argmax(test_labels[i:i+1]))\n", 447 | "\n", 448 | " sys.stdout.write(\"\\n\" + \\\n", 449 | " \"I:\" + str(j) + \\\n", 450 | " \" Test-Err:\" + str(test_error/ float(len(test_images)))[0:5] +\\\n", 451 | " \" Test-Acc:\" + str(test_correct_cnt/ float(len(test_images)))+\\\n", 452 | " \" Train-Err:\" + str(error/ 
float(len(images)))[0:5] +\\\n", 453 | " \" Train-Acc:\" + str(correct_cnt/ float(len(images))))" 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": null, 459 | "metadata": {}, 460 | "outputs": [], 461 | "source": [] 462 | } 463 | ], 464 | "metadata": { 465 | "kernelspec": { 466 | "display_name": "Python 3", 467 | "language": "python", 468 | "name": "python3" 469 | }, 470 | "language_info": { 471 | "codemirror_mode": { 472 | "name": "ipython", 473 | "version": 3 474 | }, 475 | "file_extension": ".py", 476 | "mimetype": "text/x-python", 477 | "name": "python", 478 | "nbconvert_exporter": "python", 479 | "pygments_lexer": "ipython3", 480 | "version": "3.6.1" 481 | } 482 | }, 483 | "nbformat": 4, 484 | "nbformat_minor": 2 485 | } 486 | -------------------------------------------------------------------------------- /Chapter9 - Intro to Activation Functions - Modeling Probabilities.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Upgrading our MNIST Network" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 8, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "name": "stdout", 17 | "output_type": "stream", 18 | "text": [ 19 | "\n", 20 | "I:0 Test-Acc:0.394 Train-Acc:0.156\n", 21 | "I:10 Test-Acc:0.6867 Train-Acc:0.723\n", 22 | "I:20 Test-Acc:0.7025 Train-Acc:0.732\n", 23 | "I:30 Test-Acc:0.734 Train-Acc:0.763\n", 24 | "I:40 Test-Acc:0.7663 Train-Acc:0.794\n", 25 | "I:50 Test-Acc:0.7913 Train-Acc:0.819\n", 26 | "I:60 Test-Acc:0.8102 Train-Acc:0.849\n", 27 | "I:70 Test-Acc:0.8228 Train-Acc:0.864\n", 28 | "I:80 Test-Acc:0.831 Train-Acc:0.867\n", 29 | "I:90 Test-Acc:0.8364 Train-Acc:0.885\n", 30 | "I:100 Test-Acc:0.8407 Train-Acc:0.883\n", 31 | "I:110 Test-Acc:0.845 Train-Acc:0.891\n", 32 | "I:120 Test-Acc:0.8481 Train-Acc:0.901\n", 33 | "I:130 Test-Acc:0.8505 Train-Acc:0.901\n", 34 | "I:140 Test-Acc:0.8526 Train-Acc:0.905\n", 35 | "I:150 Test-Acc:0.8555 Train-Acc:0.914\n", 36 | "I:160 Test-Acc:0.8577 Train-Acc:0.925\n", 37 | "I:170 Test-Acc:0.8596 Train-Acc:0.918\n", 38 | "I:180 Test-Acc:0.8619 Train-Acc:0.933\n", 39 | "I:190 Test-Acc:0.863 Train-Acc:0.933\n", 40 | "I:200 Test-Acc:0.8642 Train-Acc:0.926\n", 41 | "I:210 Test-Acc:0.8653 Train-Acc:0.931\n", 42 | "I:220 Test-Acc:0.8668 Train-Acc:0.93\n", 43 | "I:230 Test-Acc:0.8672 Train-Acc:0.937\n", 44 | "I:240 Test-Acc:0.8681 Train-Acc:0.938\n", 45 | "I:250 Test-Acc:0.8687 Train-Acc:0.937\n", 46 | "I:260 Test-Acc:0.8684 Train-Acc:0.945\n", 47 | "I:270 Test-Acc:0.8703 Train-Acc:0.951\n", 48 | "I:280 Test-Acc:0.8699 Train-Acc:0.949\n", 49 | "I:290 Test-Acc:0.8701 Train-Acc:0.94" 50 | ] 51 | } 52 | ], 53 | "source": [ 54 | "import numpy as np, sys\n", 55 | "np.random.seed(1)\n", 56 | "\n", 57 | "from keras.datasets import mnist\n", 58 | "\n", 59 | "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n", 60 | "\n", 61 | "images, labels = (x_train[0:1000].reshape(1000,28*28) / 255, y_train[0:1000])\n", 62 | "\n", 63 | "one_hot_labels = np.zeros((len(labels),10))\n", 64 | "for i,l in enumerate(labels):\n", 65 | " one_hot_labels[i][l] = 1\n", 66 | "labels = one_hot_labels\n", 67 | "\n", 68 | "test_images = x_test.reshape(len(x_test),28*28) / 255\n", 69 | "test_labels = np.zeros((len(y_test),10))\n", 70 | "for i,l in enumerate(y_test):\n", 71 | " test_labels[i][l] = 1\n", 72 | "\n", 73 | "def tanh(x):\n", 74 | " return np.tanh(x)\n", 75 | "\n", 76 | "def 
tanh2deriv(output):\n", 77 | " return 1 - (output ** 2)\n", 78 | "\n", 79 | "def softmax(x):\n", 80 | " temp = np.exp(x)\n", 81 | " return temp / np.sum(temp, axis=1, keepdims=True)\n", 82 | "\n", 83 | "alpha, iterations, hidden_size = (2, 300, 100)\n", 84 | "pixels_per_image, num_labels = (784, 10)\n", 85 | "batch_size = 100\n", 86 | "\n", 87 | "weights_0_1 = 0.02*np.random.random((pixels_per_image,hidden_size))-0.01\n", 88 | "weights_1_2 = 0.2*np.random.random((hidden_size,num_labels)) - 0.1\n", 89 | "\n", 90 | "for j in range(iterations):\n", 91 | " correct_cnt = 0\n", 92 | " for i in range(int(len(images) / batch_size)):\n", 93 | " batch_start, batch_end=((i * batch_size),((i+1)*batch_size))\n", 94 | " layer_0 = images[batch_start:batch_end]\n", 95 | " layer_1 = tanh(np.dot(layer_0,weights_0_1))\n", 96 | " dropout_mask = np.random.randint(2,size=layer_1.shape)\n", 97 | " layer_1 *= dropout_mask * 2\n", 98 | " layer_2 = softmax(np.dot(layer_1,weights_1_2))\n", 99 | "\n", 100 | " for k in range(batch_size):\n", 101 | " correct_cnt += int(np.argmax(layer_2[k:k+1]) == np.argmax(labels[batch_start+k:batch_start+k+1]))\n", 102 | "\n", 103 | " layer_2_delta = (labels[batch_start:batch_end]-layer_2) / (batch_size * layer_2.shape[0])\n", 104 | " layer_1_delta = layer_2_delta.dot(weights_1_2.T) * tanh2deriv(layer_1)\n", 105 | " layer_1_delta *= dropout_mask\n", 106 | "\n", 107 | " weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)\n", 108 | " weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)\n", 109 | "\n", 110 | " test_correct_cnt = 0\n", 111 | "\n", 112 | " for i in range(len(test_images)):\n", 113 | "\n", 114 | " layer_0 = test_images[i:i+1]\n", 115 | " layer_1 = tanh(np.dot(layer_0,weights_0_1))\n", 116 | " layer_2 = np.dot(layer_1,weights_1_2)\n", 117 | "\n", 118 | " test_correct_cnt += int(np.argmax(layer_2) == np.argmax(test_labels[i:i+1]))\n", 119 | " if(j % 10 == 0):\n", 120 | " sys.stdout.write(\"\\n\"+ \\\n", 121 | " \"I:\" + str(j) + \\\n", 122 | " \" Test-Acc:\"+str(test_correct_cnt/float(len(test_images)))+\\\n", 123 | " \" Train-Acc:\" + str(correct_cnt/float(len(images))))" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": null, 129 | "metadata": {}, 130 | "outputs": [], 131 | "source": [] 132 | } 133 | ], 134 | "metadata": { 135 | "kernelspec": { 136 | "display_name": "Python 3", 137 | "language": "python", 138 | "name": "python3" 139 | }, 140 | "language_info": { 141 | "codemirror_mode": { 142 | "name": "ipython", 143 | "version": 3 144 | }, 145 | "file_extension": ".py", 146 | "mimetype": "text/x-python", 147 | "name": "python", 148 | "nbconvert_exporter": "python", 149 | "pygments_lexer": "ipython3", 150 | "version": "3.6.1" 151 | } 152 | }, 153 | "nbformat": 4, 154 | "nbformat_minor": 2 155 | } 156 | -------------------------------------------------------------------------------- /MNISTPreprocessor.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 6, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# First, install TensorFlow (https://www.tensorflow.org/) and Keras (https://keras.io/).\n", 10 | "\n", 11 | "from keras.datasets import mnist\n", 12 | "\n", 13 | "(x_train, y_train), (x_test, y_test) = mnist.load_data()\n", 14 | "\n", 15 | "images = x_train[0:1000]\n", 16 | "labels = y_train[0:1000]" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 7, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | 
"data": { 26 | "text/plain": [ 27 | "array([[[0, 0, 0, ..., 0, 0, 0],\n", 28 | " [0, 0, 0, ..., 0, 0, 0],\n", 29 | " [0, 0, 0, ..., 0, 0, 0],\n", 30 | " ...,\n", 31 | " [0, 0, 0, ..., 0, 0, 0],\n", 32 | " [0, 0, 0, ..., 0, 0, 0],\n", 33 | " [0, 0, 0, ..., 0, 0, 0]],\n", 34 | "\n", 35 | " [[0, 0, 0, ..., 0, 0, 0],\n", 36 | " [0, 0, 0, ..., 0, 0, 0],\n", 37 | " [0, 0, 0, ..., 0, 0, 0],\n", 38 | " ...,\n", 39 | " [0, 0, 0, ..., 0, 0, 0],\n", 40 | " [0, 0, 0, ..., 0, 0, 0],\n", 41 | " [0, 0, 0, ..., 0, 0, 0]],\n", 42 | "\n", 43 | " [[0, 0, 0, ..., 0, 0, 0],\n", 44 | " [0, 0, 0, ..., 0, 0, 0],\n", 45 | " [0, 0, 0, ..., 0, 0, 0],\n", 46 | " ...,\n", 47 | " [0, 0, 0, ..., 0, 0, 0],\n", 48 | " [0, 0, 0, ..., 0, 0, 0],\n", 49 | " [0, 0, 0, ..., 0, 0, 0]],\n", 50 | "\n", 51 | " ...,\n", 52 | "\n", 53 | " [[0, 0, 0, ..., 0, 0, 0],\n", 54 | " [0, 0, 0, ..., 0, 0, 0],\n", 55 | " [0, 0, 0, ..., 0, 0, 0],\n", 56 | " ...,\n", 57 | " [0, 0, 0, ..., 0, 0, 0],\n", 58 | " [0, 0, 0, ..., 0, 0, 0],\n", 59 | " [0, 0, 0, ..., 0, 0, 0]],\n", 60 | "\n", 61 | " [[0, 0, 0, ..., 0, 0, 0],\n", 62 | " [0, 0, 0, ..., 0, 0, 0],\n", 63 | " [0, 0, 0, ..., 0, 0, 0],\n", 64 | " ...,\n", 65 | " [0, 0, 0, ..., 0, 0, 0],\n", 66 | " [0, 0, 0, ..., 0, 0, 0],\n", 67 | " [0, 0, 0, ..., 0, 0, 0]],\n", 68 | "\n", 69 | " [[0, 0, 0, ..., 0, 0, 0],\n", 70 | " [0, 0, 0, ..., 0, 0, 0],\n", 71 | " [0, 0, 0, ..., 0, 0, 0],\n", 72 | " ...,\n", 73 | " [0, 0, 0, ..., 0, 0, 0],\n", 74 | " [0, 0, 0, ..., 0, 0, 0],\n", 75 | " [0, 0, 0, ..., 0, 0, 0]]], dtype=uint8)" 76 | ] 77 | }, 78 | "execution_count": 7, 79 | "metadata": {}, 80 | "output_type": "execute_result" 81 | } 82 | ], 83 | "source": [ 84 | "images" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 8, 90 | "metadata": {}, 91 | "outputs": [ 92 | { 93 | "data": { 94 | "text/plain": [ 95 | "array([5, 0, 4, 1, 9, 2, 1, 3, 1, 4, 3, 5, 3, 6, 1, 7, 2, 8, 6, 9, 4, 0,\n", 96 | " 9, 1, 1, 2, 4, 3, 2, 7, 3, 8, 6, 9, 0, 5, 6, 0, 7, 6, 1, 8, 7, 9,\n", 97 | " 3, 9, 8, 5, 9, 3, 3, 0, 7, 4, 9, 8, 0, 9, 4, 1, 4, 4, 6, 0, 4, 5,\n", 98 | " 6, 1, 0, 0, 1, 7, 1, 6, 3, 0, 2, 1, 1, 7, 9, 0, 2, 6, 7, 8, 3, 9,\n", 99 | " 0, 4, 6, 7, 4, 6, 8, 0, 7, 8, 3, 1, 5, 7, 1, 7, 1, 1, 6, 3, 0, 2,\n", 100 | " 9, 3, 1, 1, 0, 4, 9, 2, 0, 0, 2, 0, 2, 7, 1, 8, 6, 4, 1, 6, 3, 4,\n", 101 | " 5, 9, 1, 3, 3, 8, 5, 4, 7, 7, 4, 2, 8, 5, 8, 6, 7, 3, 4, 6, 1, 9,\n", 102 | " 9, 6, 0, 3, 7, 2, 8, 2, 9, 4, 4, 6, 4, 9, 7, 0, 9, 2, 9, 5, 1, 5,\n", 103 | " 9, 1, 2, 3, 2, 3, 5, 9, 1, 7, 6, 2, 8, 2, 2, 5, 0, 7, 4, 9, 7, 8,\n", 104 | " 3, 2, 1, 1, 8, 3, 6, 1, 0, 3, 1, 0, 0, 1, 7, 2, 7, 3, 0, 4, 6, 5,\n", 105 | " 2, 6, 4, 7, 1, 8, 9, 9, 3, 0, 7, 1, 0, 2, 0, 3, 5, 4, 6, 5, 8, 6,\n", 106 | " 3, 7, 5, 8, 0, 9, 1, 0, 3, 1, 2, 2, 3, 3, 6, 4, 7, 5, 0, 6, 2, 7,\n", 107 | " 9, 8, 5, 9, 2, 1, 1, 4, 4, 5, 6, 4, 1, 2, 5, 3, 9, 3, 9, 0, 5, 9,\n", 108 | " 6, 5, 7, 4, 1, 3, 4, 0, 4, 8, 0, 4, 3, 6, 8, 7, 6, 0, 9, 7, 5, 7,\n", 109 | " 2, 1, 1, 6, 8, 9, 4, 1, 5, 2, 2, 9, 0, 3, 9, 6, 7, 2, 0, 3, 5, 4,\n", 110 | " 3, 6, 5, 8, 9, 5, 4, 7, 4, 2, 7, 3, 4, 8, 9, 1, 9, 2, 8, 7, 9, 1,\n", 111 | " 8, 7, 4, 1, 3, 1, 1, 0, 2, 3, 9, 4, 9, 2, 1, 6, 8, 4, 7, 7, 4, 4,\n", 112 | " 9, 2, 5, 7, 2, 4, 4, 2, 1, 9, 7, 2, 8, 7, 6, 9, 2, 2, 3, 8, 1, 6,\n", 113 | " 5, 1, 1, 0, 2, 6, 4, 5, 8, 3, 1, 5, 1, 9, 2, 7, 4, 4, 4, 8, 1, 5,\n", 114 | " 8, 9, 5, 6, 7, 9, 9, 3, 7, 0, 9, 0, 6, 6, 2, 3, 9, 0, 7, 5, 4, 8,\n", 115 | " 0, 9, 4, 1, 2, 8, 7, 1, 2, 6, 1, 0, 3, 0, 1, 1, 8, 2, 0, 3, 9, 4,\n", 116 | " 0, 5, 0, 6, 1, 7, 7, 8, 1, 9, 2, 0, 5, 1, 2, 2, 7, 3, 5, 4, 9, 7,\n", 
117 | " 1, 8, 3, 9, 6, 0, 3, 1, 1, 2, 6, 3, 5, 7, 6, 8, 3, 9, 5, 8, 5, 7,\n", 118 | " 6, 1, 1, 3, 1, 7, 5, 5, 5, 2, 5, 8, 7, 0, 9, 7, 7, 5, 0, 9, 0, 0,\n", 119 | " 8, 9, 2, 4, 8, 1, 6, 1, 6, 5, 1, 8, 3, 4, 0, 5, 5, 8, 3, 6, 2, 3,\n", 120 | " 9, 2, 1, 1, 5, 2, 1, 3, 2, 8, 7, 3, 7, 2, 4, 6, 9, 7, 2, 4, 2, 8,\n", 121 | " 1, 1, 3, 8, 4, 0, 6, 5, 9, 3, 0, 9, 2, 4, 7, 1, 2, 9, 4, 2, 6, 1,\n", 122 | " 8, 9, 0, 6, 6, 7, 9, 9, 8, 0, 1, 4, 4, 6, 7, 1, 5, 7, 0, 3, 5, 8,\n", 123 | " 4, 7, 1, 2, 5, 9, 5, 6, 7, 5, 9, 8, 8, 3, 6, 9, 7, 0, 7, 5, 7, 1,\n", 124 | " 1, 0, 7, 9, 2, 3, 7, 3, 2, 4, 1, 6, 2, 7, 5, 5, 7, 4, 0, 2, 6, 3,\n", 125 | " 6, 4, 0, 4, 2, 6, 0, 0, 0, 0, 3, 1, 6, 2, 2, 3, 1, 4, 1, 5, 4, 6,\n", 126 | " 4, 7, 2, 8, 7, 9, 2, 0, 5, 1, 4, 2, 8, 3, 2, 4, 1, 5, 4, 6, 0, 7,\n", 127 | " 9, 8, 4, 9, 8, 0, 1, 1, 0, 2, 2, 3, 2, 4, 4, 5, 8, 6, 5, 7, 7, 8,\n", 128 | " 8, 9, 7, 4, 7, 3, 2, 0, 8, 6, 8, 6, 1, 6, 8, 9, 4, 0, 9, 0, 4, 1,\n", 129 | " 5, 4, 7, 5, 3, 7, 4, 9, 8, 5, 8, 6, 3, 8, 6, 9, 9, 1, 8, 3, 5, 8,\n", 130 | " 6, 5, 9, 7, 2, 5, 0, 8, 5, 1, 1, 0, 9, 1, 8, 6, 7, 0, 9, 3, 0, 8,\n", 131 | " 8, 9, 6, 7, 8, 4, 7, 5, 9, 2, 6, 7, 4, 5, 9, 2, 3, 1, 6, 3, 9, 2,\n", 132 | " 2, 5, 6, 8, 0, 7, 7, 1, 9, 8, 7, 0, 9, 9, 4, 6, 2, 8, 5, 1, 4, 1,\n", 133 | " 5, 5, 1, 7, 3, 6, 4, 3, 2, 5, 6, 4, 4, 0, 4, 4, 6, 7, 2, 4, 3, 3,\n", 134 | " 8, 0, 0, 3, 2, 2, 9, 8, 2, 3, 7, 0, 1, 1, 0, 2, 3, 3, 8, 4, 3, 5,\n", 135 | " 7, 6, 4, 7, 7, 8, 5, 9, 7, 0, 3, 1, 6, 2, 4, 3, 4, 4, 7, 5, 9, 6,\n", 136 | " 9, 0, 7, 1, 4, 2, 7, 3, 6, 7, 5, 8, 4, 5, 5, 2, 7, 1, 1, 5, 6, 8,\n", 137 | " 5, 8, 4, 0, 7, 9, 9, 2, 9, 7, 7, 8, 7, 4, 2, 6, 9, 1, 7, 0, 6, 4,\n", 138 | " 2, 5, 7, 0, 7, 1, 0, 3, 7, 6, 5, 0, 6, 1, 5, 1, 7, 8, 5, 0, 3, 4,\n", 139 | " 7, 7, 5, 7, 8, 6, 9, 3, 8, 6, 1, 0, 9, 7, 1, 3, 0, 5, 6, 4, 4, 2,\n", 140 | " 4, 4, 3, 1, 7, 7, 6, 0, 3, 6], dtype=uint8)" 141 | ] 142 | }, 143 | "execution_count": 8, 144 | "metadata": {}, 145 | "output_type": "execute_result" 146 | } 147 | ], 148 | "source": [ 149 | "labels" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": null, 155 | "metadata": {}, 156 | "outputs": [], 157 | "source": [] 158 | } 159 | ], 160 | "metadata": { 161 | "kernelspec": { 162 | "display_name": "Python 3", 163 | "language": "python", 164 | "name": "python3" 165 | }, 166 | "language_info": { 167 | "codemirror_mode": { 168 | "name": "ipython", 169 | "version": 3 170 | }, 171 | "file_extension": ".py", 172 | "mimetype": "text/x-python", 173 | "name": "python", 174 | "nbconvert_exporter": "python", 175 | "pygments_lexer": "ipython3", 176 | "version": "3.6.4" 177 | } 178 | }, 179 | "nbformat": 4, 180 | "nbformat_minor": 2 181 | } 182 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Grokking-Deep-Learning 2 | [![Run on FloydHub](https://static.floydhub.com/button/button-small.svg)](https://floydhub.com/run) 3 | 4 | This repository accompanies the book "Grokking Deep Learning", [available here](https://manning.com/books/grokking-deep-learning?a_aid=grokkingdl&a_bid=32715258 "Grokking Deep Learning"). Also, the coupon code "trask40" is good for a 40% discount. 
5 | 6 | - [Chapter 3 - Forward Propagation - Intro to Neural Prediction](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter3%20-%20%20Forward%20Propagation%20-%20Intro%20to%20Neural%20Prediction.ipynb) 7 | - [Chapter 4 - Gradient Descent - Intro to Neural Learning](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter4%20-%20Gradient%20Descent%20-%20Intro%20to%20Neural%20Learning.ipynb) 8 | - [Chapter 5 - Generalizing Gradient Descent - Learning Multiple Weights at a Time](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter5%20-%20Generalizing%20Gradient%20Descent%20-%20Learning%20Multiple%20Weights%20at%20a%20Time.ipynb) 9 | - [Chapter 6 - Intro to Backpropagation - Building Your First DEEP Neural Network](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter6%20-%20Intro%20to%20Backpropagation%20-%20Building%20Your%20First%20DEEP%20Neural%20Network.ipynb) 10 | - [Chapter 8 - Intro to Regularization - Learning Signal and Ignoring Noise](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter8%20-%20Intro%20to%20Regularization%20-%20Learning%20Signal%20and%20Ignoring%20Noise.ipynb) 11 | - [Chapter 9 - Intro to Activation Functions - Learning to Model Probabilities](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter9%20-%20Intro%20to%20Activation%20Functions%20-%20Modeling%20Probabilities.ipynb) 12 | - [Chapter 10 - Intro to Convolutional Neural Networks - Learning Edges and Corners](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter10%20-%20Intro%20to%20Convolutional%20Neural%20Networks%20-%20Learning%20Edges%20and%20Corners.ipynb) 13 | - [Chapter 11 - Intro to Word Embeddings - Neural Networks that Understand Language](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter11%20-%20Intro%20to%20Word%20Embeddings%20-%20Neural%20Networks%20that%20Understand%20Language.ipynb) 14 | - [Chapter 12 - Intro to Recurrence (RNNs) - Predicting the Next Word](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter12%20-%20Intro%20to%20Recurrence%20-%20Predicting%20the%20Next%20Word.ipynb) 15 | - [Chapter 13 - Intro to Automatic Differentiation](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter13%20-%20Intro%20to%20Automatic%20Differentiation%20-%20Let's%20Build%20A%20Deep%20Learning%20Framework.ipynb) 16 | - [Chapter 14 - Exploding Gradients Example](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter14%20-%20Exploding%20Gradients%20Examples.ipynb) 17 | - [Chapter 14 - Intro to LSTMs](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter14%20-%20Intro%20to%20LSTMs%20-%20Learn%20to%20Write%20Like%20Shakespeare.ipynb) 18 | - [Chapter 14 - Intro to LSTMs - Part 2](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter14%20-%20Intro%20to%20LSTMs%20-%20Part%202%20-%20Learn%20to%20Write%20Like%20Shakespeare.ipynb) 19 | - [Chapter 15 - Intro to Federated Learning](https://github.com/iamtrask/Grokking-Deep-Learning/blob/master/Chapter15%20-%20Intro%20to%20Federated%20Learning%20-%20Deep%20Learning%20on%20Unseen%20Data.ipynb) 20 | -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | # 2 | # Run the Jupyter notebooks in a container using Docker Compose 3 | # 4 | # Start container: 5 | # docker-compose up -d 6 | # 7 | # Open http://localhost:8888/ in
browser and run the samples 8 | # 9 | # Stop the container: 10 | # docker-compose stop 11 | # 12 | # Stop and remove container, network, etc.: 13 | # docker-compose down 14 | # 15 | 16 | version: '2' 17 | services: 18 | jupyter: 19 | image: jupyter/tensorflow-notebook 20 | container_name: jupyter 21 | environment: 22 | JUPYTER_ENABLE_LAB: 1 23 | volumes: 24 | - .:/home/jovyan/work 25 | ports: 26 | - "8888:8888" 27 | entrypoint: 28 | - start-notebook.sh 29 | - --NotebookApp.token='' 30 | -------------------------------------------------------------------------------- /floyd.yml: -------------------------------------------------------------------------------- 1 | env: tensorflow-1.9 2 | machine: cpu 3 | -------------------------------------------------------------------------------- /spam.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/luisguiserrano/Grokking-Deep-Learning/e665168b4aefe3256360951504103cf98827ca51/spam.txt -------------------------------------------------------------------------------- /tasksv11/LICENSE: -------------------------------------------------------------------------------- 1 | CC License 2 | 3 | bAbI tasks data 4 | 5 | Copyright (c) 2015-present, Facebook, Inc. All rights reserved. 6 | 7 | Creative Commons Legal Code 8 | 9 | Attribution 3.0 Unported 10 | 11 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE 12 | LEGAL SERVICES. DISTRIBUTION OF THIS LICENSE DOES NOT CREATE AN 13 | ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS 14 | INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES 15 | REGARDING THE INFORMATION PROVIDED, AND DISCLAIMS LIABILITY FOR 16 | DAMAGES RESULTING FROM ITS USE. 17 | 18 | License 19 | 20 | THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS CREATIVE 21 | COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS PROTECTED BY 22 | COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE WORK OTHER THAN AS 23 | AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS PROHIBITED. 24 | 25 | BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND AGREE 26 | TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS LICENSE MAY 27 | BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU THE RIGHTS 28 | CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH TERMS AND 29 | CONDITIONS. 30 | 31 | 1. Definitions 32 | 33 | a. "Adaptation" means a work based upon the Work, or upon the Work and 34 | other pre-existing works, such as a translation, adaptation, 35 | derivative work, arrangement of music or other alterations of a 36 | literary or artistic work, or phonogram or performance and includes 37 | cinematographic adaptations or any other form in which the Work may be 38 | recast, transformed, or adapted including in any form recognizably 39 | derived from the original, except that a work that constitutes a 40 | Collection will not be considered an Adaptation for the purpose of 41 | this License. For the avoidance of doubt, where the Work is a musical 42 | work, performance or phonogram, the synchronization of the Work in 43 | timed-relation with a moving image ("synching") will be considered an 44 | Adaptation for the purpose of this License. 45 | b. 
"Collection" means a collection of literary or artistic works, such as 46 | encyclopedias and anthologies, or performances, phonograms or 47 | broadcasts, or other works or subject matter other than works listed 48 | in Section 1(f) below, which, by reason of the selection and 49 | arrangement of their contents, constitute intellectual creations, in 50 | which the Work is included in its entirety in unmodified form along 51 | with one or more other contributions, each constituting separate and 52 | independent works in themselves, which together are assembled into a 53 | collective whole. A work that constitutes a Collection will not be 54 | considered an Adaptation (as defined above) for the purposes of this 55 | License. 56 | c. "Distribute" means to make available to the public the original and 57 | copies of the Work or Adaptation, as appropriate, through sale or 58 | other transfer of ownership. 59 | d. "Licensor" means the individual, individuals, entity or entities that 60 | offer(s) the Work under the terms of this License. 61 | e. "Original Author" means, in the case of a literary or artistic work, 62 | the individual, individuals, entity or entities who created the Work 63 | or if no individual or entity can be identified, the publisher; and in 64 | addition (i) in the case of a performance the actors, singers, 65 | musicians, dancers, and other persons who act, sing, deliver, declaim, 66 | play in, interpret or otherwise perform literary or artistic works or 67 | expressions of folklore; (ii) in the case of a phonogram the producer 68 | being the person or legal entity who first fixes the sounds of a 69 | performance or other sounds; and, (iii) in the case of broadcasts, the 70 | organization that transmits the broadcast. 71 | f. "Work" means the literary and/or artistic work offered under the terms 72 | of this License including without limitation any production in the 73 | literary, scientific and artistic domain, whatever may be the mode or 74 | form of its expression including digital form, such as a book, 75 | pamphlet and other writing; a lecture, address, sermon or other work 76 | of the same nature; a dramatic or dramatico-musical work; a 77 | choreographic work or entertainment in dumb show; a musical 78 | composition with or without words; a cinematographic work to which are 79 | assimilated works expressed by a process analogous to cinematography; 80 | a work of drawing, painting, architecture, sculpture, engraving or 81 | lithography; a photographic work to which are assimilated works 82 | expressed by a process analogous to photography; a work of applied 83 | art; an illustration, map, plan, sketch or three-dimensional work 84 | relative to geography, topography, architecture or science; a 85 | performance; a broadcast; a phonogram; a compilation of data to the 86 | extent it is protected as a copyrightable work; or a work performed by 87 | a variety or circus performer to the extent it is not otherwise 88 | considered a literary or artistic work. 89 | g. "You" means an individual or entity exercising rights under this 90 | License who has not previously violated the terms of this License with 91 | respect to the Work, or who has received express permission from the 92 | Licensor to exercise rights under this License despite a previous 93 | violation. 94 | h. 
"Publicly Perform" means to perform public recitations of the Work and 95 | to communicate to the public those public recitations, by any means or 96 | process, including by wire or wireless means or public digital 97 | performances; to make available to the public Works in such a way that 98 | members of the public may access these Works from a place and at a 99 | place individually chosen by them; to perform the Work to the public 100 | by any means or process and the communication to the public of the 101 | performances of the Work, including by public digital performance; to 102 | broadcast and rebroadcast the Work by any means including signs, 103 | sounds or images. 104 | i. "Reproduce" means to make copies of the Work by any means including 105 | without limitation by sound or visual recordings and the right of 106 | fixation and reproducing fixations of the Work, including storage of a 107 | protected performance or phonogram in digital form or other electronic 108 | medium. 109 | 110 | 2. Fair Dealing Rights. Nothing in this License is intended to reduce, 111 | limit, or restrict any uses free from copyright or rights arising from 112 | limitations or exceptions that are provided for in connection with the 113 | copyright protection under copyright law or other applicable laws. 114 | 115 | 3. License Grant. Subject to the terms and conditions of this License, 116 | Licensor hereby grants You a worldwide, royalty-free, non-exclusive, 117 | perpetual (for the duration of the applicable copyright) license to 118 | exercise the rights in the Work as stated below: 119 | 120 | a. to Reproduce the Work, to incorporate the Work into one or more 121 | Collections, and to Reproduce the Work as incorporated in the 122 | Collections; 123 | b. to create and Reproduce Adaptations provided that any such Adaptation, 124 | including any translation in any medium, takes reasonable steps to 125 | clearly label, demarcate or otherwise identify that changes were made 126 | to the original Work. For example, a translation could be marked "The 127 | original work was translated from English to Spanish," or a 128 | modification could indicate "The original work has been modified."; 129 | c. to Distribute and Publicly Perform the Work including as incorporated 130 | in Collections; and, 131 | d. to Distribute and Publicly Perform Adaptations. 132 | e. For the avoidance of doubt: 133 | 134 | i. Non-waivable Compulsory License Schemes. In those jurisdictions in 135 | which the right to collect royalties through any statutory or 136 | compulsory licensing scheme cannot be waived, the Licensor 137 | reserves the exclusive right to collect such royalties for any 138 | exercise by You of the rights granted under this License; 139 | ii. Waivable Compulsory License Schemes. In those jurisdictions in 140 | which the right to collect royalties through any statutory or 141 | compulsory licensing scheme can be waived, the Licensor waives the 142 | exclusive right to collect such royalties for any exercise by You 143 | of the rights granted under this License; and, 144 | iii. Voluntary License Schemes. The Licensor waives the right to 145 | collect royalties, whether individually or, in the event that the 146 | Licensor is a member of a collecting society that administers 147 | voluntary licensing schemes, via that society, from any exercise 148 | by You of the rights granted under this License. 149 | 150 | The above rights may be exercised in all media and formats whether now 151 | known or hereafter devised. 
The above rights include the right to make 152 | such modifications as are technically necessary to exercise the rights in 153 | other media and formats. Subject to Section 8(f), all rights not expressly 154 | granted by Licensor are hereby reserved. 155 | 156 | 4. Restrictions. The license granted in Section 3 above is expressly made 157 | subject to and limited by the following restrictions: 158 | 159 | a. You may Distribute or Publicly Perform the Work only under the terms 160 | of this License. You must include a copy of, or the Uniform Resource 161 | Identifier (URI) for, this License with every copy of the Work You 162 | Distribute or Publicly Perform. You may not offer or impose any terms 163 | on the Work that restrict the terms of this License or the ability of 164 | the recipient of the Work to exercise the rights granted to that 165 | recipient under the terms of the License. You may not sublicense the 166 | Work. You must keep intact all notices that refer to this License and 167 | to the disclaimer of warranties with every copy of the Work You 168 | Distribute or Publicly Perform. When You Distribute or Publicly 169 | Perform the Work, You may not impose any effective technological 170 | measures on the Work that restrict the ability of a recipient of the 171 | Work from You to exercise the rights granted to that recipient under 172 | the terms of the License. This Section 4(a) applies to the Work as 173 | incorporated in a Collection, but this does not require the Collection 174 | apart from the Work itself to be made subject to the terms of this 175 | License. If You create a Collection, upon notice from any Licensor You 176 | must, to the extent practicable, remove from the Collection any credit 177 | as required by Section 4(b), as requested. If You create an 178 | Adaptation, upon notice from any Licensor You must, to the extent 179 | practicable, remove from the Adaptation any credit as required by 180 | Section 4(b), as requested. 181 | b. If You Distribute, or Publicly Perform the Work or any Adaptations or 182 | Collections, You must, unless a request has been made pursuant to 183 | Section 4(a), keep intact all copyright notices for the Work and 184 | provide, reasonable to the medium or means You are utilizing: (i) the 185 | name of the Original Author (or pseudonym, if applicable) if supplied, 186 | and/or if the Original Author and/or Licensor designate another party 187 | or parties (e.g., a sponsor institute, publishing entity, journal) for 188 | attribution ("Attribution Parties") in Licensor's copyright notice, 189 | terms of service or by other reasonable means, the name of such party 190 | or parties; (ii) the title of the Work if supplied; (iii) to the 191 | extent reasonably practicable, the URI, if any, that Licensor 192 | specifies to be associated with the Work, unless such URI does not 193 | refer to the copyright notice or licensing information for the Work; 194 | and (iv) , consistent with Section 3(b), in the case of an Adaptation, 195 | a credit identifying the use of the Work in the Adaptation (e.g., 196 | "French translation of the Work by Original Author," or "Screenplay 197 | based on original Work by Original Author"). 
The credit required by 198 | this Section 4 (b) may be implemented in any reasonable manner; 199 | provided, however, that in the case of a Adaptation or Collection, at 200 | a minimum such credit will appear, if a credit for all contributing 201 | authors of the Adaptation or Collection appears, then as part of these 202 | credits and in a manner at least as prominent as the credits for the 203 | other contributing authors. For the avoidance of doubt, You may only 204 | use the credit required by this Section for the purpose of attribution 205 | in the manner set out above and, by exercising Your rights under this 206 | License, You may not implicitly or explicitly assert or imply any 207 | connection with, sponsorship or endorsement by the Original Author, 208 | Licensor and/or Attribution Parties, as appropriate, of You or Your 209 | use of the Work, without the separate, express prior written 210 | permission of the Original Author, Licensor and/or Attribution 211 | Parties. 212 | c. Except as otherwise agreed in writing by the Licensor or as may be 213 | otherwise permitted by applicable law, if You Reproduce, Distribute or 214 | Publicly Perform the Work either by itself or as part of any 215 | Adaptations or Collections, You must not distort, mutilate, modify or 216 | take other derogatory action in relation to the Work which would be 217 | prejudicial to the Original Author's honor or reputation. Licensor 218 | agrees that in those jurisdictions (e.g. Japan), in which any exercise 219 | of the right granted in Section 3(b) of this License (the right to 220 | make Adaptations) would be deemed to be a distortion, mutilation, 221 | modification or other derogatory action prejudicial to the Original 222 | Author's honor and reputation, the Licensor will waive or not assert, 223 | as appropriate, this Section, to the fullest extent permitted by the 224 | applicable national law, to enable You to reasonably exercise Your 225 | right under Section 3(b) of this License (right to make Adaptations) 226 | but not otherwise. 227 | 228 | 5. Representations, Warranties and Disclaimer 229 | 230 | UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING, LICENSOR 231 | OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY 232 | KIND CONCERNING THE WORK, EXPRESS, IMPLIED, STATUTORY OR OTHERWISE, 233 | INCLUDING, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTIBILITY, 234 | FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF 235 | LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OF ABSENCE OF ERRORS, 236 | WHETHER OR NOT DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION 237 | OF IMPLIED WARRANTIES, SO SUCH EXCLUSION MAY NOT APPLY TO YOU. 238 | 239 | 6. Limitation on Liability. EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE 240 | LAW, IN NO EVENT WILL LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY FOR 241 | ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY DAMAGES 242 | ARISING OUT OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF LICENSOR HAS 243 | BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 244 | 245 | 7. Termination 246 | 247 | a. This License and the rights granted hereunder will terminate 248 | automatically upon any breach by You of the terms of this License. 249 | Individuals or entities who have received Adaptations or Collections 250 | from You under this License, however, will not have their licenses 251 | terminated provided such individuals or entities remain in full 252 | compliance with those licenses. 
Sections 1, 2, 5, 6, 7, and 8 will 253 | survive any termination of this License. 254 | b. Subject to the above terms and conditions, the license granted here is 255 | perpetual (for the duration of the applicable copyright in the Work). 256 | Notwithstanding the above, Licensor reserves the right to release the 257 | Work under different license terms or to stop distributing the Work at 258 | any time; provided, however that any such election will not serve to 259 | withdraw this License (or any other license that has been, or is 260 | required to be, granted under the terms of this License), and this 261 | License will continue in full force and effect unless terminated as 262 | stated above. 263 | 264 | 8. Miscellaneous 265 | 266 | a. Each time You Distribute or Publicly Perform the Work or a Collection, 267 | the Licensor offers to the recipient a license to the Work on the same 268 | terms and conditions as the license granted to You under this License. 269 | b. Each time You Distribute or Publicly Perform an Adaptation, Licensor 270 | offers to the recipient a license to the original Work on the same 271 | terms and conditions as the license granted to You under this License. 272 | c. If any provision of this License is invalid or unenforceable under 273 | applicable law, it shall not affect the validity or enforceability of 274 | the remainder of the terms of this License, and without further action 275 | by the parties to this agreement, such provision shall be reformed to 276 | the minimum extent necessary to make such provision valid and 277 | enforceable. 278 | d. No term or provision of this License shall be deemed waived and no 279 | breach consented to unless such waiver or consent shall be in writing 280 | and signed by the party to be charged with such waiver or consent. 281 | e. This License constitutes the entire agreement between the parties with 282 | respect to the Work licensed here. There are no understandings, 283 | agreements or representations with respect to the Work not specified 284 | here. Licensor shall not be bound by any additional provisions that 285 | may appear in any communication from You. This License may not be 286 | modified without the mutual written agreement of the Licensor and You. 287 | f. The rights granted under, and the subject matter referenced, in this 288 | License were drafted utilizing the terminology of the Berne Convention 289 | for the Protection of Literary and Artistic Works (as amended on 290 | September 28, 1979), the Rome Convention of 1961, the WIPO Copyright 291 | Treaty of 1996, the WIPO Performances and Phonograms Treaty of 1996 292 | and the Universal Copyright Convention (as revised on July 24, 1971). 293 | These rights and subject matter take effect in the relevant 294 | jurisdiction in which the License terms are sought to be enforced 295 | according to the corresponding provisions of the implementation of 296 | those treaty provisions in the applicable national law. If the 297 | standard suite of rights granted under applicable copyright law 298 | includes additional rights not granted under this License, such 299 | additional rights are deemed to be included in the License; this 300 | License is not intended to restrict the license of any rights under 301 | applicable law. 302 | 303 | 304 | Creative Commons Notice 305 | 306 | Creative Commons is not a party to this License, and makes no warranty 307 | whatsoever in connection with the Work. 
Creative Commons will not be 308 | liable to You or any party on any legal theory for any damages 309 | whatsoever, including without limitation any general, special, 310 | incidental or consequential damages arising in connection to this 311 | license. Notwithstanding the foregoing two (2) sentences, if Creative 312 | Commons has expressly identified itself as the Licensor hereunder, it 313 | shall have all rights and obligations of Licensor. 314 | 315 | Except for the limited purpose of indicating to the public that the 316 | Work is licensed under the CCPL, Creative Commons does not authorize 317 | the use by either party of the trademark "Creative Commons" or any 318 | related trademark or logo of Creative Commons without the prior 319 | written consent of Creative Commons. Any permitted use will be in 320 | compliance with Creative Commons' then-current trademark usage 321 | guidelines, as may be published on its website or otherwise made 322 | available upon request from time to time. For the avoidance of doubt, 323 | this trademark restriction does not form part of this License. 324 | 325 | Creative Commons may be contacted at https://creativecommons.org/. 326 | -------------------------------------------------------------------------------- /tasksv11/README: -------------------------------------------------------------------------------- 1 | Towards AI Complete Question Answering: A Set of Prerequisite Toy Tasks 2 | ----------------------------------------------------------------------- 3 | In this directory is the first set of 20 tasks for testing text understanding and reasoning in the bAbI project. 4 | The aim is that each task tests a unique aspect of text and reasoning, and hence a different capability of learning models. More tasks are planned in the future to capture more aspects. 5 | 6 | For each task, there are 1000 questions for training, and 1000 for testing. 7 | However, we emphasize that the goal is still to use as little data as possible to do well on the task (i.e. if you can use fewer than 1000, that's even better) -- and without resorting to engineering task-specific tricks that will not generalize to other tasks, as they may not be of much use subsequently. Note that the aim during evaluation is to use the _same_ learner across all tasks to evaluate its skills and capabilities. 8 | Further, while the MemNN results in the paper use full supervision (including of the supporting facts), results with weak supervision would ultimately be preferable, as this kind of data is easier to collect. Hence results of that form are very welcome. 9 | 10 | For the reasons above there are currently two directories: 11 | 12 | 1) en/ -- the tasks in English, readable by humans. 13 | 2) shuffled/ -- the same tasks with shuffled letters, so they are not readable by humans and existing parsers and taggers cannot be used in a straightforward fashion to leverage extra resources -- in this case the learner is forced to rely more heavily on the given training data. This mimics a learner being presented with a language for the first time and having to learn it from scratch. We plan to add further languages in the future as well, e.g. German, French, ... 14 | 15 | The file format for each task is as follows: 16 | ID text 17 | ID text 18 | ID text 19 | ID question[tab]answer[tab]supporting fact IDs 20 | ... 21 | 22 | The IDs for a given "story" start at 1 and increase. 23 | When the IDs in a file reset back to 1 you can consider the following sentences as a new "story".
24 | Supporting fact IDs only ever reference the sentences within a "story". 25 | 26 | For example: 27 | 1 Mary moved to the bathroom. 28 | 2 John went to the hallway. 29 | 3 Where is Mary? bathroom 1 30 | 4 Daniel went back to the hallway. 31 | 5 Sandra moved to the garden. 32 | 6 Where is Daniel? hallway 4 33 | 7 John moved to the office. 34 | 8 Sandra journeyed to the bathroom. 35 | 9 Where is Daniel? hallway 4 36 | 10 Mary moved to the hallway. 37 | 11 Daniel travelled to the office. 38 | 12 Where is Daniel? office 11 39 | 13 John went back to the garden. 40 | 14 John moved to the bedroom. 41 | 15 Where is Sandra? bathroom 8 42 | 1 Sandra travelled to the office. 43 | 2 Sandra went to the bathroom. 44 | 3 Where is Sandra? bathroom 2 45 | 46 | Changes between versions. 47 | ========================= 48 | V1.1 (this version) - fixed some problems with task 3, and reduced the available training set size to 1000, as this matches the results in the paper cited above, in order to avoid confusion. 49 | --------------------------------------------------------------------------------
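The README above describes the bAbI file format only in prose. The following is a minimal sketch, not part of the original README, of how one might parse a task file in Python under that description: plain sentences are "ID text", question lines additionally carry a tab-separated answer and supporting fact IDs, and an ID resetting to 1 starts a new story. The function name parse_babi and the file path in the usage comment are only illustrative.

def parse_babi(path):
    """Parse a bAbI task file into a list of stories.

    Each story is a list of entries:
      ("fact", id, text)                         for plain sentences
      ("question", id, text, answer, supports)   for question lines
    """
    stories = []
    story = []
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split the leading ID from the rest of the line.
            id_str, _, text = line.partition(" ")
            line_id = int(id_str)
            # IDs reset to 1 => a new story begins.
            if line_id == 1 and story:
                stories.append(story)
                story = []
            if "\t" in text:
                # question[tab]answer[tab]supporting fact IDs
                question, answer, supports = text.split("\t")
                support_ids = [int(s) for s in supports.split()]
                story.append(("question", line_id, question, answer, support_ids))
            else:
                story.append(("fact", line_id, text))
    if story:
        stories.append(story)
    return stories

# Example usage (illustrative path):
# stories = parse_babi("tasksv11/en/qa1_single-supporting-fact_train.txt")
# print(stories[0])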