├── certificate.png
├── README.md
├── NLP with Sequence Models
│   ├── Week_2
│   │   ├── C3_W2_lecture_notebook_GRU.ipynb
│   │   ├── C3_W2_lecture_notebook_perplexity.ipynb
│   │   ├── C3_W2_Lecture_Notebook_Hidden_State_Activation.ipynb
│   │   └── C3_W2_lecture_notebook_RNNs.ipynb
│   ├── Week_4
│   │   ├── C3_W4_lecture_notebook_siamese.ipynb
│   │   ├── C3_W4_lecture_notebook_accuracy.ipynb
│   │   └── C3_W4_Lecture_Notebook_Modified_Triplet_Loss.ipynb
│   └── Week_1
│       └── NLP_C3_W1_lecture_nb_03_data_generatos.ipynb
├── NLP with Probabilistic Models
│   ├── Week_1
│   │   ├── NLP_C2_W1_lecture_nb_02.ipynb
│   │   └── NLP_C2_W1_lecture_nb_01.ipynb
│   ├── Week_4
│   │   └── NLP_C2_W4_lecture_notebook_word_embeddings.ipynb
│   └── Week_3
│       ├── NLP_C2_W3_lecture_nb_01.ipynb
│       └── NLP_C2_W3_lecture_nb_03.ipynb
└── NLP with Attention Models
    ├── Week_2
    │   ├── C4_W2_lecture_notebook_Attention.ipynb
    │   └── C4_W2_lecture_notebook_Transformer_Decoder.ipynb
    └── Week_3
        ├── C4_W3_Assignment_Ungraded_BERT_Loss.ipynb
        └── C4_W3_Assignment_Ungraded_T5.ipynb

/certificate.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/samiptimalsena/Natural-Language-Processing-Specialization/HEAD/certificate.png
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Natural Language Processing Specialization
 2 | 
 3 | ## Course 1: Classification and Vector Spaces in NLP
 4 | - **Week 1**: Logistic Regression for Sentiment Analysis of Tweets [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Classification%20and%20Vector%20Spaces/Week_1)
 5 | - **Week 2**: Naive Bayes for Sentiment Analysis of Tweets [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Classification%20and%20Vector%20Spaces/Week_2)
 6 | - **Week 3**: Vector Space Models [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Classification%20and%20Vector%20Spaces/Week_3)
 7 | - **Week 4**: Word Embeddings and Locality Sensitive Hashing for Machine Translation [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Classification%20and%20Vector%20Spaces/Week_4)
 8 | 
 9 | ## Course 2: Probabilistic Models in NLP
10 | - **Week 1**: Auto-correct using Minimum Edit Distance [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Probabilistic%20Models/Week_1)
11 | - **Week 2**: Part-of-Speech (POS) Tagging [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Probabilistic%20Models/Week_2)
12 | - **Week 3**: N-gram Language Models [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Probabilistic%20Models/Week_3)
13 | - **Week 4**: Word2Vec and Stochastic Gradient Descent [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Probabilistic%20Models/Week_4)
14 | 
15 | ## Course 3: Sequence Models in NLP
16 | - **Week 1**: Sentiment Analysis with Neural Nets [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Sequence%20Models/Week_1)
17 | - **Week 2**: Language Generation Models [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Sequence%20Models/Week_2)
18 | - **Week 3**: Named Entity Recognition (NER) [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Sequence%20Models/Week_3)
19 | - **Week 4**: Siamese Networks [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Sequence%20Models/Week_4)
20 | 
21 | ## Course 4: Attention Models in NLP
22 | - **Week 1**: Neural Machine Translation with Attention [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Attention%20Models/Week_1)
23 | - **Week 2**: Text Summarization with Transformer Models [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Attention%20Models/Week_2)
24 | - **Week 3**: Question-Answering with Transformer Models [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Attention%20Models/Week_3)
25 | - **Week 4**: Chatbots with a Reformer Model [[Link]](https://github.com/samiptimalsena/Natural-Language-Processing-Specialization/tree/master/NLP%20with%20Attention%20Models/Week_4)
26 | 
27 | 
28 | # Certificate
29 | 
30 | 
--------------------------------------------------------------------------------
/NLP with Sequence Models/Week_2/C3_W2_lecture_notebook_GRU.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Creating a GRU model using Trax: Ungraded Lecture Notebook"
8 | ]
9 | },
10 | {
11 | "cell_type": "markdown",
12 | "metadata": {},
13 | "source": [
14 | "For this lecture notebook you will be using Trax's layers. These are the building blocks for creating neural networks with Trax."
15 | ]
16 | },
17 | {
18 | "cell_type": "code",
19 | "execution_count": 1,
20 | "metadata": {},
21 | "outputs": [
22 | {
23 | "name": "stdout",
24 | "output_type": "stream",
25 | "text": [
26 | "INFO:tensorflow:tokens_length=568 inputs_length=512 targets_length=114 noise_density=0.15 mean_noise_span_length=3.0 \n"
27 | ]
28 | }
29 | ],
30 | "source": [
31 | "import trax\n",
32 | "from trax import layers as tl"
33 | ]
34 | },
35 | {
36 | "cell_type": "markdown",
37 | "metadata": {},
38 | "source": [
39 | "Trax allows you to define neural network architectures by stacking layers (similar to other libraries such as Keras). For this, the `Serial()` combinator is often used, as it allows you to stack layers serially using function composition.\n",
40 | "\n",
41 | "Next you can see a simple vanilla NN architecture containing one hidden (dense) layer with 128 units and an output (dense) layer with 10 units, on which we apply a final LogSoftmax layer."
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": 2,
47 | "metadata": {},
48 | "outputs": [],
49 | "source": [
50 | "mlp = tl.Serial(\n",
51 | " tl.Dense(128),\n",
52 | " tl.Relu(),\n",
53 | " tl.Dense(10),\n",
54 | " tl.LogSoftmax()\n",
55 | ")"
56 | ]
57 | },
58 | {
59 | "cell_type": "markdown",
60 | "metadata": {},
61 | "source": [
62 | "Each of the layers within the `Serial` combinator layer is considered a sublayer. 
Notice that unlike similar libraries, **in Trax the activation functions are considered layers.** To know more about the `Serial` layer check the docs [here](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.combinators.Serial).\n", 63 | "\n", 64 | "You can try printing this object:" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 3, 70 | "metadata": {}, 71 | "outputs": [ 72 | { 73 | "name": "stdout", 74 | "output_type": "stream", 75 | "text": [ 76 | "Serial[\n", 77 | " Dense_128\n", 78 | " Relu\n", 79 | " Dense_10\n", 80 | " LogSoftmax\n", 81 | "]\n" 82 | ] 83 | } 84 | ], 85 | "source": [ 86 | "print(mlp)" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "Printing the model gives you the exact same information as the model's definition itself.\n", 94 | "\n", 95 | "By just looking at the definition you can clearly see what is going on inside the neural network. Trax is very straightforward in the way a network is defined, that is one of the things that makes it awesome! " 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "## GRU MODEL" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "To create a `GRU` model you will need to be familiar with the following layers (Documentation link attached with each layer name):\n", 110 | " - [`ShiftRight`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.attention.ShiftRight) Shifts the tensor to the right by padding on axis 1. The `mode` should be specified and it refers to the context in which the model is being used. Possible values are: 'train', 'eval' or 'predict', predict mode is for fast inference. Defaults to \"train\".\n", 111 | " \n", 112 | " - [`Embedding`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.core.Embedding) Maps discrete tokens to vectors. It will have shape `(vocabulary length X dimension of output vectors)`. The dimension of output vectors (also called `d_feature`) is the number of elements in the word embedding.\n", 113 | " - [`GRU`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.rnn.GRU) The GRU layer. It leverages another Trax layer called [`GRUCell`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.rnn.GRUCell). The number of GRU units should be specified and should match the number of elements in the word embedding. 
If you want to stack two consecutive GRU layers, it can be done with a Python list comprehension.\n",
114 | " - [`Dense`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.core.Dense) Vanilla Dense layer.\n",
115 | " - [`LogSoftMax`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.core.LogSoftmax) Log Softmax function.\n",
116 | "\n",
117 | "Putting everything together, the GRU model will look like this:"
118 | ]
119 | },
120 | {
121 | "cell_type": "code",
122 | "execution_count": null,
123 | "metadata": {},
124 | "outputs": [],
125 | "source": [
126 | "mode = 'train'\n",
127 | "vocab_size = 256\n",
128 | "model_dimension = 512\n",
129 | "n_layers = 2\n",
130 | "\n",
131 | "GRU = tl.Serial(\n",
132 | " tl.ShiftRight(mode=mode), # Remember to pass the mode parameter if you are using the model for inference/test, as the default is 'train'\n",
133 | " tl.Embedding(vocab_size=vocab_size, d_feature=model_dimension),\n",
134 | " [tl.GRU(n_units=model_dimension) for _ in range(n_layers)], # You can play around with n_layers if you want to stack more GRU layers together\n",
135 | " tl.Dense(n_units=vocab_size),\n",
136 | " tl.LogSoftmax()\n",
137 | " )"
138 | ]
139 | },
140 | {
141 | "cell_type": "markdown",
142 | "metadata": {},
143 | "source": [
144 | "Next is a helper function that prints information for every layer (sublayer within `Serial`):\n",
145 | "\n",
146 | "_Try changing the parameters defined before the GRU model and see how it changes!_\n"
147 | ]
148 | },
149 | {
150 | "cell_type": "code",
151 | "execution_count": null,
152 | "metadata": {},
153 | "outputs": [],
154 | "source": [
155 | "def show_layers(model, layer_prefix=\"Serial.sublayers\"):\n",
156 | " print(f\"Total layers: {len(model.sublayers)}\\n\")\n",
157 | " for i in range(len(model.sublayers)):\n",
158 | " print('========')\n",
159 | " print(f'{layer_prefix}_{i}: {model.sublayers[i]}\\n')\n",
160 | " \n",
161 | "show_layers(GRU)"
162 | ]
163 | },
164 | {
165 | "cell_type": "markdown",
166 | "metadata": {},
167 | "source": [
168 | "Hope you are now more familiar with creating GRU models using Trax. \n",
169 | "\n",
170 | "You will train this model in this week's assignment and see it in action. 
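As a quick, optional check that the architecture is wired correctly, here is a minimal sketch of a forward pass through the `GRU` model defined above. It assumes Trax's `trax.shapes.signature` / `model.init` initialization API and a made-up toy batch of token ids; the outputs are untrained and therefore meaningless:

```python
import numpy as np
import trax

# Toy batch: 2 sequences of length 8, with token ids below vocab_size (an assumption).
x = np.arange(16, dtype=np.int32).reshape(2, 8)

# Initialize weights from the input signature, then run a forward pass.
# (Assumes the GRU model from the cell above is in scope.)
GRU.init(trax.shapes.signature(x))
y = GRU(x)

print(y.shape)  # expected: (2, 8, 256), i.e. (batch, sequence length, vocab_size) log-probabilities
```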
\n", 171 | "\n", 172 | "\n", 173 | "**GRU and the trax minions will return, in this week's endgame.**" 174 | ] 175 | } 176 | ], 177 | "metadata": { 178 | "kernelspec": { 179 | "display_name": "Python 3", 180 | "language": "python", 181 | "name": "python3" 182 | }, 183 | "language_info": { 184 | "codemirror_mode": { 185 | "name": "ipython", 186 | "version": 3 187 | }, 188 | "file_extension": ".py", 189 | "mimetype": "text/x-python", 190 | "name": "python", 191 | "nbconvert_exporter": "python", 192 | "pygments_lexer": "ipython3", 193 | "version": "3.7.1" 194 | } 195 | }, 196 | "nbformat": 4, 197 | "nbformat_minor": 4 198 | } 199 | -------------------------------------------------------------------------------- /NLP with Sequence Models/Week_4/C3_W4_lecture_notebook_siamese.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Creating a Siamese model using Trax: Ungraded Lecture Notebook" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "name": "stdout", 17 | "output_type": "stream", 18 | "text": [ 19 | "INFO:tensorflow:tokens_length=568 inputs_length=512 targets_length=114 noise_density=0.15 mean_noise_span_length=3.0 \n" 20 | ] 21 | } 22 | ], 23 | "source": [ 24 | "import trax\n", 25 | "from trax import layers as tl\n", 26 | "import trax.fastmath.numpy as np\n", 27 | "import numpy\n", 28 | "\n", 29 | "# Setting random seeds\n", 30 | "trax.supervised.trainer_lib.init_random_number_generators(10)\n", 31 | "numpy.random.seed(10)" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "## L2 Normalization" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "Before building the model you will need to define a function that applies L2 normalization to a tensor. This is very important because in this week's assignment you will create a custom loss function which expects the tensors it receives to be normalized. Luckily this is pretty straightforward:" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 2, 51 | "metadata": {}, 52 | "outputs": [], 53 | "source": [ 54 | "def normalize(x):\n", 55 | " return x / np.sqrt(np.sum(x * x, axis=-1, keepdims=True))" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "Notice that the denominator can be replaced by `np.linalg.norm(x, axis=-1, keepdims=True)` to achieve the same results and that Trax's numpy is being used within the function." 
63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 3, 68 | "metadata": {}, 69 | "outputs": [ 70 | { 71 | "name": "stdout", 72 | "output_type": "stream", 73 | "text": [ 74 | "The tensor is of type: \n", 75 | "\n", 76 | "And looks like this:\n", 77 | "\n", 78 | " [[0.77132064 0.02075195 0.63364823 0.74880388 0.49850701]\n", 79 | " [0.22479665 0.19806286 0.76053071 0.16911084 0.08833981]]\n" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "tensor = numpy.random.random((2,5))\n", 85 | "print(f'The tensor is of type: {type(tensor)}\\n\\nAnd looks like this:\\n\\n {tensor}')" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 4, 91 | "metadata": {}, 92 | "outputs": [ 93 | { 94 | "name": "stdout", 95 | "output_type": "stream", 96 | "text": [ 97 | "The normalized tensor is of type: \n", 98 | "\n", 99 | "And looks like this:\n", 100 | "\n", 101 | " [[0.57393795 0.01544148 0.4714962 0.55718327 0.37093794]\n", 102 | " [0.26781026 0.23596111 0.9060541 0.20146926 0.10524315]]\n" 103 | ] 104 | } 105 | ], 106 | "source": [ 107 | "norm_tensor = normalize(tensor)\n", 108 | "print(f'The normalized tensor is of type: {type(norm_tensor)}\\n\\nAnd looks like this:\\n\\n {norm_tensor}')" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "Notice that the initial tensor was converted from a numpy array to a jax array in the process." 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "## Siamese Model" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": {}, 128 | "source": [ 129 | "To create a `Siamese` model you will first need to create a LSTM model using the `Serial` combinator layer and then use another combinator layer called `Parallel` to create the Siamese model. You should be familiar with the following layers (notice each layer can be clicked to go to the docs):\n", 130 | " - [`Serial`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.combinators.Serial) A combinator layer that allows to stack layers serially using function composition.\n", 131 | " - [`Embedding`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.core.Embedding) Maps discrete tokens to vectors. It will have shape `(vocabulary length X dimension of output vectors)`. The dimension of output vectors (also called `d_feature`) is the number of elements in the word embedding.\n", 132 | " - [`LSTM`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.rnn.LSTM) The LSTM layer. It leverages another Trax layer called [`LSTMCell`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.rnn.LSTMCell). The number of units should be specified and should match the number of elements in the word embedding.\n", 133 | " - [`Mean`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.core.Mean) Computes the mean across a desired axis. Mean uses one tensor axis to form groups of values and replaces each group with the mean value of that group.\n", 134 | " - [`Fn`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.base.Fn) Layer with no weights that applies the function f, which should be specified using a lambda syntax. 
\n", 135 | " - [`Parallel`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#trax.layers.combinators.Parallel) It is a combinator layer (like `Serial`) that applies a list of layers in parallel to its inputs.\n", 136 | "\n", 137 | "Putting everything together the Siamese model will look like this:" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": null, 143 | "metadata": {}, 144 | "outputs": [], 145 | "source": [ 146 | "vocab_size = 500\n", 147 | "model_dimension = 128\n", 148 | "\n", 149 | "# Define the LSTM model\n", 150 | "LSTM = tl.Serial(\n", 151 | " tl.Embedding(vocab_size=vocab_size, d_feature=model_dimension),\n", 152 | " tl.LSTM(model_dimension),\n", 153 | " tl.Mean(axis=1),\n", 154 | " tl.Fn('Normalize', lambda x: normalize(x))\n", 155 | " )\n", 156 | "\n", 157 | "# Use the Parallel combinator to create a Siamese model out of the LSTM \n", 158 | "Siamese = tl.Parallel(LSTM, LSTM)" 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "Next is a helper function that prints information for every layer (sublayer within `Serial`):" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": null, 171 | "metadata": {}, 172 | "outputs": [], 173 | "source": [ 174 | "def show_layers(model, layer_prefix):\n", 175 | " print(f\"Total layers: {len(model.sublayers)}\\n\")\n", 176 | " for i in range(len(model.sublayers)):\n", 177 | " print('========')\n", 178 | " print(f'{layer_prefix}_{i}: {model.sublayers[i]}\\n')\n", 179 | "\n", 180 | "print('Siamese model:\\n')\n", 181 | "show_layers(Siamese, 'Parallel.sublayers')\n", 182 | "\n", 183 | "print('Detail of LSTM models:\\n')\n", 184 | "show_layers(LSTM, 'Serial.sublayers')" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "Try changing the parameters defined before the Siamese model and see how it changes!\n", 192 | "\n", 193 | "You will actually train this model in this week's assignment. For now you should be more familiarized with creating Siamese models using Trax. **Keep it up!**" 194 | ] 195 | } 196 | ], 197 | "metadata": { 198 | "kernelspec": { 199 | "display_name": "Python 3", 200 | "language": "python", 201 | "name": "python3" 202 | }, 203 | "language_info": { 204 | "codemirror_mode": { 205 | "name": "ipython", 206 | "version": 3 207 | }, 208 | "file_extension": ".py", 209 | "mimetype": "text/x-python", 210 | "name": "python", 211 | "nbconvert_exporter": "python", 212 | "pygments_lexer": "ipython3", 213 | "version": "3.7.1" 214 | } 215 | }, 216 | "nbformat": 4, 217 | "nbformat_minor": 4 218 | } 219 | -------------------------------------------------------------------------------- /NLP with Probabilistic Models/Week_1/NLP_C2_W1_lecture_nb_02.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# NLP Course 2 Week 1 Lesson : Building The Model - Lecture Exercise 02\n", 8 | "Estimated Time: 20 minutes\n", 9 | "
\n", 10 | "# Candidates from String Edits\n", 11 | "Create a list of candidate strings by applying an edit operation\n", 12 | "
\n", 13 | "### Imports and Data" 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 1, 19 | "metadata": {}, 20 | "outputs": [], 21 | "source": [ 22 | "# data\n", 23 | "word = 'dearz' # 🦌" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "metadata": {}, 29 | "source": [ 30 | "### Splits\n", 31 | "Find all the ways you can split a word into 2 parts !" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 2, 37 | "metadata": {}, 38 | "outputs": [ 39 | { 40 | "name": "stdout", 41 | "output_type": "stream", 42 | "text": [ 43 | "['', 'dearz']\n", 44 | "['d', 'earz']\n", 45 | "['de', 'arz']\n", 46 | "['dea', 'rz']\n", 47 | "['dear', 'z']\n", 48 | "['dearz', '']\n" 49 | ] 50 | } 51 | ], 52 | "source": [ 53 | "# splits with a loop\n", 54 | "splits_a = []\n", 55 | "for i in range(len(word)+1):\n", 56 | " splits_a.append([word[:i],word[i:]])\n", 57 | "\n", 58 | "for i in splits_a:\n", 59 | " print(i)" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 6, 65 | "metadata": {}, 66 | "outputs": [ 67 | { 68 | "name": "stdout", 69 | "output_type": "stream", 70 | "text": [ 71 | "('', 'dearz')\n", 72 | "('d', 'earz')\n", 73 | "('de', 'arz')\n", 74 | "('dea', 'rz')\n", 75 | "('dear', 'z')\n", 76 | "('dearz', '')\n" 77 | ] 78 | } 79 | ], 80 | "source": [ 81 | "# same splits, done using a list comprehension\n", 82 | "splits_b = [(word[:i], word[i:]) for i in range(len(word) + 1)]\n", 83 | "\n", 84 | "for i in splits_b:\n", 85 | " print(i)" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "### Delete Edit\n", 93 | "Delete a letter from each string in the `splits` list.\n", 94 | "
\n", 95 | "What this does is effectivly delete each possible letter from the original word being edited. " 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 7, 101 | "metadata": {}, 102 | "outputs": [ 103 | { 104 | "name": "stdout", 105 | "output_type": "stream", 106 | "text": [ 107 | "word : dearz\n", 108 | "earz <-- delete d\n", 109 | "darz <-- delete e\n", 110 | "derz <-- delete a\n", 111 | "deaz <-- delete r\n", 112 | "dear <-- delete z\n" 113 | ] 114 | } 115 | ], 116 | "source": [ 117 | "# deletes with a loop\n", 118 | "splits = splits_a\n", 119 | "deletes = []\n", 120 | "\n", 121 | "print('word : ', word)\n", 122 | "for L,R in splits:\n", 123 | " if R:\n", 124 | " print(L + R[1:], ' <-- delete ', R[0])" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "It's worth taking a closer look at how this is excecuting a 'delete'.\n", 132 | "
\n", 133 | "Taking the first item from the `splits` list :" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 8, 139 | "metadata": {}, 140 | "outputs": [ 141 | { 142 | "name": "stdout", 143 | "output_type": "stream", 144 | "text": [ 145 | "word : dearz\n", 146 | "first item from the splits list : ['', 'dearz']\n", 147 | "L : \n", 148 | "R : dearz\n", 149 | "*** now implicit delete by excluding the leading letter ***\n", 150 | "L + R[1:] : earz <-- delete d\n" 151 | ] 152 | } 153 | ], 154 | "source": [ 155 | "# breaking it down\n", 156 | "print('word : ', word)\n", 157 | "one_split = splits[0]\n", 158 | "print('first item from the splits list : ', one_split)\n", 159 | "L = one_split[0]\n", 160 | "R = one_split[1]\n", 161 | "print('L : ', L)\n", 162 | "print('R : ', R)\n", 163 | "print('*** now implicit delete by excluding the leading letter ***')\n", 164 | "print('L + R[1:] : ',L + R[1:], ' <-- delete ', R[0])" 165 | ] 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "metadata": {}, 170 | "source": [ 171 | "So the end result transforms **'dearz'** to **'earz'** by deleting the first character.\n", 172 | "
\n", 173 | "And you use a **loop** (code block above) or a **list comprehension** (code block below) to do\n", 174 | "
\n", 175 | "this for the entire `splits` list." 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": 9, 181 | "metadata": {}, 182 | "outputs": [ 183 | { 184 | "name": "stdout", 185 | "output_type": "stream", 186 | "text": [ 187 | "['earz', 'darz', 'derz', 'deaz', 'dear']\n", 188 | "*** which is the same as ***\n", 189 | "earz\n", 190 | "darz\n", 191 | "derz\n", 192 | "deaz\n", 193 | "dear\n" 194 | ] 195 | } 196 | ], 197 | "source": [ 198 | "# deletes with a list comprehension\n", 199 | "splits = splits_a\n", 200 | "deletes = [L + R[1:] for L, R in splits if R]\n", 201 | "\n", 202 | "print(deletes)\n", 203 | "print('*** which is the same as ***')\n", 204 | "for i in deletes:\n", 205 | " print(i)" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": [ 212 | "### Ungraded Exercise\n", 213 | "You now have a list of ***candidate strings*** created after performing a **delete** edit.\n", 214 | "
\n", 215 | "Next step will be to filter this list for ***candidate words*** found in a vocabulary.\n", 216 | "
\n", 217 | "Given the example vocab below, can you think of a way to create a list of candidate words ? \n", 218 | "
\n", 219 | "Remember, you already have a list of candidate strings, some of which are certainly not actual words you might find in your vocabulary !\n", 220 | "
\n", 221 | "
\n", 222 | "So from the above list **earz, darz, derz, deaz, dear**. \n", 223 | "
\n", 224 | "You're really only interested in **dear**." 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 11, 230 | "metadata": {}, 231 | "outputs": [ 232 | { 233 | "name": "stdout", 234 | "output_type": "stream", 235 | "text": [ 236 | "vocab : ['dean', 'deer', 'dear', 'fries', 'and', 'coke']\n", 237 | "edits : ['earz', 'darz', 'derz', 'deaz', 'dear']\n", 238 | "candidate words : {'dear'}\n" 239 | ] 240 | } 241 | ], 242 | "source": [ 243 | "vocab = ['dean','deer','dear','fries','and','coke']\n", 244 | "edits = list(deletes)\n", 245 | "\n", 246 | "print('vocab : ', vocab)\n", 247 | "print('edits : ', edits)\n", 248 | "\n", 249 | "candidates=[]\n", 250 | "\n", 251 | "### START CODE HERE ###\n", 252 | "#candidates = ?? # hint: 'set.intersection'\n", 253 | "candidates = set(deletes).intersection(set(vocab))\n", 254 | "### END CODE HERE ###\n", 255 | "\n", 256 | "print('candidate words : ', candidates)" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "Expected Outcome:\n", 264 | "\n", 265 | "vocab : ['dean', 'deer', 'dear', 'fries', 'and', 'coke']\n", 266 | "
\n", 267 | "edits : ['earz', 'darz', 'derz', 'deaz', 'dear']\n", 268 | "
\n", 269 | "candidate words : {'dear'}" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "### Summary\n", 277 | "You've unpacked an integral part of the assignment by breaking down **splits** and **edits**, specifically looking at **deletes** here.\n", 278 | "
\n", 279 | "Implementation of the other edit types (insert, replace, switch) follows a similar methodology and should now feel somewhat familiar when you see them.\n", 280 | "
\n", 281 | "This bit of the code isn't as intuitive as other sections, so well done!\n", 282 | "
\n", 283 | "You should now feel confident facing some of the more technical parts of the assignment at the end of the week." 284 | ] 285 | }, 286 | { 287 | "cell_type": "code", 288 | "execution_count": null, 289 | "metadata": {}, 290 | "outputs": [], 291 | "source": [] 292 | } 293 | ], 294 | "metadata": { 295 | "kernelspec": { 296 | "display_name": "Python 3", 297 | "language": "python", 298 | "name": "python3" 299 | }, 300 | "language_info": { 301 | "codemirror_mode": { 302 | "name": "ipython", 303 | "version": 3 304 | }, 305 | "file_extension": ".py", 306 | "mimetype": "text/x-python", 307 | "name": "python", 308 | "nbconvert_exporter": "python", 309 | "pygments_lexer": "ipython3", 310 | "version": "3.7.1" 311 | } 312 | }, 313 | "nbformat": 4, 314 | "nbformat_minor": 2 315 | } 316 | -------------------------------------------------------------------------------- /NLP with Probabilistic Models/Week_4/NLP_C2_W4_lecture_notebook_word_embeddings.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Word Embeddings: Hands On\n", 8 | "\n", 9 | "In previous lecture notebooks you saw all the steps needed to train the CBOW model. This notebook will walk you through how to extract the word embedding vectors from a model.\n", 10 | "\n", 11 | "Let's dive into it!" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 1, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "import numpy as np\n", 21 | "from utils2 import get_dict" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "Before moving on, you will be provided with some variables needed for further procedures, which should be familiar by now. Also a trained CBOW model will be simulated, the corresponding weights and biases are provided: " 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 2, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "# Define the tokenized version of the corpus\n", 38 | "words = ['i', 'am', 'happy', 'because', 'i', 'am', 'learning']\n", 39 | "\n", 40 | "# Define V. 
Remember this is the size of the vocabulary\n", 41 | "V = 5\n", 42 | "\n", 43 | "# Get 'word2Ind' and 'Ind2word' dictionaries for the tokenized corpus\n", 44 | "word2Ind, Ind2word = get_dict(words)\n", 45 | "\n", 46 | "\n", 47 | "# Define first matrix of weights\n", 48 | "W1 = np.array([[ 0.41687358, 0.08854191, -0.23495225, 0.28320538, 0.41800106],\n", 49 | " [ 0.32735501, 0.22795148, -0.23951958, 0.4117634 , -0.23924344],\n", 50 | " [ 0.26637602, -0.23846886, -0.37770863, -0.11399446, 0.34008124]])\n", 51 | "\n", 52 | "# Define second matrix of weights\n", 53 | "W2 = np.array([[-0.22182064, -0.43008631, 0.13310965],\n", 54 | " [ 0.08476603, 0.08123194, 0.1772054 ],\n", 55 | " [ 0.1871551 , -0.06107263, -0.1790735 ],\n", 56 | " [ 0.07055222, -0.02015138, 0.36107434],\n", 57 | " [ 0.33480474, -0.39423389, -0.43959196]])\n", 58 | "\n", 59 | "# Define first vector of biases\n", 60 | "b1 = np.array([[ 0.09688219],\n", 61 | " [ 0.29239497],\n", 62 | " [-0.27364426]])\n", 63 | "\n", 64 | "# Define second vector of biases\n", 65 | "b2 = np.array([[ 0.0352008 ],\n", 66 | " [-0.36393384],\n", 67 | " [-0.12775555],\n", 68 | " [-0.34802326],\n", 69 | " [-0.07017815]])" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "\n", 77 | "\n", 78 | "## Extracting word embedding vectors\n", 79 | "\n", 80 | "Once you have finished training the neural network, you have three options to get word embedding vectors for the words of your vocabulary, based on the weight matrices $\\mathbf{W_1}$ and/or $\\mathbf{W_2}$.\n", 81 | "\n", 82 | "### Option 1: extract embedding vectors from $\\mathbf{W_1}$\n", 83 | "\n", 84 | "The first option is to take the columns of $\\mathbf{W_1}$ as the embedding vectors of the words of the vocabulary, using the same order of the words as for the input and output vectors.\n", 85 | "\n", 86 | "> Note: in this practice notebooks the values of the word embedding vectors are meaningless after a single iteration with just one training example, but here's how you would proceed after the training process is complete.\n", 87 | "\n", 88 | "For example $\\mathbf{W_1}$ is this matrix:" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 3, 94 | "metadata": {}, 95 | "outputs": [ 96 | { 97 | "data": { 98 | "text/plain": [ 99 | "array([[ 0.41687358, 0.08854191, -0.23495225, 0.28320538, 0.41800106],\n", 100 | " [ 0.32735501, 0.22795148, -0.23951958, 0.4117634 , -0.23924344],\n", 101 | " [ 0.26637602, -0.23846886, -0.37770863, -0.11399446, 0.34008124]])" 102 | ] 103 | }, 104 | "execution_count": 3, 105 | "metadata": {}, 106 | "output_type": "execute_result" 107 | } 108 | ], 109 | "source": [ 110 | "# Print W1\n", 111 | "W1" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "The first column, which is a 3-element vector, is the embedding vector of the first word of your vocabulary. The second column is the word embedding vector for the second word, and so on.\n", 119 | "\n", 120 | "The first, second, etc. words are ordered as follows." 
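The ordering itself is produced by `get_dict`, which lives in the course's `utils2` helper and is not included in this repository. As a rough stand-in (assuming, without checking `utils2`, that indices simply follow the sorted unique tokens), the two dictionaries could be built like this:

```python
def get_dict_sketch(words):
    # Hypothetical stand-in for utils2.get_dict: index tokens over the sorted
    # unique vocabulary and build both directions of the mapping.
    vocab = sorted(set(words))
    word2Ind = {w: i for i, w in enumerate(vocab)}
    Ind2word = {i: w for w, i in word2Ind.items()}
    return word2Ind, Ind2word

w2i, i2w = get_dict_sketch(['i', 'am', 'happy', 'because', 'i', 'am', 'learning'])
print(len(w2i))  # 5 distinct words, matching V above
```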
121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": null, 126 | "metadata": {}, 127 | "outputs": [], 128 | "source": [ 129 | "# Print corresponding word for each index within vocabulary's range\n", 130 | "for i in range(V):\n", 131 | " print(Ind2word[i])" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "So the word embedding vectors corresponding to each word are:" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [ 147 | "# Loop through each word of the vocabulary\n", 148 | "for word in word2Ind:\n", 149 | " # Extract the column corresponding to the index of the word in the vocabulary\n", 150 | " word_embedding_vector = W1[:, word2Ind[word]]\n", 151 | " # Print word alongside word embedding vector\n", 152 | " print(f'{word}: {word_embedding_vector}')" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "### Option 2: extract embedding vectors from $\\mathbf{W_2}$" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "The second option is to take $\\mathbf{W_2}$ transposed, and take its columns as the word embedding vectors just like you did for $\\mathbf{W_1}$." 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": null, 172 | "metadata": {}, 173 | "outputs": [], 174 | "source": [ 175 | "# Print transposed W2\n", 176 | "W2.T" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": null, 182 | "metadata": {}, 183 | "outputs": [], 184 | "source": [ 185 | "# Loop through each word of the vocabulary\n", 186 | "for word in word2Ind:\n", 187 | " # Extract the column corresponding to the index of the word in the vocabulary\n", 188 | " word_embedding_vector = W2.T[:, word2Ind[word]]\n", 189 | " # Print word alongside word embedding vector\n", 190 | " print(f'{word}: {word_embedding_vector}')" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "### Option 3: extract embedding vectors from $\\mathbf{W_1}$ and $\\mathbf{W_2}$" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "The third option, which is the one you will use in this week's assignment, uses the average of $\\mathbf{W_1}$ and $\\mathbf{W_2^\\top}$." 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "**Calculate the average of $\\mathbf{W_1}$ and $\\mathbf{W_2^\\top}$, and store the result in `W3`.**" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": null, 217 | "metadata": {}, 218 | "outputs": [], 219 | "source": [ 220 | "# Compute W3 as the average of W1 and W2 transposed\n", 221 | "W3 = (W1+W2.T)/2\n", 222 | "\n", 223 | "# Print W3\n", 224 | "W3" 225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": {}, 230 | "source": [ 231 | "Expected output:\n", 232 | "\n", 233 | " array([[ 0.09752647, 0.08665397, -0.02389858, 0.1768788 , 0.3764029 ],\n", 234 | " [-0.05136565, 0.15459171, -0.15029611, 0.19580601, -0.31673866],\n", 235 | " [ 0.19974284, -0.03063173, -0.27839106, 0.12353994, -0.04975536]])" 236 | ] 237 | }, 238 | { 239 | "cell_type": "markdown", 240 | "metadata": {}, 241 | "source": [ 242 | "Extracting the word embedding vectors works just like the two previous options, by taking the columns of the matrix you've just created." 
243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": null, 248 | "metadata": {}, 249 | "outputs": [], 250 | "source": [ 251 | "# Loop through each word of the vocabulary\n", 252 | "for word in word2Ind:\n", 253 | " # Extract the column corresponding to the index of the word in the vocabulary\n", 254 | " word_embedding_vector = W3[:, word2Ind[word]]\n", 255 | " # Print word alongside word embedding vector\n", 256 | " print(f'{word}: {word_embedding_vector}')" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "Now you know 3 different options to get the word embedding vectors from a model! " 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "### How this practice relates to and differs from the upcoming graded assignment\n", 271 | "\n", 272 | "- After extracting the word embedding vectors, you will use principal component analysis (PCA) to visualize the vectors, which will enable you to perform an intrinsic evaluation of the quality of the vectors, as explained in the lecture." 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "**Congratulations on finishing all lecture notebooks for this week!** \n", 280 | "\n", 281 | "You're now ready to take on this week's assignment!\n", 282 | "\n", 283 | "**Keep it up!**" 284 | ] 285 | } 286 | ], 287 | "metadata": { 288 | "kernelspec": { 289 | "display_name": "Python 3", 290 | "language": "python", 291 | "name": "python3" 292 | }, 293 | "language_info": { 294 | "codemirror_mode": { 295 | "name": "ipython", 296 | "version": 3 297 | }, 298 | "file_extension": ".py", 299 | "mimetype": "text/x-python", 300 | "name": "python", 301 | "nbconvert_exporter": "python", 302 | "pygments_lexer": "ipython3", 303 | "version": "3.7.1" 304 | } 305 | }, 306 | "nbformat": 4, 307 | "nbformat_minor": 4 308 | } 309 | -------------------------------------------------------------------------------- /NLP with Attention Models/Week_2/C4_W2_lecture_notebook_Attention.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# The Three Ways of Attention and Dot Product Attention: Ungraded Lab Notebook\n", 8 | "\n", 9 | "In this notebook you'll explore the three ways of attention (encoder-decoder attention, causal attention, and bi-directional self attention) and how to implement the latter two with dot product attention. \n", 10 | "\n", 11 | "## Background\n", 12 | "\n", 13 | "As you learned last week, **attention models** constitute powerful tools in the NLP practitioner's toolkit. Like LSTMs, they learn which words are most important to phrases, sentences, paragraphs, and so on. Moreover, they mitigate the vanishing gradient problem even better than LSTMs. You've already seen how to combine attention with LSTMs to build **encoder-decoder models** for applications such as machine translation. \n", 14 | "\n", 15 | "\n", 16 | "\n", 17 | "This week, you'll see how to integrate attention into **transformers**. Because transformers are not sequence models, they are much easier to parallelize and accelerate. 
Beyond machine translation, applications of transformers include: \n", 18 | "* Auto-completion\n", 19 | "* Named Entity Recognition\n", 20 | "* Chatbots\n", 21 | "* Question-Answering\n", 22 | "* And more!\n", 23 | "\n", 24 | "Along with embedding, positional encoding, dense layers, and residual connections, attention is a crucial component of transformers. At the heart of any attention scheme used in a transformer is **dot product attention**, of which the figures below display a simplified picture:\n", 25 | "\n", 26 | "\n", 27 | "\n", 28 | "\n", 29 | "\n", 30 | "With basic dot product attention, you capture the interactions between every word (embedding) in your query and every word in your key. If the queries and keys belong to the same sentences, this constitutes **bi-directional self-attention**. In some situations, however, it's more appropriate to consider only words which have come before the current one. Such cases, particularly when the queries and keys come from the same sentences, fall into the category of **causal attention**. \n", 31 | "\n", 32 | "\n", 33 | "\n", 34 | "For causal attention, we add a **mask** to the argument of our softmax function, as illustrated below: \n", 35 | "\n", 36 | "\n", 37 | "\n", 38 | "\n", 39 | "\n", 40 | "Now let's see how to implement attention with NumPy. When you integrate attention into a transformer network defined with Trax, you'll have to use `trax.fastmath.numpy` instead, since Trax's arrays are based on JAX DeviceArrays. Fortunately, the function interfaces are often identical." 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "## Imports" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": 1, 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "import sys\n", 57 | "\n", 58 | "import numpy as np\n", 59 | "import scipy.special\n", 60 | "\n", 61 | "import textwrap\n", 62 | "wrapper = textwrap.TextWrapper(width=70)\n", 63 | "\n", 64 | "# to print the entire np array\n", 65 | "np.set_printoptions(threshold=sys.maxsize)" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "Here are some helper functions that will help you create tensors and display useful information:\n", 73 | "\n", 74 | "* `create_tensor()` creates a numpy array from a list of lists.\n", 75 | "* `display_tensor()` prints out the shape and the actual tensor." 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 2, 81 | "metadata": {}, 82 | "outputs": [], 83 | "source": [ 84 | "def create_tensor(t):\n", 85 | " \"\"\"Create tensor from list of lists\"\"\"\n", 86 | " return np.array(t)\n", 87 | "\n", 88 | "\n", 89 | "def display_tensor(t, name):\n", 90 | " \"\"\"Display shape and tensor\"\"\"\n", 91 | " print(f'{name} shape: {t.shape}\\n')\n", 92 | " print(f'{t}\\n')" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "Create some tensors and display their shapes. Feel free to experiment with your own tensors. Keep in mind, though, that the query, key, and value arrays must all have the same embedding dimensions (number of columns), and the mask array must have the same shape as `np.dot(query, key.T)`. 
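To make the shape requirement concrete, here is a small added check showing why a (2, 2) mask fits the (2, 3) query and key defined in the next cell:

```python
import numpy as np

q = np.array([[1, 0, 0], [0, 1, 0]])  # (L_q, d) = (2, 3)
k = np.array([[1, 2, 3], [4, 5, 6]])  # (L_k, d) = (2, 3)

# The attention scores QK^T have shape (L_q, L_k), so the additive mask must be (2, 2).
print(np.dot(q, k.T).shape)  # (2, 2)
```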
" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 3, 105 | "metadata": {}, 106 | "outputs": [ 107 | { 108 | "name": "stdout", 109 | "output_type": "stream", 110 | "text": [ 111 | "query shape: (2, 3)\n", 112 | "\n", 113 | "[[1 0 0]\n", 114 | " [0 1 0]]\n", 115 | "\n", 116 | "key shape: (2, 3)\n", 117 | "\n", 118 | "[[1 2 3]\n", 119 | " [4 5 6]]\n", 120 | "\n", 121 | "value shape: (2, 3)\n", 122 | "\n", 123 | "[[0 1 0]\n", 124 | " [1 0 1]]\n", 125 | "\n", 126 | "mask shape: (2, 2)\n", 127 | "\n", 128 | "[[ 0.e+00 0.e+00]\n", 129 | " [-1.e+09 0.e+00]]\n", 130 | "\n" 131 | ] 132 | } 133 | ], 134 | "source": [ 135 | "q = create_tensor([[1, 0, 0], [0, 1, 0]])\n", 136 | "display_tensor(q, 'query')\n", 137 | "k = create_tensor([[1, 2, 3], [4, 5, 6]])\n", 138 | "display_tensor(k, 'key')\n", 139 | "v = create_tensor([[0, 1, 0], [1, 0, 1]])\n", 140 | "display_tensor(v, 'value')\n", 141 | "m = create_tensor([[0, 0], [-1e9, 0]])\n", 142 | "display_tensor(m, 'mask')" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "## Dot product attention\n", 150 | "\n", 151 | "Here we come to the crux of this lab, in which we compute \n", 152 | "$\\textrm{softmax} \\left(\\frac{Q K^T}{\\sqrt{d}} + M \\right) V$, where the (optional, but default) scaling factor $\\sqrt{d}$ is the square root of the embedding dimension." 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 4, 158 | "metadata": {}, 159 | "outputs": [], 160 | "source": [ 161 | "def DotProductAttention(query, key, value, mask, scale=True):\n", 162 | " \"\"\"Dot product self-attention.\n", 163 | " Args:\n", 164 | " query (numpy.ndarray): array of query representations with shape (L_q by d)\n", 165 | " key (numpy.ndarray): array of key representations with shape (L_k by d)\n", 166 | " value (numpy.ndarray): array of value representations with shape (L_k by d) where L_v = L_k\n", 167 | " mask (numpy.ndarray): attention-mask, gates attention with shape (L_q by L_k)\n", 168 | " scale (bool): whether to scale the dot product of the query and transposed key\n", 169 | "\n", 170 | " Returns:\n", 171 | " numpy.ndarray: Self-attention array for q, k, v arrays. 
(L_q by d)\n",
172 | " \"\"\"\n",
173 | "\n",
174 | " assert query.shape[-1] == key.shape[-1] == value.shape[-1], \"Embedding dimensions of q, k, v aren't all the same\"\n",
175 | "\n",
176 | " # Save depth/dimension of the query embedding for scaling down the dot product\n",
177 | " if scale: \n",
178 | " depth = query.shape[-1]\n",
179 | " else:\n",
180 | " depth = 1\n",
181 | "\n",
182 | " # Calculate scaled query key dot product according to formula above\n",
183 | " dots = np.matmul(query, np.swapaxes(key, -1, -2)) / np.sqrt(depth) \n",
184 | " \n",
185 | " # Apply the mask\n",
186 | " if mask is not None:\n",
187 | " dots = np.where(mask, dots, np.full_like(dots, -1e9)) \n",
188 | " \n",
189 | " # Softmax formula implementation\n",
190 | " # Use scipy.special.logsumexp of the masked dot products to avoid overflow when exponentiating large numbers\n",
191 | " # Note: softmax = e^(dots - logsumexp(dots)) = e^dots / sumexp(dots)\n",
192 | " logsumexp = scipy.special.logsumexp(dots, axis=-1, keepdims=True)\n",
193 | "\n",
194 | " # Take exponential of dots minus logsumexp to get softmax\n",
195 | " # Use np.exp()\n",
196 | " dots = np.exp(dots - logsumexp)\n",
197 | "\n",
198 | " # Multiply dots by value to get self-attention\n",
199 | " # Use np.matmul()\n",
200 | " attention = np.matmul(dots, value)\n",
201 | " \n",
202 | " return attention"
203 | ]
204 | },
205 | {
206 | "cell_type": "markdown",
207 | "metadata": {},
208 | "source": [
209 | "Now let's implement the *masked* dot product self-attention (at the heart of causal attention) as a special case of dot product attention."
210 | ]
211 | },
212 | {
213 | "cell_type": "code",
214 | "execution_count": 5,
215 | "metadata": {},
216 | "outputs": [],
217 | "source": [
218 | "def dot_product_self_attention(q, k, v, scale=True):\n",
219 | " \"\"\" Masked dot product self attention.\n",
220 | " Args:\n",
221 | " q (numpy.ndarray): queries.\n",
222 | " k (numpy.ndarray): keys.\n",
223 | " v (numpy.ndarray): values.\n",
224 | " Returns:\n",
225 | " numpy.ndarray: masked dot product self attention tensor.\n",
226 | " \"\"\"\n",
227 | " \n",
228 | " # Size of the penultimate dimension of the query\n",
229 | " mask_size = q.shape[-2]\n",
230 | "\n",
231 | " # Creates a matrix with ones on and below the diagonal and 0s above. It should have shape (1, mask_size, mask_size)\n",
232 | " # Use np.tril() - lower triangle of an array - and np.ones()\n",
233 | " mask = np.tril(np.ones((1, mask_size, mask_size), dtype=np.bool_), k=0) \n",
234 | " \n",
235 | " return DotProductAttention(q, k, v, mask, scale=scale)"
236 | ]
237 | },
238 | {
239 | "cell_type": "code",
240 | "execution_count": 6,
241 | "metadata": {},
242 | "outputs": [
243 | {
244 | "data": {
245 | "text/plain": [
246 | "array([[[0. , 1. , 0. 
],\n", 247 | " [0.84967455, 0.15032545, 0.84967455]]])" 248 | ] 249 | }, 250 | "execution_count": 6, 251 | "metadata": {}, 252 | "output_type": "execute_result" 253 | } 254 | ], 255 | "source": [ 256 | "dot_product_self_attention(q, k, v)" 257 | ] 258 | }, 259 | { 260 | "cell_type": "code", 261 | "execution_count": null, 262 | "metadata": {}, 263 | "outputs": [], 264 | "source": [] 265 | } 266 | ], 267 | "metadata": { 268 | "kernelspec": { 269 | "display_name": "Python 3", 270 | "language": "python", 271 | "name": "python3" 272 | }, 273 | "language_info": { 274 | "codemirror_mode": { 275 | "name": "ipython", 276 | "version": 3 277 | }, 278 | "file_extension": ".py", 279 | "mimetype": "text/x-python", 280 | "name": "python", 281 | "nbconvert_exporter": "python", 282 | "pygments_lexer": "ipython3", 283 | "version": "3.7.6" 284 | } 285 | }, 286 | "nbformat": 4, 287 | "nbformat_minor": 4 288 | } 289 | -------------------------------------------------------------------------------- /NLP with Attention Models/Week_3/C4_W3_Assignment_Ungraded_BERT_Loss.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "7yuytuIllsv1" 8 | }, 9 | "source": [ 10 | "# Assignment 3 Ungraded Sections - Part 1: BERT Loss Model \n", 11 | "\n", 12 | "Welcome to the part 1 of testing the models for this week's assignment. We will perform decoding using the BERT Loss model. In this notebook we'll use an input, mask (hide) random word(s) in it and see how well we get the \"Target\" answer(s). \n", 13 | "\n", 14 | "## Colab\n", 15 | "\n", 16 | "Since this ungraded lab takes a lot of time to run on coursera, as an alternative we have a colab prepared for you.\n", 17 | "\n", 18 | "[BERT Loss Model Colab](https://drive.google.com/file/d/1fzaUIYuOmRernN8Lqigd6Du0qzwLkR26/view?usp=sharing)\n", 19 | "\n", 20 | "- If you run into a page that looks similar to the one below, with the option `Open with`, this would mean you need to download the `Colaboratory` app. You can do so by `Open with -> Connect more apps -> in the search bar write \"Colaboratory\" -> install`\n", 21 | "\n", 22 | " \n", 23 | "\n", 24 | "- After installation it should look like this. Click on `Open with Google Colaboratory`\n", 25 | "\n", 26 | " " 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": { 32 | "colab_type": "text", 33 | "id": "Db6LQW5cMSgx" 34 | }, 35 | "source": [ 36 | "## Outline\n", 37 | "\n", 38 | "- [Overview](#0)\n", 39 | "- [Part 1: Getting ready](#1)\n", 40 | "- [Part 2: BERT Loss](#2)\n", 41 | " - [2.1 Decoding](#2.1)" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": { 47 | "colab_type": "text", 48 | "id": "ysxogfC1M158" 49 | }, 50 | "source": [ 51 | "\n", 52 | "### Overview\n", 53 | "\n", 54 | "In this notebook you will:\n", 55 | "* Implement the Bidirectional Encoder Representation from Transformer (BERT) loss. \n", 56 | "* Use a pretrained version of the model you created in the assignment for inference." 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "\n", 64 | "# Part 1: Getting ready\n", 65 | "\n", 66 | "Run the code cells below to import the necessary libraries and to define some functions which will be useful for decoding. The code and the functions are the same as the ones you previsouly ran on the graded assignment." 
67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": null, 72 | "metadata": {}, 73 | "outputs": [], 74 | "source": [ 75 | "import pickle\n", 76 | "import string\n", 77 | "import ast\n", 78 | "import numpy as np\n", 79 | "import trax \n", 80 | "from trax.supervised import decoding\n", 81 | "import textwrap \n", 82 | "\n", 83 | "wrapper = textwrap.TextWrapper(width=70)" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": null, 89 | "metadata": {}, 90 | "outputs": [], 91 | "source": [ 92 | "example_jsons = list(map(ast.literal_eval, open('data.txt')))\n", 93 | "\n", 94 | "natural_language_texts = [example_json['text'] for example_json in example_jsons]\n", 95 | "\n", 96 | "PAD, EOS, UNK = 0, 1, 2\n", 97 | "\n", 98 | "def detokenize(np_array):\n", 99 | " return trax.data.detokenize(\n", 100 | " np_array,\n", 101 | " vocab_type='sentencepiece',\n", 102 | " vocab_file='sentencepiece.model',\n", 103 | " vocab_dir='.')\n", 104 | "\n", 105 | "\n", 106 | "def tokenize(s):\n", 107 | " return next(trax.data.tokenize(\n", 108 | " iter([s]),\n", 109 | " vocab_type='sentencepiece',\n", 110 | " vocab_file='sentencepiece.model',\n", 111 | " vocab_dir='.'))\n", 112 | " \n", 113 | " \n", 114 | "vocab_size = trax.data.vocab_size(\n", 115 | " vocab_type='sentencepiece',\n", 116 | " vocab_file='sentencepiece.model',\n", 117 | " vocab_dir='.')\n", 118 | "\n", 119 | "\n", 120 | "def get_sentinels(vocab_size, display=False):\n", 121 | " sentinels = {}\n", 122 | " for i, char in enumerate(reversed(string.ascii_letters), 1):\n", 123 | " decoded_text = detokenize([vocab_size - i]) \n", 124 | " # Sentinels, ex: - \n", 125 | " sentinels[decoded_text] = f'<{char}>' \n", 126 | " if display:\n", 127 | " print(f'The sentinel is <{char}> and the decoded token is:', decoded_text)\n", 128 | " return sentinels\n", 129 | "\n", 130 | "\n", 131 | "sentinels = get_sentinels(vocab_size, display=False) \n", 132 | "\n", 133 | "\n", 134 | "def pretty_decode(encoded_str_list, sentinels=sentinels):\n", 135 | " # If already a string, just do the replacements.\n", 136 | " if isinstance(encoded_str_list, (str, bytes)):\n", 137 | " for token, char in sentinels.items():\n", 138 | " encoded_str_list = encoded_str_list.replace(token, char)\n", 139 | " return encoded_str_list\n", 140 | " \n", 141 | " # We need to decode and then prettyfy it.\n", 142 | " return pretty_decode(detokenize(encoded_str_list))\n", 143 | "\n", 144 | "\n", 145 | "inputs_targets_pairs = []\n", 146 | "\n", 147 | "# here you are reading already computed input/target pairs from a file\n", 148 | "with open ('inputs_targets_pairs_file.txt', 'rb') as fp:\n", 149 | " inputs_targets_pairs = pickle.load(fp) \n", 150 | "\n", 151 | "\n", 152 | "def display_input_target_pairs(inputs_targets_pairs):\n", 153 | " for i, inp_tgt_pair in enumerate(inputs_targets_pairs, 1):\n", 154 | " inps, tgts = inp_tgt_pair\n", 155 | " inps, tgts = pretty_decode(inps), pretty_decode(tgts)\n", 156 | " print(f'[{i}]\\n'\n", 157 | " f'inputs:\\n{wrapper.fill(text=inps)}\\n\\n'\n", 158 | " f'targets:\\n{wrapper.fill(text=tgts)}\\n\\n\\n\\n')\n", 159 | " \n", 160 | "display_input_target_pairs(inputs_targets_pairs) " 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": {}, 166 | "source": [ 167 | "\n", 168 | "# Part 2: BERT Loss\n", 169 | "\n", 170 | "Now that you created the encoder, we will not make you train it. Training it could easily cost you a few days depending on which GPUs/TPUs you are using. 
Very few people train a full transformer from scratch. Instead, the majority of people load a pretrained model and fine-tune it on a specific task. That is exactly what you are about to do. Let's start by initializing and then loading in the model. \n",
172 | "\n",
173 | "Initialize the model from the saved checkpoint."
174 | ]
175 | },
176 | {
177 | "cell_type": "code",
178 | "execution_count": null,
179 | "metadata": {},
180 | "outputs": [],
181 | "source": [
182 | "# Initializing the model\n",
183 | "model = trax.models.Transformer(\n",
184 | " d_ff = 4096,\n",
185 | " d_model = 1024,\n",
186 | " max_len = 2048,\n",
187 | " n_heads = 16,\n",
188 | " dropout = 0.1,\n",
189 | " input_vocab_size = 32000,\n",
190 | " n_encoder_layers = 24,\n",
191 | " n_decoder_layers = 24,\n",
192 | " mode='predict')"
193 | ]
194 | },
195 | {
196 | "cell_type": "code",
197 | "execution_count": null,
198 | "metadata": {},
199 | "outputs": [],
200 | "source": [
201 | "# Now load in the model\n",
202 | "# this takes about 1 minute\n",
203 | "shape11 = trax.shapes.ShapeDtype((1, 1), dtype=np.int32) # Needed in predict mode.\n",
204 | "model.init_from_file('model.pkl.gz',\n",
205 | " weights_only=True, input_signature=(shape11, shape11))"
206 | ]
207 | },
208 | {
209 | "cell_type": "markdown",
210 | "metadata": {
211 | "colab_type": "text",
212 | "id": "HuTyft5EBQK6"
213 | },
214 | "source": [
215 | "\n",
216 | "### 2.1 Decoding\n",
217 | "\n",
218 | "Now you will use one of the `inputs_targets_pairs` as input and target. Next you will use `pretty_decode` to display the input and target. The code to perform all of this has been provided below."
219 | ]
220 | },
221 | {
222 | "cell_type": "code",
223 | "execution_count": null,
224 | "metadata": {
225 | "colab": {
226 | "base_uri": "https://localhost:8080/",
227 | "height": 139
228 | },
229 | "colab_type": "code",
230 | "id": "gPggKamNBZxJ",
231 | "outputId": "4514c865-7534-4ce8-a339-2a4030bc6fb5"
232 | },
233 | "outputs": [],
234 | "source": [
235 | "# using the 3rd example\n",
236 | "c4_input = inputs_targets_pairs[2][0]\n",
237 | "c4_target = inputs_targets_pairs[2][1]\n",
238 | "\n",
239 | "print('pretty_decoded input: \\n\\n', pretty_decode(c4_input))\n",
240 | "print('\\npretty_decoded target: \\n\\n', pretty_decode(c4_target))\n",
241 | "print('\\nc4_input:\\n\\n', c4_input)\n",
242 | "print('\\nc4_target:\\n\\n', c4_target)\n",
243 | "print(len(c4_target))\n",
244 | "print(len(pretty_decode(c4_target)))"
245 | ]
246 | },
247 | {
248 | "cell_type": "markdown",
249 | "metadata": {},
250 | "source": [
251 | "Run the cell below to decode.\n",
252 | "\n",
253 | "### Note: This will take some time to run"
254 | ]
255 | },
256 | {
257 | "cell_type": "code",
258 | "execution_count": null,
259 | "metadata": {
260 | "colab": {
261 | "base_uri": "https://localhost:8080/",
262 | "height": 477
263 | },
264 | "colab_type": "code",
265 | "id": "-I12YqxMTwgo",
266 | "outputId": "4e2399fa-7cbd-4ae3-8cee-6c97cbd277af",
267 | "scrolled": true
268 | },
269 | "outputs": [],
270 | "source": [
271 | "# Temperature is a parameter for sampling.\n",
272 | "# * 0.0: same as argmax, always pick the most probable token\n",
273 | "# * 1.0: sampling from the distribution (can sometimes say random things)\n",
274 | "# * values in between can trade off diversity and quality, try it out!\n",
275 | "output = decoding.autoregressive_sample(model, inputs=np.array(c4_input)[None, :],\n",
276 | " temperature=0.0, max_length=5) # originally max_length = 
50\n", 276 | "print(wrapper.fill(pretty_decode(output[0])))" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": { 282 | "colab": {}, 283 | "colab_type": "code", 284 | "id": "PJh_Qw9G5jND" 285 | }, 286 | "source": [ 287 | "At this point the RAM is almost full; this happens because the model and the decoding are memory-heavy. You can run decoding just once. Running it a second time with another example might give you an answer that makes no sense, or repetitive words. If that happens, restart the runtime (see how to at the start of the notebook) and run all the cells again.\n", 288 | "\n", 289 | "You should also be aware that the quality of the decoding is not very good because max_length was downsized from 50 to 5 so that this runs faster within this environment. The Colab version uses the original max_length, so check that one for the actual decoding." 290 | ] 291 | } 292 | ], 293 | "metadata": { 294 | "accelerator": "GPU", 295 | "colab": { 296 | "collapsed_sections": [], 297 | "name": "C4W3-solutions.ipynb", 298 | "provenance": [], 299 | "toc_visible": true 300 | }, 301 | "kernelspec": { 302 | "display_name": "Python 3", 303 | "language": "python", 304 | "name": "python3" 305 | }, 306 | "language_info": { 307 | "codemirror_mode": { 308 | "name": "ipython", 309 | "version": 3 310 | }, 311 | "file_extension": ".py", 312 | "mimetype": "text/x-python", 313 | "name": "python", 314 | "nbconvert_exporter": "python", 315 | "pygments_lexer": "ipython3", 316 | "version": "3.7.6" 317 | } 318 | }, 319 | "nbformat": 4, 320 | "nbformat_minor": 4 321 | } 322 | -------------------------------------------------------------------------------- /NLP with Sequence Models/Week_4/C3_W4_lecture_notebook_accuracy.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Evaluate a Siamese model: Ungraded Lecture Notebook" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [ 15 | { 16 | "name": "stdout", 17 | "output_type": "stream", 18 | "text": [ 19 | "INFO:tensorflow:tokens_length=568 inputs_length=512 targets_length=114 noise_density=0.15 mean_noise_span_length=3.0 \n" 20 | ] 21 | } 22 | ], 23 | "source": [ 24 | "import trax.fastmath.numpy as np" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## Inspecting the necessary elements" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "In this lecture notebook you will learn how to evaluate a Siamese model using the accuracy metric. 
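Concretely, the metric is just the fraction of predictions that match the labels, $\\text{accuracy} = \\frac{\\text{correct predictions}}{\\text{total examples}}$, which is exactly what the loop at the end of this notebook computes. 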
Because there are many steps before evaluating a Siamese network (as you will see in this week's assignment) the necessary elements and variables are replicated here using real data from the assignment:\n", 39 | "\n", 40 | " - `q1`: vector with dimension `(batch_size X max_length)` containing first questions to compare in the test set.\n", 41 | " - `q2`: vector with dimension `(batch_size X max_length)` containing second questions to compare in the test set.\n", 42 | " \n", 43 | " **Notice that for each pair of vectors within a batch $([q1_1, q1_2, q1_3, ...]$, $[q2_1, q2_2,q2_3, ...])$ $q1_i$ is associated to $q2_k$.**\n", 44 | " \n", 45 | " \n", 46 | " - `y_test`: 1 if $q1_i$ and $q2_k$ are duplicates, 0 otherwise.\n", 47 | " \n", 48 | " - `v1`: output vector from the model's prediction associated with the first questions.\n", 49 | " - `v2`: output vector from the model's prediction associated with the second questions." 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "You can inspect each one of these variables by running the following cells:" 57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": 2, 62 | "metadata": {}, 63 | "outputs": [ 64 | { 65 | "name": "stdout", 66 | "output_type": "stream", 67 | "text": [ 68 | "q1 has shape: (512, 64) \n", 69 | "\n", 70 | "And it looks like this: \n", 71 | "\n", 72 | " [[ 32 38 4 ... 1 1 1]\n", 73 | " [ 30 156 78 ... 1 1 1]\n", 74 | " [ 32 38 4 ... 1 1 1]\n", 75 | " ...\n", 76 | " [ 32 33 4 ... 1 1 1]\n", 77 | " [ 30 156 317 ... 1 1 1]\n", 78 | " [ 30 156 6 ... 1 1 1]]\n", 79 | "\n", 80 | "\n" 81 | ] 82 | } 83 | ], 84 | "source": [ 85 | "q1 = np.load('q1.npy')\n", 86 | "print(f'q1 has shape: {q1.shape} \\n\\nAnd it looks like this: \\n\\n {q1}\\n\\n')" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "Notice those 1s on the right-hand side? \n", 94 | "\n", 95 | "Hope you remember that the value of `1` was used for padding. " 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 3, 101 | "metadata": {}, 102 | "outputs": [ 103 | { 104 | "name": "stdout", 105 | "output_type": "stream", 106 | "text": [ 107 | "q2 has shape: (512, 64) \n", 108 | "\n", 109 | "And looks like this: \n", 110 | "\n", 111 | " [[ 30 156 78 ... 1 1 1]\n", 112 | " [ 283 156 78 ... 1 1 1]\n", 113 | " [ 32 38 4 ... 1 1 1]\n", 114 | " ...\n", 115 | " [ 32 33 4 ... 1 1 1]\n", 116 | " [ 30 156 78 ... 1 1 1]\n", 117 | " [ 30 156 10596 ... 
1 1 1]]\n", 118 | "\n", 119 | "\n" 120 | ] 121 | } 122 | ], 123 | "source": [ 124 | "q2 = np.load('q2.npy')\n", 125 | "print(f'q2 has shape: {q2.shape} \\n\\nAnd looks like this: \\n\\n {q2}\\n\\n')" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": 4, 131 | "metadata": {}, 132 | "outputs": [ 133 | { 134 | "name": "stdout", 135 | "output_type": "stream", 136 | "text": [ 137 | "y_test has shape: (512,) \n", 138 | "\n", 139 | "And looks like this: \n", 140 | "\n", 141 | " [0 1 1 0 0 0 0 1 0 1 1 0 0 0 1 1 1 0 1 1 0 0 0 0 1 1 0 0 0 0 1 0 1 1 0 0 0\n", 142 | " 0 0 0 1 0 0 0 1 0 0 0 0 1 0 1 1 1 1 0 1 0 1 0 0 0 1 0 1 1 1 0 0 0 1 0 1 0\n", 143 | " 0 0 0 1 0 0 1 1 0 0 0 1 0 1 1 0 1 0 0 0 1 0 1 0 0 0 0 1 1 1 0 1 0 1 1 0 0\n", 144 | " 0 1 0 0 1 1 0 0 1 0 1 0 0 1 1 0 1 0 0 1 1 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0\n", 145 | " 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 1 1 0 1 1 1\n", 146 | " 1 0 1 1 0 0 0 0 1 1 0 0 0 0 0 1 1 0 1 0 0 1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 1\n", 147 | " 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0\n", 148 | " 1 0 0 0 0 1 0 0 0 0 1 0 1 0 1 1 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1\n", 149 | " 1 0 1 1 0 0 0 1 0 1 0 1 1 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 0 1 0 1 1 1 0 0\n", 150 | " 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 0 0 1 0 0 1 0 1 0 0 1 0 0 0 0\n", 151 | " 0 0 1 0 1 1 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 1 1 0 1 0 1 1 1 0\n", 152 | " 1 1 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 1 1 0 1 1 1 0 0 0 1 0 1 1 1\n", 153 | " 0 1 0 0 0 0 0 0 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1\n", 154 | " 1 0 1 0 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n", 155 | "\n", 156 | "\n" 157 | ] 158 | } 159 | ], 160 | "source": [ 161 | "y_test = np.load('y_test.npy')\n", 162 | "print(f'y_test has shape: {y_test.shape} \\n\\nAnd looks like this: \\n\\n {y_test}\\n\\n')" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 5, 168 | "metadata": {}, 169 | "outputs": [ 170 | { 171 | "name": "stdout", 172 | "output_type": "stream", 173 | "text": [ 174 | "v1 has shape: (512, 128) \n", 175 | "\n", 176 | "And looks like this: \n", 177 | "\n", 178 | " [[ 0.01273625 -0.1496373 -0.01982759 ... 0.02205012 -0.00169148\n", 179 | " -0.01598107]\n", 180 | " [-0.05592084 0.05792497 -0.02226785 ... 0.08156938 -0.02570007\n", 181 | " -0.00503111]\n", 182 | " [ 0.05686752 0.0294889 0.04522024 ... 0.03141788 -0.08459651\n", 183 | " -0.00968536]\n", 184 | " ...\n", 185 | " [ 0.15115018 0.17791134 0.02200656 ... -0.00851707 0.00571415\n", 186 | " -0.00431194]\n", 187 | " [ 0.06995274 0.13110274 0.0202337 ... -0.00902792 -0.01221745\n", 188 | " 0.00505962]\n", 189 | " [-0.16043712 -0.11899089 -0.15950686 ... 0.06544471 -0.01208312\n", 190 | " -0.01183368]]\n", 191 | "\n", 192 | "\n", 193 | "v2 has shape: (512, 128) \n", 194 | "\n", 195 | "And looks like this: \n", 196 | "\n", 197 | " [[ 0.07437647 0.02804951 -0.02974014 ... 0.02378932 -0.01696189\n", 198 | " -0.01897198]\n", 199 | " [ 0.03270066 0.15122835 -0.02175895 ... 0.00517202 -0.14617395\n", 200 | " 0.00204823]\n", 201 | " [ 0.05635608 0.05454165 0.042222 ... 0.03831453 -0.05387777\n", 202 | " -0.01447786]\n", 203 | " ...\n", 204 | " [ 0.04727105 -0.06748016 0.04194937 ... 0.07600753 -0.03072828\n", 205 | " 0.00400715]\n", 206 | " [ 0.00269269 0.15222628 0.01714724 ... 0.01482705 -0.0197884\n", 207 | " 0.01389528]\n", 208 | " [-0.15475044 -0.15718803 -0.14732707 ... 
0.04299919 -0.01070975\n", 209 | " -0.01318042]]\n", 210 | "\n", 211 | "\n" 212 | ] 213 | } 214 | ], 215 | "source": [ 216 | "v1 = np.load('v1.npy')\n", 217 | "print(f'v1 has shape: {v1.shape} \\n\\nAnd looks like this: \\n\\n {v1}\\n\\n')\n", 218 | "v2 = np.load('v2.npy')\n", 219 | "print(f'v2 has shape: {v2.shape} \\n\\nAnd looks like this: \\n\\n {v2}\\n\\n')" 220 | ] 221 | }, 222 | { 223 | "cell_type": "markdown", 224 | "metadata": {}, 225 | "source": [ 226 | "## Calculating the accuracy\n", 227 | "\n", 228 | "You will calculate the accuracy by iterating over the test set and checking if the model predicts right or wrong.\n", 229 | "\n", 230 | "The first step is to set the accuracy to zero:" 231 | ] 232 | }, 233 | { 234 | "cell_type": "code", 235 | "execution_count": 6, 236 | "metadata": {}, 237 | "outputs": [], 238 | "source": [ 239 | "accuracy = 0" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "You will also need the `batch_size` and the `threshold` that determines if two questions are the same or not. \n", 247 | "\n", 248 | "**Note: A higher threshold means that only very similar questions will be considered as the same question.**" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": 7, 254 | "metadata": {}, 255 | "outputs": [], 256 | "source": [ 257 | "batch_size = 512 # Note: The max it can be is y_test.shape[0], i.e. all the samples in test data\n", 258 | "threshold = 0.7 # You can play around with threshold and then see the change in accuracy.\n" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "In the assignment you will iterate over multiple batches of data, but since this is a simplified version only one batch is provided. \n", 266 | "\n", 267 | "**Note: Be careful with the indices when slicing the test data in the assignment!**\n", 268 | "\n", 269 | "The process is pretty straightforward:\n", 270 | " - Iterate over each one of the elements in the batch\n", 271 | " - Compute the cosine similarity between the predictions\n", 272 | " - For computing the cosine similarity, the two output vectors should have been normalized using L2 normalization, meaning their magnitude will be 1. This has been taken care of by the Siamese network you will build in the assignment. Hence the cosine similarity here is just the dot product between two vectors. 
You can check by implementing the usual cosine similarity formula and check if this holds or not.\n", 273 | " - Determine if this value is greater than the threshold (If it is, consider the two questions as the same and return 1 else 0)\n", 274 | " - Compare against the actual target and if the prediction matches, add 1 to the accuracy (increment the correct prediction counter)\n", 275 | " - Divide the accuracy by the number of processed elements" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": 8, 281 | "metadata": {}, 282 | "outputs": [], 283 | "source": [ 284 | "for j in range(batch_size): # Iterate over each one of the elements in the batch\n", 285 | " \n", 286 | " d = np.dot(v1[j],v2[j]) # Compute the cosine similarity between the predictions as l2 normalized, ||v1[j]||==||v2[j]||==1 so only dot product is needed\n", 287 | " res = d > threshold # Determine if this value is greater than the threshold (if it is consider the two questions as the same)\n", 288 | " accuracy += (y_test[j] == res) # Compare against the actual target and if the prediction matches, add 1 to the accuracy\n", 289 | "\n", 290 | "accuracy = accuracy / batch_size # Divide the accuracy by the number of processed elements" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": 9, 296 | "metadata": {}, 297 | "outputs": [ 298 | { 299 | "name": "stdout", 300 | "output_type": "stream", 301 | "text": [ 302 | "The accuracy of the model is: 0.7421875\n" 303 | ] 304 | } 305 | ], 306 | "source": [ 307 | "print(f'The accuracy of the model is: {accuracy}')" 308 | ] 309 | }, 310 | { 311 | "cell_type": "markdown", 312 | "metadata": {}, 313 | "source": [ 314 | "**Congratulations on finishing this lecture notebook!** \n", 315 | "\n", 316 | "Now you should have a clearer understanding of how to evaluate your Siamese language models using the accuracy metric. \n", 317 | "\n", 318 | "**Keep it up!**" 319 | ] 320 | } 321 | ], 322 | "metadata": { 323 | "kernelspec": { 324 | "display_name": "Python 3", 325 | "language": "python", 326 | "name": "python3" 327 | }, 328 | "language_info": { 329 | "codemirror_mode": { 330 | "name": "ipython", 331 | "version": 3 332 | }, 333 | "file_extension": ".py", 334 | "mimetype": "text/x-python", 335 | "name": "python", 336 | "nbconvert_exporter": "python", 337 | "pygments_lexer": "ipython3", 338 | "version": "3.7.1" 339 | } 340 | }, 341 | "nbformat": 4, 342 | "nbformat_minor": 4 343 | } 344 | -------------------------------------------------------------------------------- /NLP with Attention Models/Week_3/C4_W3_Assignment_Ungraded_T5.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Assignment 3 Ungraded Sections - Part 2: T5 SQuAD Model\n", 8 | "\n", 9 | "Welcome to the part 2 of testing the models for this week's assignment. This time we will perform decoding using the T5 SQuAD model. In this notebook we'll perform Question Answering by providing a \"Question\", its \"Context\" and see how well we get the \"Target\" answer. 
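For a concrete picture of the text-to-text format used throughout, a single example looks roughly like this (the values are illustrative only; the real pairs come from the SQuAD preprocessing in Part 2):\n", "\n", "```python\n", "# Illustrative (inputs, targets) pair for T5-style question answering\n", "example = {\n", "    'inputs': 'question: Where was John? context: John was at the game.',\n", "    'targets': 'at the game',\n", "}\n", "```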
\n", 11 | "\n", 12 | "## Colab\n", 13 | "\n", 14 | "Since this ungraded lab takes a lot of time to run on Coursera, we have prepared a Colab as an alternative.\n", 15 | "\n", 16 | "[T5 SQuAD Model Colab](https://drive.google.com/file/d/1crITaM_gn6VGFLW70fUuWjAuwQefoBSw/view?usp=sharing)\n", 17 | "\n", 18 | "- If you run into a page that looks similar to the one below, with the option `Open with`, this would mean you need to download the `Colaboratory` app. You can do so by `Open with -> Connect more apps -> in the search bar write \"Colaboratory\" -> install`\n", 19 | "\n", 20 | " \n", 21 | "\n", 22 | "- After installation it should look like this. Click on `Open with Google Colaboratory`\n", 23 | "\n", 24 | " " 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## Outline\n", 32 | "\n", 33 | "- [Overview](#0)\n", 34 | "- [Part 1: Resuming the assignment (T5 SQuAD Model)](#1)\n", 35 | "- [Part 2: Fine-tuning on SQuAD](#2)\n", 36 | " - [2.1 Loading in the data and preprocessing](#2.1)\n", 37 | " - [2.2 Decoding from a fine-tuned model](#2.2)" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "\n", 45 | "### Overview\n", 46 | "\n", 47 | "In this notebook you will:\n", 48 | "* Implement the Bidirectional Encoder Representation from Transformer (BERT) loss. \n", 49 | "* Use a pretrained version of the model you created in the assignment for inference." 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "\n", 57 | "# Part 1: Getting ready\n", 58 | "\n", 59 | "Run the code cells below to import the necessary libraries and to define some functions which will be useful for decoding. The code and the functions are the same as the ones you previously ran on the graded assignment."
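, "\n", "If you want a quick feel for the helpers defined in the next cell, a tokenize/detokenize round trip looks like this (a sketch; the exact ids depend on the `sentencepiece.model` vocabulary file):\n", "\n", "```python\n", "ids = tokenize('Hello world!')   # -> a numpy array of token ids\n", "text = detokenize(ids)           # -> should recover 'Hello world!'\n", "```"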
59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": null, 64 | "metadata": {}, 65 | "outputs": [], 66 | "source": [ 67 | "import string\n", 68 | "import t5\n", 69 | "import numpy as np\n", 70 | "import trax \n", 71 | "from trax.supervised import decoding\n", 72 | "import textwrap \n", 73 | "\n", 74 | "wrapper = textwrap.TextWrapper(width=70)" 75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": null, 80 | "metadata": {}, 81 | "outputs": [], 82 | "source": [ 83 | "PAD, EOS, UNK = 0, 1, 2\n", 84 | "\n", 85 | "\n", 86 | "def detokenize(np_array):\n", 87 | " return trax.data.detokenize(\n", 88 | " np_array,\n", 89 | " vocab_type='sentencepiece',\n", 90 | " vocab_file='sentencepiece.model',\n", 91 | " vocab_dir='.')\n", 92 | "\n", 93 | "\n", 94 | "def tokenize(s):\n", 95 | " return next(trax.data.tokenize(\n", 96 | " iter([s]),\n", 97 | " vocab_type='sentencepiece',\n", 98 | " vocab_file='sentencepiece.model',\n", 99 | " vocab_dir='.'))\n", 100 | " \n", 101 | " \n", 102 | "vocab_size = trax.data.vocab_size(\n", 103 | " vocab_type='sentencepiece',\n", 104 | " vocab_file='sentencepiece.model',\n", 105 | " vocab_dir='.')\n", 106 | "\n", 107 | "\n", 108 | "def get_sentinels(vocab_size, display=False):\n", 109 | " sentinels = {}\n", 110 | " for i, char in enumerate(reversed(string.ascii_letters), 1):\n", 111 | " decoded_text = detokenize([vocab_size - i]) \n", 112 | " # Sentinels, ex: <Z> - <a>\n", 113 | " sentinels[decoded_text] = f'<{char}>' \n", 114 | " if display:\n", 115 | " print(f'The sentinel is <{char}> and the decoded token is:', decoded_text)\n", 116 | " return sentinels\n", 117 | "\n", 118 | "\n", 119 | "sentinels = get_sentinels(vocab_size, display=False) \n", 120 | "\n", 121 | "\n", 122 | "def pretty_decode(encoded_str_list, sentinels=sentinels):\n", 123 | " # If already a string, just do the replacements.\n", 124 | " if isinstance(encoded_str_list, (str, bytes)):\n", 125 | " for token, char in sentinels.items():\n", 126 | " encoded_str_list = encoded_str_list.replace(token, char)\n", 127 | " return encoded_str_list\n", 128 | " \n", 129 | " # We need to decode and then prettify it.\n", 130 | " return pretty_decode(detokenize(encoded_str_list))" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": { 136 | "colab_type": "text", 137 | "id": "HEoSSKNwgDVA" 138 | }, 139 | "source": [ 140 | "\n", 141 | "# Part 2: Fine-tuning on SQuAD\n", 142 | "\n", 143 | "Now let's try to fine-tune on SQuAD and see what becomes of the model. For this, we need to write a function that will create and process the SQuAD `tf.data.Dataset`. Below is how T5 pre-processes the SQuAD dataset as a text2text example. Before we jump in, we will have to first load in the data. \n", 144 | "\n", 145 | "\n", 146 | "### 2.1 Loading in the data and preprocessing\n", 147 | "\n", 148 | "You first start by loading in the dataset. The text2text example for a SQuAD example looks like:\n", 149 | "\n", 150 | "```json\n", 151 | "{\n", 152 | " 'inputs': 'question: <question> context: <context>',\n", 153 | " 'targets': '<answer>',\n", 154 | "}\n", 155 | "```\n", 156 | "\n", 157 | "The SQuAD pre-processing function takes in the dataset and processes it using the SentencePiece vocabulary you have seen above. It generates the features from the vocab and encodes the string features. It takes in a question, context, and answer, and returns \"question: Q context: C\" as input and \"A\" as target." 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": null, 163 | "metadata": { 164 | "colab": {}, 165 | "colab_type": "code", 166 | "id": "RcdR5Dh9UVEw" 167 | }, 168 | "outputs": [], 169 | "source": [ 170 | "# Retrieve Q, C, A and return \"question: Q context: C\" as input and \"A\" as target.\n", 171 | "def squad_preprocess_fn(dataset, mode='train'):\n", 172 | " return t5.data.preprocessors.squad(dataset)" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": null, 178 | "metadata": {}, 179 | "outputs": [], 180 | "source": [ 181 | "# train generator, this takes about 1 minute\n", 182 | "train_generator_fn, eval_generator_fn = trax.data.tf_inputs.data_streams(\n", 183 | " 'squad/plain_text:1.0.0',\n", 184 | " data_dir='data/',\n", 185 | " bare_preprocess_fn=squad_preprocess_fn,\n", 186 | " input_name='inputs',\n", 187 | " target_name='targets'\n", 188 | ")\n", 189 | "\n", 190 | "train_generator = train_generator_fn()\n", 191 | "next(train_generator)" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": null, 197 | "metadata": { 198 | "colab": {}, 199 | "colab_type": "code", 200 | "id": "QGQsExH8xv40" 201 | }, 202 | "outputs": [], 203 | "source": [ 204 | "# print example from train_generator\n", 205 | "(inp, out) = next(train_generator)\n", 206 | "print(inp.decode('utf8').split('context:')[0])\n", 207 | "print()\n", 208 | "print('context:', inp.decode('utf8').split('context:')[1])\n", 209 | "print()\n", 210 | "print('target:', out.decode('utf8'))" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": { 216 | "colab_type": "text", 217 | "id": "cC3JaiSMpWma" 218 | }, 219 | "source": [ 220 | "\n", 221 | "### 2.2 Decoding from a fine-tuned model\n", 222 | "\n", 223 | "You will now use an existing model that we trained for you. You will initialize, then load in your model, and then try it with your own input. " 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": null, 229 | "metadata": {}, 230 | "outputs": [], 231 | "source": [ 232 | "# Initialize the model \n", 233 | "model = trax.models.Transformer(\n", 234 | " d_ff = 4096,\n", 235 | " d_model = 1024,\n", 236 | " max_len = 2048,\n", 237 | " n_heads = 16,\n", 238 | " dropout = 0.1,\n", 239 | " input_vocab_size = 32000,\n", 240 | " n_encoder_layers = 24,\n", 241 | " n_decoder_layers = 24,\n", 242 | " mode='predict') # Change to 'eval' for slow decoding."
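, "\n", "# Note (added): d_model is the width of the embeddings/residual stream and\n", "# d_ff the hidden width of each feed-forward block, so most of the weights\n", "# live in the 24 encoder + 24 decoder layers configured above."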
243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": null, 248 | "metadata": {}, 249 | "outputs": [], 250 | "source": [ 251 | "# load in the model\n", 252 | "# this will take a minute\n", 253 | "shape11 = trax.shapes.ShapeDtype((1, 1), dtype=np.int32)\n", 254 | "model.init_from_file('model_squad.pkl.gz',\n", 255 | " weights_only=True, input_signature=(shape11, shape11))" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": null, 261 | "metadata": { 262 | "colab": {}, 263 | "colab_type": "code", 264 | "id": "FdGy_pHJGEF6" 265 | }, 266 | "outputs": [], 267 | "source": [ 268 | "# create inputs\n", 269 | "# a simple example \n", 270 | "# inputs = 'question: She asked him where is john? context: John was at the game'\n", 271 | "\n", 272 | "# an extensive example\n", 273 | "inputs = 'question: What are some of the colours of a rose? context: A rose is a woody perennial flowering plant of the genus Rosa, in the family Rosaceae, or the flower it bears.There are over three hundred species and tens of thousands of cultivars. They form a group of plants that can be erect shrubs, climbing, or trailing, with stems that are often armed with sharp prickles. Flowers vary in size and shape and are usually large and showy, in colours ranging from white through yellows and reds. Most species are native to Asia, with smaller numbers native to Europe, North America, and northwestern Africa. Species, cultivars and hybrids are all widely grown for their beauty and often are fragrant.'" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": null, 279 | "metadata": {}, 280 | "outputs": [], 281 | "source": [ 282 | "# tokenizing the input so we could feed it for decoding\n", 283 | "print(tokenize(inputs))\n", 284 | "test_inputs = tokenize(inputs) " 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": {}, 290 | "source": [ 291 | "Run the cell below to decode.\n", 292 | "\n", 293 | "### Note: This will take some time to run" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "metadata": { 300 | "colab": {}, 301 | "colab_type": "code", 302 | "id": "c_CwYjXHIQOJ" 303 | }, 304 | "outputs": [], 305 | "source": [ 306 | "# Temperature is a parameter for sampling.\n", 307 | "# # * 0.0: same as argmax, always pick the most probable token\n", 308 | "# # * 1.0: sampling from the distribution (can sometimes say random things)\n", 309 | "# # * values inbetween can trade off diversity and quality, try it out!\n", 310 | "output = decoding.autoregressive_sample(model, inputs=np.array(test_inputs)[None, :],\n", 311 | " temperature=0.0, max_length=5) # originally max_length=10\n", 312 | "print(wrapper.fill(pretty_decode(output[0])))" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": {}, 318 | "source": [ 319 | "You should also be aware that the quality of the decoding is not very good because max_length was downsized from 10 to 5 so that this runs faster within this environment. The colab version uses the original max_length so check that one for the actual decoding." 
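, "\n", "For intuition: with `temperature=0.0` each decoding step reduces to an argmax over the model's output distribution. A minimal sketch of that idea (illustrative only, not Trax's actual implementation):\n", "\n", "```python\n", "# Illustrative greedy step: temperature 0.0 == always pick the most probable token\n", "def greedy_step(log_probs):\n", "    return int(np.argmax(log_probs))\n", "```"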
320 | ] 321 | } 322 | ], 323 | "metadata": { 324 | "accelerator": "GPU", 325 | "colab": { 326 | "collapsed_sections": [], 327 | "name": "C4W3-solutions.ipynb", 328 | "provenance": [], 329 | "toc_visible": true 330 | }, 331 | "kernelspec": { 332 | "display_name": "Python 3", 333 | "language": "python", 334 | "name": "python3" 335 | }, 336 | "language_info": { 337 | "codemirror_mode": { 338 | "name": "ipython", 339 | "version": 3 340 | }, 341 | "file_extension": ".py", 342 | "mimetype": "text/x-python", 343 | "name": "python", 344 | "nbconvert_exporter": "python", 345 | "pygments_lexer": "ipython3", 346 | "version": "3.7.6" 347 | } 348 | }, 349 | "nbformat": 4, 350 | "nbformat_minor": 4 351 | } 352 | -------------------------------------------------------------------------------- /NLP with Sequence Models/Week_2/C3_W2_lecture_notebook_perplexity.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Working with JAX numpy and calculating perplexity: Ungraded Lecture Notebook" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Normally you would import `numpy` and rename it as `np`. \n", 15 | "\n", 16 | "However in this week's assignment you will notice that this convention has been changed. \n", 17 | "\n", 18 | "Now standard `numpy` is not renamed and `trax.fastmath.numpy` is renamed as `np`. \n", 19 | "\n", 20 | "The rationale behind this change is that you will be using Trax's numpy (which is compatible with JAX) far more often. Trax's numpy supports most of the same functions as the regular numpy so the change won't be noticeable in most cases.\n" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 1, 26 | "metadata": {}, 27 | "outputs": [ 28 | { 29 | "name": "stdout", 30 | "output_type": "stream", 31 | "text": [ 32 | "INFO:tensorflow:tokens_length=568 inputs_length=512 targets_length=114 noise_density=0.15 mean_noise_span_length=3.0 \n" 33 | ] 34 | } 35 | ], 36 | "source": [ 37 | "import numpy\n", 38 | "import trax\n", 39 | "import trax.fastmath.numpy as np\n", 40 | "\n", 41 | "# Setting random seeds\n", 42 | "trax.supervised.trainer_lib.init_random_number_generators(32)\n", 43 | "numpy.random.seed(32)" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "One important change to take into consideration is that the types of the resulting objects will be different depending on the version of numpy. With regular numpy you get `numpy.ndarray` but with Trax's numpy you will get `jax.interpreters.xla.DeviceArray`. These two types map to each other. So if you find some error logs mentioning DeviceArray type, don't worry about it, treat it like you would treat an ndarray and march ahead.\n", 51 | "\n", 52 | "You can get a randomized numpy array by using the `numpy.random.random()` function.\n", 53 | "\n", 54 | "This is one of the functionalities that Trax's numpy does not currently support in the same way as the regular numpy. 
" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 2, 60 | "metadata": {}, 61 | "outputs": [ 62 | { 63 | "name": "stdout", 64 | "output_type": "stream", 65 | "text": [ 66 | "The regular numpy array looks like this:\n", 67 | "\n", 68 | " [[0.85888927 0.37271115 0.55512878 0.95565655 0.7366696 0.81620514\n", 69 | " 0.10108656 0.92848807 0.60910917 0.59655344]\n", 70 | " [0.09178413 0.34518624 0.66275252 0.44171349 0.55148779 0.70371249\n", 71 | " 0.58940123 0.04993276 0.56179184 0.76635847]\n", 72 | " [0.91090833 0.09290995 0.90252139 0.46096041 0.45201847 0.99942549\n", 73 | " 0.16242374 0.70937058 0.16062408 0.81077677]\n", 74 | " [0.03514717 0.53488673 0.16650012 0.30841038 0.04506241 0.23857613\n", 75 | " 0.67483453 0.78238275 0.69520163 0.32895445]\n", 76 | " [0.49403187 0.52412136 0.29854125 0.46310814 0.98478429 0.50113492\n", 77 | " 0.39807245 0.72790532 0.86333097 0.02616954]]\n", 78 | "\n", 79 | "It is of type: \n" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "numpy_array = numpy.random.random((5,10))\n", 85 | "print(f\"The regular numpy array looks like this:\\n\\n {numpy_array}\\n\")\n", 86 | "print(f\"It is of type: {type(numpy_array)}\")" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "You can easily cast regular numpy arrays or lists into trax numpy arrays using the `trax.fastmath.numpy.array()` function:" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 3, 99 | "metadata": {}, 100 | "outputs": [ 101 | { 102 | "name": "stdout", 103 | "output_type": "stream", 104 | "text": [ 105 | "The trax numpy array looks like this:\n", 106 | "\n", 107 | " [[0.8588893 0.37271115 0.55512875 0.9556565 0.7366696 0.81620514\n", 108 | " 0.10108656 0.9284881 0.60910916 0.59655344]\n", 109 | " [0.09178413 0.34518623 0.6627525 0.44171348 0.5514878 0.70371246\n", 110 | " 0.58940125 0.04993276 0.56179184 0.7663585 ]\n", 111 | " [0.91090834 0.09290995 0.9025214 0.46096042 0.45201847 0.9994255\n", 112 | " 0.16242374 0.7093706 0.16062407 0.81077677]\n", 113 | " [0.03514718 0.5348867 0.16650012 0.30841038 0.04506241 0.23857613\n", 114 | " 0.67483455 0.7823827 0.69520164 0.32895446]\n", 115 | " [0.49403188 0.52412134 0.29854125 0.46310815 0.9847843 0.50113493\n", 116 | " 0.39807245 0.72790533 0.86333096 0.02616954]]\n", 117 | "\n", 118 | "It is of type: \n" 119 | ] 120 | } 121 | ], 122 | "source": [ 123 | "trax_numpy_array = np.array(numpy_array)\n", 124 | "print(f\"The trax numpy array looks like this:\\n\\n {trax_numpy_array}\\n\")\n", 125 | "print(f\"It is of type: {type(trax_numpy_array)}\")\n", 126 | "\n" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "Hope you now understand the differences (and similarities) between these two versions and numpy. **Great!**\n", 134 | "\n", 135 | "The previous section was a quick look at Trax's numpy. However this notebook also aims to teach you how you can calculate the perplexity of a trained model.\n" 136 | ] 137 | }, 138 | { 139 | "cell_type": "markdown", 140 | "metadata": {}, 141 | "source": [ 142 | "## Calculating Perplexity" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "The perplexity is a metric that measures how well a probability model predicts a sample and it is commonly used to evaluate language models. 
It is defined as: \n", 151 | "\n", 152 | "$$P(W) = \\sqrt[N]{\\prod_{i=1}^{N} \\frac{1}{P(w_i| w_1,...,w_{i-1})}}$$\n", 153 | "\n", 154 | "As an implementation hack, you would usually take the log of that formula (this lets you use the log probabilities that the `RNN` outputs, and it turns exponents into products and products into sums, which makes the computation simpler and more efficient). You should also take care of the padding, since you do not want to include the padding when calculating the perplexity (otherwise the perplexity measure would look artificially good). The algebra behind this process is explained next:\n", 155 | "\n", 156 | "\n", 157 | "$$\\log P(W) = {\\log\\big(\\sqrt[N]{\\prod_{i=1}^{N} \\frac{1}{P(w_i| w_1,...,w_{i-1})}}\\big)}$$\n", 158 | "\n", 159 | "$$ = {\\log\\big({\\prod_{i=1}^{N} \\frac{1}{P(w_i| w_1,...,w_{i-1})}}\\big)^{\\frac{1}{N}}}$$ \n", 160 | "\n", 161 | "$$ = {\\log\\big({\\prod_{i=1}^{N}{P(w_i| w_1,...,w_{i-1})}}\\big)^{-\\frac{1}{N}}} $$\n", 162 | "$$ = -\\frac{1}{N}{\\log\\big({\\prod_{i=1}^{N}{P(w_i| w_1,...,w_{i-1})}}\\big)} $$\n", 163 | "$$ = -\\frac{1}{N}{\\big({\\sum_{i=1}^{N}{\\log P(w_i| w_1,...,w_{i-1})}}\\big)} $$" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "You will be working with a real example from this week's assignment. The example is made up of:\n", 171 | " - `predictions` : batch of tensors corresponding to lines of text predicted by the model.\n", 172 | " - `targets` : batch of actual tensors corresponding to lines of text." 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": 4, 178 | "metadata": {}, 179 | "outputs": [ 180 | { 181 | "name": "stdout", 182 | "output_type": "stream", 183 | "text": [ 184 | "predictions has shape: (32, 64, 256)\n", 185 | "targets has shape: (32, 64)\n" 186 | ] 187 | } 188 | ], 189 | "source": [ 190 | "from trax import layers as tl\n", 191 | "\n", 192 | "# Load from .npy files\n", 193 | "predictions = numpy.load('predictions.npy')\n", 194 | "targets = numpy.load('targets.npy')\n", 195 | "\n", 196 | "# Cast to jax.interpreters.xla.DeviceArray\n", 197 | "predictions = np.array(predictions)\n", 198 | "targets = np.array(targets)\n", 199 | "\n", 200 | "# Print shapes\n", 201 | "print(f'predictions has shape: {predictions.shape}')\n", 202 | "print(f'targets has shape: {targets.shape}')" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "Notice that the predictions have an extra dimension with the same length as the size of the vocabulary used.\n", 210 | "\n", 211 | "Because of this you will need a way of reshaping `targets` to match this shape. For this you can use `trax.layers.one_hot()`.\n", 212 | "\n", 213 | "Notice that `predictions.shape[-1]` will return the size of the last dimension of `predictions`."
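, "\n", "As a toy illustration of what `one_hot` does (made-up values, \"vocabulary\" of size 4):\n", "\n", "```python\n", "# tl.one_hot(x, n_categories): each id becomes a row with a single 1\n", "tl.one_hot(np.array([0, 2, 3]), 4)\n", "# -> [[1. 0. 0. 0.]\n", "#     [0. 0. 1. 0.]\n", "#     [0. 0. 0. 1.]]\n", "```"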
213 | ] 214 | }, 215 | { 216 | "cell_type": "code", 217 | "execution_count": 5, 218 | "metadata": {}, 219 | "outputs": [ 220 | { 221 | "name": "stdout", 222 | "output_type": "stream", 223 | "text": [ 224 | "reshaped_targets has shape: (32, 64, 256)\n" 225 | ] 226 | } 227 | ], 228 | "source": [ 229 | "reshaped_targets = tl.one_hot(targets, predictions.shape[-1]) #trax's one_hot function takes the input as one_hot(x, n_categories, dtype=optional)\n", 230 | "print(f'reshaped_targets has shape: {reshaped_targets.shape}')" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "By calculating the product of the predictions and the reshaped targets and summing across the last dimension, the total log perplexity can be computed:" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": 6, 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "total_log_ppx = np.sum(predictions * reshaped_targets, axis= -1)" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "Now you will need to account for the padding so this metric is not artificially deflated (since a lower perplexity means a better model). For identifying which elements are padding and which are not, you can use `np.equal()` and get a tensor with `1s` in the positions of actual values and `0s` where there are paddings." 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": null, 259 | "metadata": {}, 260 | "outputs": [], 261 | "source": [ 262 | "non_pad = 1.0 - np.equal(targets, 0)\n", 263 | "print(f'non_pad has shape: {non_pad.shape}\\n')\n", 264 | "print(f'non_pad looks like this: \\n\\n {non_pad}')" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": {}, 270 | "source": [ 271 | "By computing the product of the total log perplexity and the non_pad tensor we remove the effect of padding on the metric:" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": null, 277 | "metadata": {}, 278 | "outputs": [], 279 | "source": [ 280 | "real_log_ppx = total_log_ppx * non_pad\n", 281 | "print(f'real perplexity still has shape: {real_log_ppx.shape}')" 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "metadata": {}, 287 | "source": [ 288 | "You can check the effect of filtering out the padding by looking at the two log perplexity tensors:" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": null, 294 | "metadata": {}, 295 | "outputs": [], 296 | "source": [ 297 | "print(f'log perplexity tensor before filtering padding: \\n\\n {total_log_ppx}\\n')\n", 298 | "print(f'log perplexity tensor after filtering padding: \\n\\n {real_log_ppx}')" 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": {}, 304 | "source": [ 305 | "To get a single average log perplexity across all the elements in the batch you can sum across both dimensions and divide by the number of elements. 
Notice that the result will be the negative of the real log perplexity of the model:" 306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": null, 311 | "metadata": {}, 312 | "outputs": [], 313 | "source": [ 314 | "log_ppx = np.sum(real_log_ppx) / np.sum(non_pad)\n", 315 | "log_ppx = -log_ppx\n", 316 | "print(f'The log perplexity and perplexity of the model are respectively: {log_ppx} and {np.exp(log_ppx)}')" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | "**Congratulations on finishing this lecture notebook!** Now you should have a clear understanding of how to work with Trax's numpy and how to compute the perplexity to evaluate your language models. **Keep it up!**" 324 | ] 325 | } 326 | ], 327 | "metadata": { 328 | "kernelspec": { 329 | "display_name": "Python 3", 330 | "language": "python", 331 | "name": "python3" 332 | }, 333 | "language_info": { 334 | "codemirror_mode": { 335 | "name": "ipython", 336 | "version": 3 337 | }, 338 | "file_extension": ".py", 339 | "mimetype": "text/x-python", 340 | "name": "python", 341 | "nbconvert_exporter": "python", 342 | "pygments_lexer": "ipython3", 343 | "version": "3.7.1" 344 | } 345 | }, 346 | "nbformat": 4, 347 | "nbformat_minor": 4 348 | } 349 | -------------------------------------------------------------------------------- /NLP with Sequence Models/Week_1/NLP_C3_W1_lecture_nb_03_data_generatos.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Data generators\n", 8 | "\n" 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "metadata": {}, 14 | "source": [ 15 | "In Python, a generator is a function that behaves like an iterator. It will return the next item. Here is a [link](https://wiki.python.org/moin/Generators) to review python generators. In many AI applications, it is advantageous to have a data generator to handle loading and transforming data for different applications. 
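As a minimal refresher on the mechanics (generic Python, not assignment-specific): a generator uses `yield` instead of `return`, pausing at each `yield` and resuming on the next call to `next()`:\n", "\n", "```python\n", "def count_up(n):\n", "    i = 0\n", "    while i < n:\n", "        yield i  # pause here; resume on the next call to next()\n", "        i += 1\n", "\n", "g = count_up(3)\n", "print(next(g), next(g), next(g))  # 0 1 2\n", "```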
\n", 16 | "\n", 17 | "You will now implement a custom data generator, using a common pattern that you will use during all assignments of this course.\n", 18 | "In the following example, we use a set of samples `a` to derive a new set of samples with more elements than the original set.\n", 19 | "\n", 20 | "**Note: Pay attention to the use of list `lines_index` and variable `index` to traverse the original list.**" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 4, 26 | "metadata": {}, 27 | "outputs": [ 28 | { 29 | "name": "stdout", 30 | "output_type": "stream", 31 | "text": [ 32 | "[1, 2, 3, 4, 1, 2, 3, 4, 1, 2]\n" 33 | ] 34 | } 35 | ], 36 | "source": [ 37 | "import random \n", 38 | "import numpy as np\n", 39 | "\n", 40 | "# Example of traversing a list of indexes to create a circular list\n", 41 | "a = [1, 2, 3, 4]\n", 42 | "b = [0] * 10\n", 43 | "\n", 44 | "a_size = len(a)\n", 45 | "b_size = len(b)\n", 46 | "lines_index = [*range(a_size)] # equivalent to [i for i in range(0, a_size)]; the * unpacks the range iterator directly into the list\n", 47 | "index = 0 # similar to index in data_generator below\n", 48 | "for i in range(b_size): # `b` is longer than `a` forcing a wrap\n", 49 | " # We wrap by resetting index to 0 so the sequences circle back at the end to point to the first index\n", 50 | " if index >= a_size:\n", 51 | " index = 0\n", 52 | " \n", 53 | " b[i] = a[lines_index[index]] # `lines_index[index]` points to an index of `a`. Store the result in b\n", 54 | " index += 1\n", 55 | " \n", 56 | "print(b)" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "## Shuffling the data order\n", 64 | "\n", 65 | "In the next example, we will do the same as before, but shuffling the order of the elements in the output list. 
Note that here, our strategy of traversing using `lines_index` and `index` becomes very important, because we can simulate a shuffle in the input data, without doing that in reality.\n" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 5, 71 | "metadata": {}, 72 | "outputs": [ 73 | { 74 | "name": "stdout", 75 | "output_type": "stream", 76 | "text": [ 77 | "Original order of index: [0, 1, 2, 3]\n", 78 | "Shuffled order of index: [1, 3, 0, 2]\n", 79 | "New value order for first batch: [2, 4, 1, 3]\n", 80 | "\n", 81 | "Shuffled Indexes for Batch No.2 :[1, 2, 0, 3]\n", 82 | "Values for Batch No.2 :[2, 3, 1, 4]\n", 83 | "\n", 84 | "Shuffled Indexes for Batch No.3 :[3, 1, 2, 0]\n", 85 | "Values for Batch No.3 :[4, 2, 3, 1]\n", 86 | "\n", 87 | "Final value of b: [2, 4, 1, 3, 2, 3, 1, 4, 4, 2]\n" 88 | ] 89 | } 90 | ], 91 | "source": [ 92 | "# Example of traversing a list of indexes to create a circular list\n", 93 | "a = [1, 2, 3, 4]\n", 94 | "b = []\n", 95 | "\n", 96 | "a_size = len(a)\n", 97 | "b_size = 10\n", 98 | "lines_index = [*range(a_size)]\n", 99 | "print(\"Original order of index:\",lines_index)\n", 100 | "\n", 101 | "# if we shuffle the index_list we can change the order of our circular list\n", 102 | "# without modifying the order of our original data\n", 103 | "random.shuffle(lines_index) # Shuffle the order\n", 104 | "print(\"Shuffled order of index:\",lines_index)\n", 105 | "\n", 106 | "print(\"New value order for first batch:\",[a[index] for index in lines_index])\n", 107 | "batch_counter = 1\n", 108 | "index = 0 # similar to index in data_generator below\n", 109 | "for i in range(b_size): # `b` is longer than `a` forcing a wrap\n", 110 | " # We wrap by resetting index to 0\n", 111 | " if index >= a_size:\n", 112 | " index = 0\n", 113 | " batch_counter += 1\n", 114 | " random.shuffle(lines_index) # Re-shuffle the order\n", 115 | " print(\"\\nShuffled Indexes for Batch No.{} :{}\".format(batch_counter,lines_index))\n", 116 | " print(\"Values for Batch No.{} :{}\".format(batch_counter,[a[index] for index in lines_index]))\n", 117 | " \n", 118 | " b.append(a[lines_index[index]]) # `lines_index[index]` points to an index of `a`. Store the result in b\n", 119 | " index += 1\n", 120 | "print() \n", 121 | "print(\"Final value of b:\",b)" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "**Note: We call an epoch each time that an algorithm passes over all the training examples. Shuffling the examples for each epoch is known to reduce variance, making models more general and less prone to overfitting.**\n", 129 | "\n", 130 | "\n" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "\n", 138 | "### Exercise\n", 139 | "\n", 140 | "**Instructions:** Implement a data generator function that takes in `batch_size, x, y, shuffle` where x could be a large list of samples, and y is a list of the tags associated with those samples. Return a subset of those inputs in a tuple of two arrays `(X,Y)`. Each is an array of dimension (`batch_size`). If `shuffle=True`, the data will be traversed in random order.\n", 141 | "\n", 142 | "**Details:**\n", 143 | "\n", 144 | "This code has an outer loop \n", 145 | "```\n", 146 | "while True: \n", 147 | "... \n", 148 | "yield((X,Y)) \n", 149 | "```\n", 150 | "\n", 151 | "which runs continuously in the fashion of generators, pausing when yielding the next values. We will generate a batch_size output on each pass of this loop. 
\n", 152 | "\n", 153 | "It has an inner loop that stores the data samples to be included in the next batch in temporary lists (X, Y).\n", 154 | "\n", 155 | "There are three slightly out of the ordinary features. \n", 156 | "\n", 157 | "1. The first is the use of a list of a predefined size to store the data for each batch. Using a predefined size list reduces the computation time if the elements in the array are of a fixed size, like numbers. If the elements are of different sizes, it is better to use an empty array and append one element at a time during the loop.\n", 158 | "\n", 159 | "2. The second is tracking the current location in the incoming lists of samples. Generator variables hold their values between invocations, so we create an `index` variable, initialized to zero, and increment it by one for each sample included in a batch. However, we do not use the `index` to access the positions of the list of sentences directly. Instead, we use it to select one index from a list of indexes. In this way, we can change the order in which we traverse our original list, keeping our original list untouched. \n", 160 | "\n", 161 | "3. The third also relates to wrapping. Because `batch_size` and the length of the input lists are not aligned, gathering a batch_size group of inputs may involve wrapping back to the beginning of the input loop. In our approach, it is just enough to reset the `index` to 0. We can re-shuffle the list of indexes to produce different batches each time." 162 | ] 163 | }, 164 | { 165 | "cell_type": "code", 166 | "execution_count": null, 167 | "metadata": {}, 168 | "outputs": [], 169 | "source": [ 170 | "def data_generator(batch_size, data_x, data_y, shuffle=True):\n", 171 | " '''\n", 172 | " Input: \n", 173 | " batch_size - integer describing the batch size\n", 174 | " data_x - list containing samples\n", 175 | " data_y - list containing labels\n", 176 | " shuffle - Shuffle the data order\n", 177 | " Output:\n", 178 | " a tuple containing 2 elements:\n", 179 | " X - list of dim (batch_size) of samples\n", 180 | " Y - list of dim (batch_size) of labels\n", 181 | " '''\n", 182 | " \n", 183 | " data_lng = len(data_x) # len(data_x) must be equal to len(data_y)\n", 184 | " index_list = [*range(data_lng)] # Create a list with the ordered indexes of sample data\n", 185 | " \n", 186 | " \n", 187 | " # If shuffle is set to true, we traverse the list in a random way\n", 188 | " if shuffle:\n", 189 | " random.shuffle(index_list) # Inplace shuffle of the list\n", 190 | " \n", 191 | " index = 0 # Start with the first element\n", 192 | " # START CODE HERE \n", 193 | " # Fill all the None values with code taking reference of what you learned so far\n", 194 | " while True:\n", 195 | " X = [None] * batch_size # We can create a list with batch_size elements. \n", 196 | " Y = [None] * batch_size # We can create a list with batch_size elements. \n", 197 | " \n", 198 | " for i in range(batch_size):\n", 199 | " \n", 200 | " # Wrap the index each time that we reach the end of the list\n", 201 | " if index >= data_lng:\n", 202 | " index = 0\n", 203 | " # Shuffle the index_list if shuffle is true\n", 204 | " if shuffle:\n", 205 | " random.shuffle(index_list) # re-shuffle the order\n", 206 | " \n", 207 | " X[i] = None # We set the corresponding element in x\n", 208 | " Y[i] = None # We set the corresponding element in y\n", 209 | " # END CODE HERE \n", 210 | " index += 1\n", 211 | " \n", 212 | " yield((X, Y))\n", 213 | " " 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "If your function is correct, all the tests must pass. " 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": null, 226 | "metadata": {}, 227 | "outputs": [], 228 | "source": [ 229 | "def test_data_generator():\n", 230 | " x = [1, 2, 3, 4]\n", 231 | " y = [xi ** 2 for xi in x]\n", 232 | " \n", 233 | " generator = data_generator(3, x, y, shuffle=False)\n", 234 | "\n", 235 | " assert np.allclose(next(generator), ([1, 2, 3], [1, 4, 9])), \"First batch does not match\"\n", 236 | " assert np.allclose(next(generator), ([4, 1, 2], [16, 1, 4])), \"Second batch does not match\"\n", 237 | " assert np.allclose(next(generator), ([3, 4, 1], [9, 16, 1])), \"Third batch does not match\"\n", 238 | " assert np.allclose(next(generator), ([2, 3, 4], [4, 9, 16])), \"Fourth batch does not match\"\n", 239 | "\n", 240 | " print(\"\\033[92mAll tests passed!\")\n", 241 | "\n", 242 | "test_data_generator()" 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "If you could not solve the exercise, just run the next code to see the answer."
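, "\n", "Once your implementation works, a quick manual check (expected output shown for `shuffle=False`, mirroring the tests above):\n", "\n", "```python\n", "gen = data_generator(3, [1, 2, 3, 4], [1, 4, 9, 16], shuffle=False)\n", "print(next(gen))  # ([1, 2, 3], [1, 4, 9])\n", "print(next(gen))  # ([4, 1, 2], [16, 1, 4]) -- wraps back to the start\n", "```"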
250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": {}, 256 | "outputs": [], 257 | "source": [ 258 | "import base64\n", 259 | "\n", 260 | "solution = \"ZGVmIGRhdGFfZ2VuZXJhdG9yKGJhdGNoX3NpemUsIGRhdGFfeCwgZGF0YV95LCBzaHVmZmxlPVRydWUpOgoKICAgIGRhdGFfbG5nID0gbGVuKGRhdGFfeCkgIyBsZW4oZGF0YV94KSBtdXN0IGJlIGVxdWFsIHRvIGxlbihkYXRhX3kpCiAgICBpbmRleF9saXN0ID0gWypyYW5nZShkYXRhX2xuZyldICMgQ3JlYXRlIGEgbGlzdCB3aXRoIHRoZSBvcmRlcmVkIGluZGV4ZXMgb2Ygc2FtcGxlIGRhdGEKICAgIAogICAgIyBJZiBzaHVmZmxlIGlzIHNldCB0byB0cnVlLCB3ZSB0cmF2ZXJzZSB0aGUgbGlzdCBpbiBhIHJhbmRvbSB3YXkKICAgIGlmIHNodWZmbGU6CiAgICAgICAgcm5kLnNodWZmbGUoaW5kZXhfbGlzdCkgIyBJbnBsYWNlIHNodWZmbGUgb2YgdGhlIGxpc3QKICAgIAogICAgaW5kZXggPSAwICMgU3RhcnQgd2l0aCB0aGUgZmlyc3QgZWxlbWVudAogICAgd2hpbGUgVHJ1ZToKICAgICAgICBYID0gWzBdICogYmF0Y2hfc2l6ZSAjIFdlIGNhbiBjcmVhdGUgYSBsaXN0IHdpdGggYmF0Y2hfc2l6ZSBlbGVtZW50cy4gCiAgICAgICAgWSA9IFswXSAqIGJhdGNoX3NpemUgIyBXZSBjYW4gY3JlYXRlIGEgbGlzdCB3aXRoIGJhdGNoX3NpemUgZWxlbWVudHMuIAogICAgICAgIAogICAgICAgIGZvciBpIGluIHJhbmdlKGJhdGNoX3NpemUpOgogICAgICAgICAgICAKICAgICAgICAgICAgIyBXcmFwIHRoZSBpbmRleCBlYWNoIHRpbWUgdGhhdCB3ZSByZWFjaCB0aGUgZW5kIG9mIHRoZSBsaXN0CiAgICAgICAgICAgIGlmIGluZGV4ID49IGRhdGFfbG5nOgogICAgICAgICAgICAgICAgaW5kZXggPSAwCiAgICAgICAgICAgICAgICAjIFNodWZmbGUgdGhlIGluZGV4X2xpc3QgaWYgc2h1ZmZsZSBpcyB0cnVlCiAgICAgICAgICAgICAgICBpZiBzaHVmZmxlOgogICAgICAgICAgICAgICAgICAgIHJuZC5zaHVmZmxlKGluZGV4X2xpc3QpICMgcmUtc2h1ZmZsZSB0aGUgb3JkZXIKICAgICAgICAgICAgCiAgICAgICAgICAgIFhbaV0gPSBkYXRhX3hbaW5kZXhfbGlzdFtpbmRleF1dIAogICAgICAgICAgICBZW2ldID0gZGF0YV95W2luZGV4X2xpc3RbaW5kZXhdXSAKICAgICAgICAgICAgCiAgICAgICAgICAgIGluZGV4ICs9IDEKICAgICAgICAKICAgICAgICB5aWVsZCgoWCwgWSkp\"\n", 261 | "\n", 262 | "# Print the solution to the given assignment\n", 263 | "print(base64.b64decode(solution).decode(\"utf-8\"))" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "### Hope you enjoyed this tutorial on data generators which will help you with the assignments in this course." 271 | ] 272 | } 273 | ], 274 | "metadata": { 275 | "kernelspec": { 276 | "display_name": "Python 3", 277 | "language": "python", 278 | "name": "python3" 279 | }, 280 | "language_info": { 281 | "codemirror_mode": { 282 | "name": "ipython", 283 | "version": 3 284 | }, 285 | "file_extension": ".py", 286 | "mimetype": "text/x-python", 287 | "name": "python", 288 | "nbconvert_exporter": "python", 289 | "pygments_lexer": "ipython3", 290 | "version": "3.7.1" 291 | } 292 | }, 293 | "nbformat": 4, 294 | "nbformat_minor": 4 295 | } 296 | -------------------------------------------------------------------------------- /NLP with Probabilistic Models/Week_3/NLP_C2_W3_lecture_nb_01.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\n", 8 | "# N-grams Corpus preprocessing\n", 9 | "\n", 10 | "The input corpus in this week's assignment is a continuous text that needs some preprocessing so that you can start calculating the n-gram probabilities.\n", 11 | "\n", 12 | "Some common preprocessing steps for the language models include:\n", 13 | "- lowercasing the text\n", 14 | "- remove special characters\n", 15 | "- split text to list of sentences\n", 16 | "- split sentence into list words\n", 17 | "\n", 18 | "Can you note the similarities and differences among the preprocessing steps shown during the Course 1 of this specialization?" 
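, "\n", "Putting the steps together, the whole preprocessing flow of this notebook looks roughly like this (a sketch; `re` and `nltk` are imported in the next cell):\n", "\n", "```python\n", "corpus = \"Learning makes me happy.\\nI am learning!\"\n", "corpus = corpus.lower()                          # lowercase\n", "sentences = corpus.split(\"\\n\")                   # split text into sentences\n", "sentences = [re.sub(r\"[^a-zA-Z0-9.?! ]+\", \"\", s) for s in sentences]  # remove special characters\n", "tokenized = [nltk.word_tokenize(s) for s in sentences]  # split sentences into lists of words\n", "```"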
19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 1, 24 | "metadata": {}, 25 | "outputs": [ 26 | { 27 | "name": "stderr", 28 | "output_type": "stream", 29 | "text": [ 30 | "[nltk_data] Downloading package punkt to /home/jovyan/nltk_data...\n", 31 | "[nltk_data] Unzipping tokenizers/punkt.zip.\n" 32 | ] 33 | }, 34 | { 35 | "data": { 36 | "text/plain": [ 37 | "True" 38 | ] 39 | }, 40 | "execution_count": 1, 41 | "metadata": {}, 42 | "output_type": "execute_result" 43 | } 44 | ], 45 | "source": [ 46 | "import nltk # NLP toolkit\n", 47 | "import re # Library for Regular expression operations\n", 48 | "\n", 49 | "nltk.download('punkt') # Download the Punkt sentence tokenizer " 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": {}, 55 | "source": [ 56 | "\n", 57 | "### Lowercase\n", 58 | "\n", 59 | "Words at the beginning of a sentence and names start with a capital letter. However, when counting words, you want to treat them the same as if they appeared in the middle of a sentence. \n", 60 | "\n", 61 | "You can do that by converting the text to lowercase using [str.lower]\n", 62 | "(https://docs.python.org/3/library/stdtypes.html?highlight=split#str.lower).\n" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 2, 68 | "metadata": { 69 | "scrolled": true 70 | }, 71 | "outputs": [ 72 | { 73 | "name": "stdout", 74 | "output_type": "stream", 75 | "text": [ 76 | "learning% makes 'me' happy. i am happy be-cause i am learning! :)\n" 77 | ] 78 | } 79 | ], 80 | "source": [ 81 | "# change the corpus to lowercase\n", 82 | "corpus = \"Learning% makes 'me' happy. I am happy be-cause I am learning! :)\"\n", 83 | "corpus = corpus.lower()\n", 84 | "\n", 85 | "# note that word \"learning\" will now be the same regardless of its position in the sentence\n", 86 | "print(corpus)" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "\n", 94 | "### Remove special characters\n", 95 | "\n", 96 | "Some of the characters may need to be removed from the corpus before we start processing the text to find n-grams. \n", 97 | "\n", 98 | "Often, special characters such as double quotes '\"' or dashes '-' are removed, while punctuation such as the full stop '.' or question mark '?' is left in the corpus." 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 3, 104 | "metadata": {}, 105 | "outputs": [ 106 | { 107 | "name": "stdout", 108 | "output_type": "stream", 109 | "text": [ 110 | "learning makes me happy. i am happy because i am learning! \n" 111 | ] 112 | } 113 | ], 114 | "source": [ 115 | "# remove special characters\n", 116 | "corpus = \"learning% makes 'me' happy. i am happy be-cause i am learning! :)\"\n", 117 | "corpus = re.sub(r\"[^a-zA-Z0-9.?! ]+\", \"\", corpus)\n", 118 | "print(corpus)" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "Note that this process gets rid of the happy face made with punctuation :). Remember that for sentiment analysis, this emoticon was very important. However, we will not consider it here." 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "\n", 133 | "### Text splitting\n", 134 | "\n", 135 | "In the assignment, the sentences in the corpus are separated by a special delimiter \\n. You will need to split the corpus into an array of sentences using this delimiter. 
\n", 136 | "One way to do that is by using the [str.split](https://docs.python.org/3/library/stdtypes.html?highlight=split#str.split) method.\n", 137 | "\n", 138 | "\n", 139 | "The following examples illustrate how to use this method. The code shows:\n", 140 | "- how to split a string containing a date into an array of date parts\n", 141 | "- how to split a string with time into an array containing hours, minutes and seconds \n", 142 | "\n", 143 | "Also, note what happens if there are several back-to-back delimiters like between \"May\" and \"9\"." 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 4, 149 | "metadata": {}, 150 | "outputs": [ 151 | { 152 | "name": "stdout", 153 | "output_type": "stream", 154 | "text": [ 155 | "date parts = ['Sat', 'May', '', '9', '07:33:35', 'CEST', '2020']\n", 156 | "time parts = ['07', '33', '35']\n" 157 | ] 158 | } 159 | ], 160 | "source": [ 161 | "# split text by a delimiter to array\n", 162 | "input_date=\"Sat May 9 07:33:35 CEST 2020\"\n", 163 | "\n", 164 | "# get the date parts in array\n", 165 | "date_parts = input_date.split(\" \")\n", 166 | "print(f\"date parts = {date_parts}\")\n", 167 | "\n", 168 | "#get the time parts in array\n", 169 | "time_parts = date_parts[4].split(\":\")\n", 170 | "print(f\"time parts = {time_parts}\")" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | " This text splitting is more complicated than the tokenization process used for sentiment analysis." 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "\n", 185 | "### Sentence tokenizing\n", 186 | "\n", 187 | "Once you have a list of sentences, the next step is to split each sentence into a list of words.\n", 188 | "\n", 189 | "This process could be done in several ways, even using the str.split method described above, but we will use the NLTK library [nltk](https://www.nltk.org/) to help us with that.\n", 190 | "\n", 191 | "In the code assignment, you will use the method [word_tokenize](https://www.nltk.org/api/nltk.tokenize.html#nltk.tokenize.punkt.PunktLanguageVars.word_tokenize) to split your sentence into a list of words. Let us try the method in an example." 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": 5, 197 | "metadata": {}, 198 | "outputs": [ 199 | { 200 | "name": "stdout", 201 | "output_type": "stream", 202 | "text": [ 203 | "i am happy because i am learning. -> ['i', 'am', 'happy', 'because', 'i', 'am', 'learning', '.']\n" 204 | ] 205 | } 206 | ], 207 | "source": [ 208 | "# tokenize the sentence into an array of words\n", 209 | "\n", 210 | "sentence = 'i am happy because i am learning.'\n", 211 | "tokenized_sentence = nltk.word_tokenize(sentence)\n", 212 | "print(f'{sentence} -> {tokenized_sentence}')" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "Now that the sentence is tokenized, you can work with each word in the sentence separately. This will be useful later when creating and counting N-grams. In the following code example, you will see how to find the length of each word." 
220 | ] 221 | }, 222 | { 223 | "cell_type": "code", 224 | "execution_count": 6, 225 | "metadata": {}, 226 | "outputs": [ 227 | { 228 | "name": "stdout", 229 | "output_type": "stream", 230 | "text": [ 231 | " Lengths of the words: \n", 232 | "[('i', 1), ('am', 2), ('happy', 5), ('because', 7), ('i', 1), ('am', 2), ('learning', 8), ('.', 1)]\n" 233 | ] 234 | } 235 | ], 236 | "source": [ 237 | "# find length of each word in the tokenized sentence\n", 238 | "sentence = ['i', 'am', 'happy', 'because', 'i', 'am', 'learning', '.']\n", 239 | "word_lengths = [(word, len(word)) for word in sentence] # Create a list with the word lengths using a list comprehension\n", 240 | "print(f' Lengths of the words: \\n{word_lengths}')" 241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "The previous result produces a list of pairs. This is not equivalent to a dictionary." 248 | ] 249 | }, 250 | { 251 | "cell_type": "markdown", 252 | "metadata": {}, 253 | "source": [ 254 | "\n", 255 | "## N-grams\n", 256 | "\n", 257 | "### Sentence to n-gram\n", 258 | "\n", 259 | "The next step is to build n-grams from the tokenized sentences. \n", 260 | "\n", 261 | "A sliding window of size n-words can generate the n-grams. The window scans the list of words starting at the sentence beginning, moving by a step of one word until it reaches the end of the sentence.\n", 262 | "\n", 263 | "Here is an example method that prints all trigrams in the given sentence." 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": 8, 269 | "metadata": {}, 270 | "outputs": [ 271 | { 272 | "name": "stdout", 273 | "output_type": "stream", 274 | "text": [ 275 | "List all trigrams of sentence: ['i', 'am', 'happy', 'because', 'i', 'am', 'learning', '.']\n", 276 | "\n", 277 | "['i', 'am', 'happy']\n", 278 | "['am', 'happy', 'because']\n", 279 | "['happy', 'because', 'i']\n", 280 | "['because', 'i', 'am']\n", 281 | "['i', 'am', 'learning']\n", 282 | "['am', 'learning', '.']\n" 283 | ] 284 | } 285 | ], 286 | "source": [ 287 | "def sentence_to_trigram(tokenized_sentence):\n", 288 | " \"\"\"\n", 289 | " Prints all trigrams in the given tokenized sentence.\n", 290 | " \n", 291 | " Args:\n", 292 | " tokenized_sentence: The words list.\n", 293 | " \n", 294 | " Returns:\n", 295 | " No output\n", 296 | " \"\"\"\n", 297 | " # note that the last position of i is 3rd to the end\n", 298 | " for i in range(len(tokenized_sentence) - 3 + 1):\n", 299 | " # the sliding window starts at position i and contains 3 words\n", 300 | " trigram = tokenized_sentence[i : i + 3]\n", 301 | " print(trigram)\n", 302 | "\n", 303 | "tokenized_sentence = ['i', 'am', 'happy', 'because', 'i', 'am', 'learning', '.']\n", 304 | "\n", 305 | "print(f'List all trigrams of sentence: {tokenized_sentence}\\n')\n", 306 | "sentence_to_trigram(tokenized_sentence)\n" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": {}, 312 | "source": [ 313 | "\n", 314 | "### Prefix of an n-gram\n", 315 | "\n", 316 | "As you saw in the lecture, the n-gram probability is often calculated based on the (n-1)-gram counts. The prefix is needed in the formula to calculate the probability of an n-gram.\n", 317 | "\n", 318 | "\\begin{equation*}\n", 319 | "P(w_n|w_1^{n-1})=\\frac{C(w_1^n)}{C(w_1^{n-1})}\n", 320 | "\\end{equation*}\n", 321 | "\n", 322 | "The following code shows how to get an (n-1)-gram prefix from n-gram on an example of getting trigram from a 4-gram." 
323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": null, 328 | "metadata": { 329 | "scrolled": true 330 | }, 331 | "outputs": [], 332 | "source": [ 333 | "# get trigram prefix from a 4-gram\n", 334 | "fourgram = ['i', 'am', 'happy','because']\n", 335 | "trigram = fourgram[0:-1] # Get the elements from 0, included, up to the last element, not included.\n", 336 | "print(trigram)" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": {}, 342 | "source": [ 343 | "\n", 344 | "### Start and end of sentence words <s> and </s>\n", 345 | "You could see in the lecture that we must add some special tokens at the beginning and the end of each sentence: \n", 346 | "* <s> at the beginning\n", 347 | "* </s> at the end\n", 348 | "\n", 349 | "For n-grams, we must prepend n-1 start tokens <s> at the beginning of the sentence. \n", 350 | "\n", 351 | "Let us have a look at how you can implement this in code.\n" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": null, 357 | "metadata": {}, 358 | "outputs": [], 359 | "source": [ 360 | "# when working with trigrams, you need to prepend two <s> and append one </s>\n", 361 | "n = 3\n", 362 | "tokenized_sentence = ['i', 'am', 'happy', 'because', 'i', 'am', 'learning', '.']\n", 363 | "tokenized_sentence = [\"<s>\"] * (n - 1) + tokenized_sentence + [\"</s>\"]\n", 364 | "print(tokenized_sentence)" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": {}, 370 | "source": [ 371 | "That's all for the lab for the \"N-gram\" lesson of week 3." 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": null, 377 | "metadata": {}, 378 | "outputs": [], 379 | "source": [] 380 | } 381 | ], 382 | "metadata": { 383 | "kernelspec": { 384 | "display_name": "Python 3", 385 | "language": "python", 386 | "name": "python3" 387 | }, 388 | "language_info": { 389 | "codemirror_mode": { 390 | "name": "ipython", 391 | "version": 3 392 | }, 393 | "file_extension": ".py", 394 | "mimetype": "text/x-python", 395 | "name": "python", 396 | "nbconvert_exporter": "python", 397 | "pygments_lexer": "ipython3", 398 | "version": "3.7.1" 399 | } 400 | }, 401 | "nbformat": 4, 402 | "nbformat_minor": 4 403 | } 404 | -------------------------------------------------------------------------------- /NLP with Sequence Models/Week_2/C3_W2_Lecture_Notebook_Hidden_State_Activation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Hidden State Activation : Ungraded Lecture Notebook\n", 8 | "\n", 9 | "In this notebook you'll take another look at the hidden state activation function. It can be written in two different ways. 
\n", 10 | "\n", 11 | "I'll show you, step by step, how to implement each of them and then how to verify whether the results produced by each of them are same or not.\n", 12 | "\n", 13 | "## Background\n", 14 | "\n", 15 | "![vanilla rnn](vanilla_rnn.PNG)\n", 16 | "\n", 17 | "\n", 18 | "This is the hidden state activation function for a vanilla RNN.\n", 19 | "\n", 20 | "$h^{}=g(W_{h}[h^{},x^{}] + b_h)$ \n", 21 | "\n", 22 | "Which is another way of writing this: \n", 23 | "\n", 24 | "$h^{}=g(W_{hh}h^{} \\oplus W_{hx}x^{} + b_h)$ \n", 25 | "\n", 26 | "Where \n", 27 | "\n", 28 | "- $W_{h}$ in the first formula is denotes the *horizontal* concatenation of $W_{hh}$ and $W_{hx}$ from the second formula.\n", 29 | "\n", 30 | "- $W_{h}$ in the first formula is then multiplied by $[h^{},x^{}]$, another concatenation of parameters from the second formula but this time in a different direction, i.e *vertical*!\n", 31 | "\n", 32 | "Let us see what this means computationally.\n", 33 | "\n", 34 | "## Imports" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 2, 40 | "metadata": {}, 41 | "outputs": [], 42 | "source": [ 43 | "import numpy as np" 44 | ] 45 | }, 46 | { 47 | "cell_type": "markdown", 48 | "metadata": {}, 49 | "source": [ 50 | "## Joining (Concatenation)\n", 51 | "\n", 52 | "### Weights\n", 53 | "\n", 54 | "A join along the vertical boundary is called a *horizontal concatenation* or *horizontal stack*. \n", 55 | "\n", 56 | "Visually, it looks like this:- $W_h = \\left [ W_{hh} \\ | \\ W_{hx} \\right ]$\n", 57 | "\n", 58 | "I'll show you two different ways to achieve this using numpy.\n", 59 | "\n", 60 | "__Note: The values used to populate the arrays, below, have been chosen to aid in visual illustration only. They are NOT what you'd expect to use building a model, which would typically be random variables instead.__\n", 61 | "\n", 62 | "* Try using random initializations for the weight arrays." 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": 5, 68 | "metadata": { 69 | "tags": [] 70 | }, 71 | "outputs": [ 72 | { 73 | "name": "stdout", 74 | "output_type": "stream", 75 | "text": [ 76 | "-- Data --\n", 77 | "\n", 78 | "w_hh :\n", 79 | "[[1 1]\n", 80 | " [1 1]\n", 81 | " [1 1]]\n", 82 | "w_hh shape : (3, 2) \n", 83 | "\n", 84 | "w_hx :\n", 85 | "[[9 9 9]\n", 86 | " [9 9 9]\n", 87 | " [9 9 9]]\n", 88 | "w_hx shape : (3, 3) \n", 89 | "\n", 90 | "-- Joining --\n", 91 | "\n", 92 | "option 1 : concatenate\n", 93 | "\n", 94 | "w_h :\n", 95 | "[[1 1 9 9 9]\n", 96 | " [1 1 9 9 9]\n", 97 | " [1 1 9 9 9]]\n", 98 | "w_h shape : (3, 5) \n", 99 | "\n", 100 | "option 2 : hstack\n", 101 | "\n", 102 | "w_h :\n", 103 | "[[1 1 9 9 9]\n", 104 | " [1 1 9 9 9]\n", 105 | " [1 1 9 9 9]]\n", 106 | "w_h shape : (3, 5)\n" 107 | ] 108 | } 109 | ], 110 | "source": [ 111 | "# Create some dummy data\n", 112 | "\n", 113 | "w_hh = np.full((3, 2), 1) # illustration purposes only, returns an array of size 3x2 filled with all 1s\n", 114 | "w_hx = np.full((3, 3), 9) # illustration purposes only, returns an array of size 3x3 filled with all 9s\n", 115 | "\n", 116 | "\n", 117 | "### START CODE HERE ###\n", 118 | "# Try using some random initializations, though it will obfuscate the join. 
eg: uncomment these lines\n", 119 | "# w_hh = np.random.standard_normal((3,2))\n", 120 | "# w_hx = np.random.standard_normal((3,3))\n", 121 | "### END CODE HERE ###\n", 122 | "\n", 123 | "print(\"-- Data --\\n\")\n", 124 | "print(\"w_hh :\")\n", 125 | "print(w_hh)\n", 126 | "print(\"w_hh shape :\", w_hh.shape, \"\\n\")\n", 127 | "print(\"w_hx :\")\n", 128 | "print(w_hx)\n", 129 | "print(\"w_hx shape :\", w_hx.shape, \"\\n\")\n", 130 | "\n", 131 | "# Joining the arrays\n", 132 | "print(\"-- Joining --\\n\")\n", 133 | "# Option 1: concatenate - horizontal\n", 134 | "w_h1 = np.concatenate((w_hh, w_hx), axis=1)\n", 135 | "print(\"option 1 : concatenate\\n\")\n", 136 | "print(\"w_h :\")\n", 137 | "print(w_h1)\n", 138 | "print(\"w_h shape :\", w_h1.shape, \"\\n\")\n", 139 | "\n", 140 | "# Option 2: hstack\n", 141 | "w_h2 = np.hstack((w_hh, w_hx))\n", 142 | "print(\"option 2 : hstack\\n\")\n", 143 | "print(\"w_h :\")\n", 144 | "print(w_h2)\n", 145 | "print(\"w_h shape :\", w_h2.shape)" 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "### Hidden State & Inputs\n", 153 | "Joining along a horizontal boundary is called a vertical concatenation or vertical stack. Visually it looks like this:\n", 154 | "\n", 155 | "$[h^{<t-1>},x^{<t>}] = \left[ \frac{h^{<t-1>}}{x^{<t>}} \right]$\n", 156 | "\n", 157 | "\n", 158 | "I'll show you two different ways to achieve this using numpy.\n", 159 | "\n", 160 | "*Try using random initializations for the hidden state and input matrices.*\n" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 6, 166 | "metadata": { 167 | "tags": [] 168 | }, 169 | "outputs": [ 170 | { 171 | "name": "stdout", 172 | "output_type": "stream", 173 | "text": [ 174 | "-- Data --\n", 175 | "\n", 176 | "h_t_prev :\n", 177 | "[[1]\n", 178 | " [1]]\n", 179 | "h_t_prev shape : (2, 1) \n", 180 | "\n", 181 | "x_t :\n", 182 | "[[9]\n", 183 | " [9]\n", 184 | " [9]]\n", 185 | "x_t shape : (3, 1) \n", 186 | "\n", 187 | "-- Joining --\n", 188 | "\n", 189 | "option 1 : concatenate\n", 190 | "\n", 191 | "ax_1 :\n", 192 | "[[1]\n", 193 | " [1]\n", 194 | " [9]\n", 195 | " [9]\n", 196 | " [9]]\n", 197 | "ax_1 shape : (5, 1) \n", 198 | "\n", 199 | "option 2 : vstack\n", 200 | "\n", 201 | "ax_2 :\n", 202 | "[[1]\n", 203 | " [1]\n", 204 | " [9]\n", 205 | " [9]\n", 206 | " [9]]\n", 207 | "ax_2 shape : (5, 1)\n" 208 | ] 209 | } 210 | ], 211 | "source": [ 212 | "# Create some more dummy data\n", 213 | "h_t_prev = np.full((2, 1), 1) # illustration purposes only, returns an array of size 2x1 filled with all 1s\n", 214 | "x_t = np.full((3, 1), 9) # illustration purposes only, returns an array of size 3x1 filled with all 9s\n", 215 | "\n", 216 | "# Try using some random initializations, though it will obfuscate the join. 
eg: uncomment these lines\n", 217 | "\n", 218 | "### START CODE HERE ###\n", 219 | "# h_t_prev = np.random.standard_normal((2,1))\n", 220 | "# x_t = np.random.standard_normal((3,1))\n", 221 | "### END CODE HERE ###\n", 222 | "\n", 223 | "print(\"-- Data --\\n\")\n", 224 | "print(\"h_t_prev :\")\n", 225 | "print(h_t_prev)\n", 226 | "print(\"h_t_prev shape :\", h_t_prev.shape, \"\\n\")\n", 227 | "print(\"x_t :\")\n", 228 | "print(x_t)\n", 229 | "print(\"x_t shape :\", x_t.shape, \"\\n\")\n", 230 | "\n", 231 | "# Joining the arrays\n", 232 | "print(\"-- Joining --\\n\")\n", 233 | "\n", 234 | "# Option 1: concatenate - vertical\n", 235 | "ax_1 = np.concatenate(\n", 236 | " (h_t_prev, x_t), axis=0\n", 237 | ") # note the difference in axis parameter vs earlier\n", 238 | "print(\"option 1 : concatenate\\n\")\n", 239 | "print(\"ax_1 :\")\n", 240 | "print(ax_1)\n", 241 | "print(\"ax_1 shape :\", ax_1.shape, \"\\n\")\n", 242 | "\n", 243 | "# Option 2: vstack\n", 244 | "ax_2 = np.vstack((h_t_prev, x_t))\n", 245 | "print(\"option 2 : vstack\\n\")\n", 246 | "print(\"ax_2 :\")\n", 247 | "print(ax_2)\n", 248 | "print(\"ax_2 shape :\", ax_2.shape)" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "## Verify Formulas\n", 256 | "Now that you know how to do the concatenations, horizontal and vertical, let's verify if the two formulas produce the same result.\n", 257 | "\n", 258 | "__Formula 1:__ $h^{<t>}=g(W_{h}[h^{<t-1>},x^{<t>}] + b_h)$ \n", 259 | "\n", 260 | "__Formula 2:__ $h^{<t>}=g(W_{hh}h^{<t-1>} \oplus W_{hx}x^{<t>} + b_h)$\n", 261 | "\n", 262 | "\n", 263 | "To prove: __Formula 1__ $\Leftrightarrow$ __Formula 2__\n", 264 | "\n", 265 | "We will ignore the bias term $b_h$ and the activation function $g(\ )$ because the transformation will be identical for each formula. 
So what we really want to compare is the result of the following parameters inside each formula:\n", 266 | "\n", 267 | "$W_{h}[h^{<t-1>},x^{<t>}] \quad \Leftrightarrow \quad W_{hh}h^{<t-1>} \oplus W_{hx}x^{<t>} $\n", 268 | "\n", 269 | "We'll see how to do this using matrix multiplication combined with the data and techniques (stacking/concatenating) from above.\n", 270 | "\n", 271 | "* Try adding a sigmoid activation function and bias term to the checks for completeness.\n" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": 4, 277 | "metadata": { 278 | "tags": [] 279 | }, 280 | "outputs": [ 281 | { 282 | "name": "stdout", 283 | "output_type": "stream", 284 | "text": [ 285 | "-- Results --\n", 286 | "\n", 287 | "Formula 1\n", 288 | "Term1:\n", 289 | " [[1 1 9 9 9]\n", 290 | " [1 1 9 9 9]\n", 291 | " [1 1 9 9 9]]\n", 292 | "Term2:\n", 293 | " [[1]\n", 294 | " [1]\n", 295 | " [9]\n", 296 | " [9]\n", 297 | " [9]]\n", 298 | "Output:\n", 299 | "[[245]\n", 300 | " [245]\n", 301 | " [245]]\n", 302 | "\n", 303 | "Formula 2\n", 304 | "Term1:\n", 305 | " [[2]\n", 306 | " [2]\n", 307 | " [2]]\n", 308 | "Term2:\n", 309 | " [[243]\n", 310 | " [243]\n", 311 | " [243]]\n", 312 | "\n", 313 | "Output:\n", 314 | "[[245]\n", 315 | " [245]\n", 316 | " [245]] \n", 317 | "\n", 318 | "-- Verify --\n", 319 | "Results are the same : True\n" 320 | ] 321 | } 322 | ], 323 | "source": [ 324 | "# Data\n", 325 | "\n", 326 | "w_hh = np.full((3, 2), 1) # returns an array of size 3x2 filled with all 1s\n", 327 | "w_hx = np.full((3, 3), 9) # returns an array of size 3x3 filled with all 9s\n", 328 | "h_t_prev = np.full((2, 1), 1) # returns an array of size 2x1 filled with all 1s\n", 329 | "x_t = np.full((3, 1), 9) # returns an array of size 3x1 filled with all 9s\n", 330 | "\n", 331 | "\n", 332 | "# If you want to randomize the values, uncomment the next 4 lines\n", 333 | "\n", 334 | "# w_hh = np.random.standard_normal((3,2))\n", 335 | "# w_hx = np.random.standard_normal((3,3))\n", 336 | "# h_t_prev = np.random.standard_normal((2,1))\n", 337 | "# x_t = np.random.standard_normal((3,1))\n", 338 | "\n", 339 | "# Results\n", 340 | "print(\"-- Results --\")\n", 341 | "# Formula 1\n", 342 | "stack_1 = np.hstack((w_hh, w_hx))\n", 343 | "stack_2 = np.vstack((h_t_prev, x_t))\n", 344 | "\n", 345 | "print(\"\\nFormula 1\")\n", 346 | "print(\"Term1:\\n\",stack_1)\n", 347 | "print(\"Term2:\\n\",stack_2)\n", 348 | "formula_1 = np.matmul(np.hstack((w_hh, w_hx)), np.vstack((h_t_prev, x_t)))\n", 349 | "print(\"Output:\")\n", 350 | "print(formula_1)\n", 351 | "\n", 352 | "# Formula 2\n", 353 | "mul_1 = np.matmul(w_hh, h_t_prev)\n", 354 | "mul_2 = np.matmul(w_hx, x_t)\n", 355 | "print(\"\\nFormula 2\")\n", 356 | "print(\"Term1:\\n\",mul_1)\n", 357 | "print(\"Term2:\\n\",mul_2)\n", 358 | "\n", 359 | "formula_2 = np.matmul(w_hh, h_t_prev) + np.matmul(w_hx, x_t)\n", 360 | "print(\"\\nOutput:\")\n", 361 | "print(formula_2, \"\\n\")\n", 362 | "\n", 363 | "# Verification \n", 364 | "# np.allclose - to check if two arrays are elementwise equal up to a certain tolerance, here \n", 365 | "# https://numpy.org/doc/stable/reference/generated/numpy.allclose.html\n", 366 | "\n", 367 | "print(\"-- Verify --\")\n", 368 | "print(\"Results are the same :\", np.allclose(formula_1, formula_2))\n", 369 | "\n", 370 | "### START CODE HERE ###\n", 371 | "# # Try adding a sigmoid activation function and bias term as a final check\n", 372 | "# # Activation\n", 373 | "# def sigmoid(x):\n", 374 | "# return 1 / (1 + np.exp(-x))\n", 375 | "\n", 376 | "# # Bias 
and check\n", 377 | "# b = np.random.standard_normal((formula_1.shape[0],1))\n", 378 | "# print(\"Formula 1 Output:\\n\",sigmoid(formula_1+b))\n", 379 | "# print(\"Formula 2 Output:\\n\",sigmoid(formula_2+b))\n", 380 | "\n", 381 | "# all_close = np.allclose(sigmoid(formula_1+b), sigmoid(formula_2+b))\n", 382 | "# print(\"Results after activation are the same :\",all_close)\n", 383 | "### END CODE HERE ###" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": {}, 389 | "source": [ 390 | "## Summary\n", 391 | "That's it! We've verified that the two formulas produce the same results, and seen how to combine matrices vertically and horizontally to make that happen. We now have all the intuition needed to understand the math notation of RNNs." 392 | ] 393 | } 394 | ], 395 | "metadata": { 396 | "jupytext": { 397 | "formats": "ipynb,py:percent", 398 | "main_language": "python" 399 | }, 400 | "kernelspec": { 401 | "display_name": "Python 3", 402 | "language": "python", 403 | "name": "python3" 404 | }, 405 | "language_info": { 406 | "codemirror_mode": { 407 | "name": "ipython", 408 | "version": 3 409 | }, 410 | "file_extension": ".py", 411 | "mimetype": "text/x-python", 412 | "name": "python", 413 | "nbconvert_exporter": "python", 414 | "pygments_lexer": "ipython3", 415 | "version": "3.7.1" 416 | } 417 | }, 418 | "nbformat": 4, 419 | "nbformat_minor": 2 420 | } 421 | -------------------------------------------------------------------------------- /NLP with Sequence Models/Week_2/C3_W2_lecture_notebook_RNNs.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Vanilla RNNs, GRUs and the `scan` function" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In this notebook, you will learn how to define the forward method for vanilla RNNs and GRUs. Additionally, you will see how to define and use the function `scan` to compute forward propagation for RNNs.\n", 15 | "\n", 16 | "By completing this notebook, you will:\n", 17 | "\n", 18 | "- Be able to define the forward method for vanilla RNNs and GRUs\n", 19 | "- Be able to define the `scan` function to perform forward propagation for RNNs\n", 20 | "- Understand how forward propagation is implemented for RNNs." 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 1, 26 | "metadata": {}, 27 | "outputs": [], 28 | "source": [ 29 | "import numpy as np\n", 30 | "from numpy import random\n", 31 | "from time import perf_counter" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "An implementation of the `sigmoid` function is provided below so you can use it in this notebook." 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": 2, 44 | "metadata": {}, 45 | "outputs": [], 46 | "source": [ 47 | "def sigmoid(x): # Sigmoid function\n", 48 | " return 1.0 / (1.0 + np.exp(-x))" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "# Part 1: Forward method for vanilla RNNs and GRUs" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "In this part of the notebook, you'll see the implementation of the forward method for a vanilla RNN and you'll implement that same method for a GRU. 
For this exercise you'll use a set of random weights and variables with the following dimensions:\n", 63 | "\n", 64 | "- Embedding size (`emb`) : 128\n", 65 | "- Hidden state size (`h_dim`) : (16,1)\n", 66 | "\n", 67 | "The weights `w_` and biases `b_` are initialized with dimensions (`h_dim`, `emb + h_dim`) and (`h_dim`, 1). We expect the hidden state `h_t` to be a column vector with size (`h_dim`,1) and the initial hidden state `h_0` is a vector of zeros." 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 3, 73 | "metadata": {}, 74 | "outputs": [], 75 | "source": [ 76 | "random.seed(10) # Random seed, so your results match ours\n", 77 | "emb = 128 # Embedding size\n", 78 | "T = 256 # Number of time steps in the sequences\n", 79 | "h_dim = 16 # Hidden state dimension\n", 80 | "h_0 = np.zeros((h_dim, 1)) # Initial hidden state\n", 81 | "# Random initialization of weights and biases\n", 82 | "w1 = random.standard_normal((h_dim, emb+h_dim))\n", 83 | "w2 = random.standard_normal((h_dim, emb+h_dim))\n", 84 | "w3 = random.standard_normal((h_dim, emb+h_dim))\n", 85 | "b1 = random.standard_normal((h_dim, 1))\n", 86 | "b2 = random.standard_normal((h_dim, 1))\n", 87 | "b3 = random.standard_normal((h_dim, 1))\n", 88 | "X = random.standard_normal((T, emb, 1))\n", 89 | "weights = [w1, w2, w3, b1, b2, b3]" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "## 1.1 Forward method for vanilla RNNs" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "The vanilla RNN cell is quite straightforward. Its most general structure is presented in the next figure: \n", 104 | "\n", 105 | "\n", 106 | "\n", 107 | "As you saw in the lecture videos, the computations made in a vanilla RNN cell are equivalent to the following equations:\n", 108 | "\n", 109 | "\begin{equation}\n", 110 | "h^{<t>}=g(W_{h}[h^{<t-1>},x^{<t>}] + b_h)\n", 111 | "\label{eq: htRNN}\n", 112 | "\end{equation}\n", 113 | " \n", 114 | "\begin{equation}\n", 115 | "\hat{y}^{<t>}=g(W_{yh}h^{<t>} + b_y)\n", 116 | "\label{eq: ytRNN}\n", 117 | "\end{equation}\n", 118 | "\n", 119 | "where $[h^{<t-1>},x^{<t>}]$ means that $h^{<t-1>}$ and $x^{<t>}$ are concatenated together. In the next cell we provide the implementation of the forward method for a vanilla RNN. " 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 4, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [ 128 | "def forward_V_RNN(inputs, weights): # Forward propagation for a single vanilla RNN cell\n", 129 | " x, h_t = inputs\n", 130 | "\n", 131 | " # weights.\n", 132 | " wh, _, _, bh, _, _ = weights\n", 133 | "\n", 134 | " # new hidden state\n", 135 | " h_t = np.dot(wh, np.concatenate([h_t, x])) + bh\n", 136 | " h_t = sigmoid(h_t)\n", 137 | "\n", 138 | " return h_t, h_t" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "As you can see, we omitted the computation of $\hat{y}^{<t>}$. This was done for the sake of simplicity, so you can focus on the way that hidden states are updated here and in the GRU cell." 146 | ] 147 | }, 148 | { 149 | "cell_type": "markdown", 150 | "metadata": {}, 151 | "source": [ 152 | "## 1.2 Forward method for GRUs" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "A GRU cell has more computations than a vanilla RNN cell. 
You can see this visually in the following diagram:\n", 160 | "\n", 161 | "\n", 162 | "\n", 163 | "As you saw in the lecture videos, GRUs have relevance $\Gamma_r$ and update $\Gamma_u$ gates that control how the hidden state $h^{<t>}$ is updated on every time step. With these gates, GRUs are capable of keeping relevant information in the hidden state even for long sequences. The equations needed for the forward method in GRUs are provided below: \n", 164 | "\n", 165 | "\begin{equation}\n", 166 | "\Gamma_r=\sigma{(W_r[h^{<t-1>}, x^{<t>}]+b_r)}\n", 167 | "\end{equation}\n", 168 | "\n", 169 | "\begin{equation}\n", 170 | "\Gamma_u=\sigma{(W_u[h^{<t-1>}, x^{<t>}]+b_u)}\n", 171 | "\end{equation}\n", 172 | "\n", 173 | "\begin{equation}\n", 174 | "c^{<t>}=\tanh{(W_h[\Gamma_r*h^{<t-1>},x^{<t>}]+b_h)}\n", 175 | "\end{equation}\n", 176 | "\n", 177 | "\begin{equation}\n", 178 | "h^{<t>}=\Gamma_u*c^{<t>}+(1-\Gamma_u)*h^{<t-1>}\n", 179 | "\end{equation}\n", 180 | "\n", 181 | "In the next cell, please implement the forward method for a GRU cell by computing the update `u` and relevance `r` gates, and the candidate hidden state `c`. " 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 5, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "def forward_GRU(inputs, weights): # Forward propagation for a single GRU cell\n", 191 | " x, h_t = inputs\n", 192 | "\n", 193 | " # weights.\n", 194 | " wu, wr, wc, bu, br, bc = weights\n", 195 | "\n", 196 | " # Update gate\n", 197 | " ### START CODE HERE (1-2 LINES) ###\n", 198 | " u = np.dot(wu, np.concatenate([h_t, x])) + bu\n", 199 | " u = sigmoid(u)\n", 200 | " ### END CODE HERE ###\n", 201 | " \n", 202 | " # Relevance gate\n", 203 | " ### START CODE HERE (1-2 LINES) ###\n", 204 | " r = np.dot(wr, np.concatenate([h_t, x])) + br\n", 205 | " r = sigmoid(r)\n", 206 | " ### END CODE HERE ###\n", 207 | " \n", 208 | " # Candidate hidden state \n", 209 | " ### START CODE HERE (1-2 LINES) ###\n", 210 | " c = np.dot(wc, np.concatenate([r * h_t, x])) + bc\n", 211 | " c = np.tanh(c)\n", 212 | " ### END CODE HERE ###\n", 213 | " \n", 214 | " # New Hidden state h_t\n", 215 | " h_t = u * c + (1 - u) * h_t\n", 216 | " return h_t, h_t" 217 | ] 218 | }, 219 | { 220 | "cell_type": "markdown", 221 | "metadata": {}, 222 | "source": [ 223 | "Run the following cell to check your implementation." 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 6, 229 | "metadata": {}, 230 | "outputs": [ 231 | { 232 | "data": { 233 | "text/plain": [ 234 | "array([[ 9.77779014e-01],\n", 235 | " [-9.97986240e-01],\n", 236 | " [-5.19958083e-01],\n", 237 | " [-9.99999886e-01],\n", 238 | " [-9.99707004e-01],\n", 239 | " [-3.02197037e-04],\n", 240 | " [-9.58733503e-01],\n", 241 | " [ 2.10804828e-02],\n", 242 | " [ 9.77365398e-05],\n", 243 | " [ 9.99833090e-01],\n", 244 | " [ 1.63200940e-08],\n", 245 | " [ 8.51874303e-01],\n", 246 | " [ 5.21399924e-02],\n", 247 | " [ 2.15495959e-02],\n", 248 | " [ 9.99878828e-01],\n", 249 | " [ 9.77165472e-01]])" 250 | ] 251 | }, 252 | "execution_count": 6, 253 | "metadata": {}, 254 | "output_type": "execute_result" 255 | } 256 | ], 257 | "source": [ 258 | "forward_GRU([X[1],h_0], weights)[0]" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": {}, 264 | "source": [ 265 | "Expected output:\n", 266 | "<pre>
\n",
267 |     "array([[ 9.77779014e-01],\n",
268 |     "       [-9.97986240e-01],\n",
269 |     "       [-5.19958083e-01],\n",
270 |     "       [-9.99999886e-01],\n",
271 |     "       [-9.99707004e-01],\n",
272 |     "       [-3.02197037e-04],\n",
273 |     "       [-9.58733503e-01],\n",
274 |     "       [ 2.10804828e-02],\n",
275 |     "       [ 9.77365398e-05],\n",
276 |     "       [ 9.99833090e-01],\n",
277 |     "       [ 1.63200940e-08],\n",
278 |     "       [ 8.51874303e-01],\n",
279 |     "       [ 5.21399924e-02],\n",
280 |     "       [ 2.15495959e-02],\n",
281 |     "       [ 9.99878828e-01],\n",
282 |     "       [ 9.77165472e-01]])\n",
283 |     "</pre>
" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "# Part 2: Implementation of the `scan` function" 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "In the lectures you saw how the `scan` function is used for forward propagation in RNNs. It takes as inputs:\n", 298 | "\n", 299 | "- `fn` : the function to be called recurrently (i.e. `forward_GRU`)\n", 300 | "- `elems` : the list of inputs for each time step (`X`)\n", 301 | "- `weights` : the parameters needed to compute `fn`\n", 302 | "- `h_0` : the initial hidden state\n", 303 | "\n", 304 | "`scan` goes through all the elements `x` in `elems`, calls the function `fn` with arguments ([`x`, `h_t`],`weights`), stores the computed hidden state `h_t` and appends the result to a list `ys`. Complete the following cell by calling `fn` with arguments ([`x`, `h_t`],`weights`)." 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": 7, 310 | "metadata": {}, 311 | "outputs": [], 312 | "source": [ 313 | "def scan(fn, elems, weights, h_0=None): # Forward propagation for RNNs\n", 314 | " h_t = h_0\n", 315 | " ys = []\n", 316 | " for x in elems:\n", 317 | " ### START CODE HERE (1 lINE) ###\n", 318 | " y, h_t = fn([x, h_t], weights)\n", 319 | " ### END CODE HERE ###\n", 320 | " ys.append(y)\n", 321 | " return ys, h_t" 322 | ] 323 | }, 324 | { 325 | "cell_type": "markdown", 326 | "metadata": {}, 327 | "source": [ 328 | "# Part 3: Comparison between vanilla RNNs and GRUs" 329 | ] 330 | }, 331 | { 332 | "cell_type": "markdown", 333 | "metadata": {}, 334 | "source": [ 335 | "You have already seen how forward propagation is computed for vanilla RNNs and GRUs. As a quick recap, you need to have a forward method for the recurrent cell and a function like `scan` to go through all the elements from a sequence using a forward method. You saw that GRUs performed more computations than vanilla RNNs, and you can check that they have 3 times more parameters. In the next two cells, we compute forward propagation for a sequence with 256 time steps (`T`) for an RNN and a GRU with the same hidden state `h_t` size (`h_dim`=16). " 336 | ] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "execution_count": null, 341 | "metadata": {}, 342 | "outputs": [], 343 | "source": [ 344 | "# vanilla RNNs\n", 345 | "tic = perf_counter()\n", 346 | "ys, h_T = scan(forward_V_RNN, X, weights, h_0)\n", 347 | "toc = perf_counter()\n", 348 | "RNN_time=(toc-tic)*1000\n", 349 | "print (f\"It took {RNN_time:.2f}ms to run the forward method for the vanilla RNN.\")" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": null, 355 | "metadata": {}, 356 | "outputs": [], 357 | "source": [ 358 | "# GRUs\n", 359 | "tic = perf_counter()\n", 360 | "ys, h_T = scan(forward_GRU, X, weights, h_0)\n", 361 | "toc = perf_counter()\n", 362 | "GRU_time=(toc-tic)*1000\n", 363 | "print (f\"It took {GRU_time:.2f}ms to run the forward method for the GRU.\")" 364 | ] 365 | }, 366 | { 367 | "cell_type": "markdown", 368 | "metadata": {}, 369 | "source": [ 370 | "As you were told in the lectures, GRUs take more time to compute (However, sometimes, although a rare occurrence, Vanilla RNNs take more time. Can you figure out what might cause this ?). This means that training and prediction would take more time for a GRU than for a vanilla RNN. 
However, GRUs allow you to propagate relevant information even for long sequences, so when selecting an architecture for NLP you should assess the tradeoff between computational time and performance. " 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | "source": [ 377 | "Congratulations! Now you know how the forward method is implemented for vanilla RNNs and GRUs, and you know how the scan function provides an abstraction for forward propagation in RNNs. " 378 | ] 379 | } 380 | ], 381 | "metadata": { 382 | "kernelspec": { 383 | "display_name": "Python 3", 384 | "language": "python", 385 | "name": "python3" 386 | }, 387 | "language_info": { 388 | "codemirror_mode": { 389 | "name": "ipython", 390 | "version": 3 391 | }, 392 | "file_extension": ".py", 393 | "mimetype": "text/x-python", 394 | "name": "python", 395 | "nbconvert_exporter": "python", 396 | "pygments_lexer": "ipython3", 397 | "version": "3.7.1" 398 | } 399 | }, 400 | "nbformat": 4, 401 | "nbformat_minor": 4 402 | } 403 | -------------------------------------------------------------------------------- /NLP with Attention Models/Week_2/C4_W2_lecture_notebook_Transformer_Decoder.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# The Transformer Decoder: Ungraded Lab Notebook\n", 8 | "\n", 9 | "In this notebook, you'll explore the transformer decoder and how to implement it with Trax. \n", 10 | "\n", 11 | "## Background\n", 12 | "\n", 13 | "In the last lecture notebook, you saw how to translate the mathematics of attention into NumPy code. Here, you'll see how multi-head causal attention fits into a GPT-2 transformer decoder, and how to build one with Trax layers. In the assignment notebook, you'll implement causal attention from scratch, but here, you'll exploit the handy-dandy `tl.CausalAttention()` layer.\n", 14 | "\n", 15 | "The schematic below illustrates the components and flow of a transformer decoder. 
Note that while the algorithm diagram flows from the bottom to the top, the overview and subsequent Trax layer codes are top-down.\n", 16 | "\n", 17 | "" 18 | ] 19 | }, 20 | { 21 | "cell_type": "markdown", 22 | "metadata": {}, 23 | "source": [ 24 | "## Imports" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 1, 30 | "metadata": {}, 31 | "outputs": [ 32 | { 33 | "name": "stdout", 34 | "output_type": "stream", 35 | "text": [ 36 | "INFO:tensorflow:tokens_length=568 inputs_length=512 targets_length=114 noise_density=0.15 mean_noise_span_length=3.0 \n" 37 | ] 38 | } 39 | ], 40 | "source": [ 41 | "import sys\n", 42 | "import os\n", 43 | "\n", 44 | "import time\n", 45 | "import numpy as np\n", 46 | "import gin\n", 47 | "\n", 48 | "import textwrap\n", 49 | "wrapper = textwrap.TextWrapper(width=70)\n", 50 | "\n", 51 | "import trax\n", 52 | "from trax import layers as tl\n", 53 | "from trax.fastmath import numpy as jnp\n", 54 | "\n", 55 | "# to print the entire np array\n", 56 | "np.set_printoptions(threshold=sys.maxsize)" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "## Sentence gets embedded, add positional encoding\n", 64 | "Embed the words, then create vectors representing each word's position in each sentence $\\in \\{ 0, 1, 2, \\ldots , K\\}$ = `range(max_len)`, where `max_len` = $K+1$)" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 2, 70 | "metadata": {}, 71 | "outputs": [], 72 | "source": [ 73 | "def PositionalEncoder(vocab_size, d_model, dropout, max_len, mode):\n", 74 | " \"\"\"Returns a list of layers that: \n", 75 | " 1. takes a block of text as input, \n", 76 | " 2. embeds the words in that text, and \n", 77 | " 3. adds positional encoding, \n", 78 | " i.e. associates a number in range(max_len) with \n", 79 | " each word in each sentence of embedded input text \n", 80 | " \n", 81 | " The input is a list of tokenized blocks of text\n", 82 | " \n", 83 | " Args:\n", 84 | " vocab_size (int): vocab size.\n", 85 | " d_model (int): depth of embedding.\n", 86 | " dropout (float): dropout rate (how much to drop out).\n", 87 | " max_len (int): maximum symbol length for positional encoding.\n", 88 | " mode (str): 'train' or 'eval'.\n", 89 | " \"\"\"\n", 90 | " # Embedding inputs and positional encoder\n", 91 | " return [ \n", 92 | " # Add embedding layer of dimension (vocab_size, d_model)\n", 93 | " tl.Embedding(vocab_size, d_model), \n", 94 | " # Use dropout with rate and mode specified\n", 95 | " tl.Dropout(rate=dropout, mode=mode), \n", 96 | " # Add positional encoding layer with maximum input length and mode specified\n", 97 | " tl.PositionalEncoding(max_len=max_len, mode=mode)] " 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "## Multi-head causal attention\n", 105 | "\n", 106 | "The layers and array dimensions involved in multi-head causal attention (which looks at previous words in the input text) are summarized in the figure below: \n", 107 | "\n", 108 | "\n", 109 | "\n", 110 | "`tl.CausalAttention()` does all of this for you! You might be wondering, though, whether you need to pass in your input text 3 times, since for causal attention, the queries Q, keys K, and values V all come from the same source. Fortunately, `tl.CausalAttention()` handles this as well by making use of the [`tl.Branch()`](https://trax-ml.readthedocs.io/en/latest/trax.layers.html#module-trax.layers.combinators) combinator layer. 
In general, each branch within a `tl.Branch()` layer performs parallel operations on copies of the layer's inputs. For causal attention, each branch (representing Q, K, and V) applies a linear transformation (i.e. a dense layer without a subsequent activation) to its copy of the input, then splits that result into heads. You can see the syntax for this in the screenshot from the `trax.layers.attention.py` [source code](https://github.com/google/trax/blob/master/trax/layers/attention.py) below: \n", 111 | "\n", 112 | "" 113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "## Feed-forward layer \n", 120 | "* Typically ends with a ReLU activation, but we'll leave open the possibility of a different activation\n", 121 | "* Most of the parameters are here" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 3, 127 | "metadata": {}, 128 | "outputs": [], 129 | "source": [ 130 | "def FeedForward(d_model, d_ff, dropout, mode, ff_activation):\n", 131 | " \"\"\"Returns a list of layers that implements a feed-forward block.\n", 132 | "\n", 133 | " The input is an activation tensor.\n", 134 | "\n", 135 | " Args:\n", 136 | " d_model (int): depth of embedding.\n", 137 | " d_ff (int): depth of feed-forward layer.\n", 138 | " dropout (float): dropout rate (how much to drop out).\n", 139 | " mode (str): 'train' or 'eval'.\n", 140 | " ff_activation (function): the non-linearity in feed-forward layer.\n", 141 | "\n", 142 | " Returns:\n", 143 | " list: list of trax.layers.combinators.Serial that maps an activation tensor to an activation tensor.\n", 144 | " \"\"\"\n", 145 | " \n", 146 | " # Create feed-forward block (list) with two dense layers with dropout and input normalized\n", 147 | " return [ \n", 148 | " # Normalize layer inputs\n", 149 | " tl.LayerNorm(), \n", 150 | " # Add first feed forward (dense) layer (don't forget to set the correct value for n_units)\n", 151 | " tl.Dense(d_ff), \n", 152 | " # Add activation function passed in as a parameter (you need to call it!)\n", 153 | " ff_activation(), # Generally ReLU\n", 154 | " # Add dropout with rate and mode specified (i.e., don't use dropout during evaluation)\n", 155 | " tl.Dropout(rate=dropout, mode=mode), \n", 156 | " # Add second feed forward layer (don't forget to set the correct value for n_units)\n", 157 | " tl.Dense(d_model), \n", 158 | " # Add dropout with rate and mode specified (i.e., don't use dropout during evaluation)\n", 159 | " tl.Dropout(rate=dropout, mode=mode) \n", 160 | " ]" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": {}, 166 | "source": [ 167 | "## Decoder block\n", 168 | "Here, we return a list containing two residual blocks. The first wraps around the causal attention layer, whose inputs are normalized and to which we apply dropout regulation. The second wraps around the feed-forward layer. You may notice that the second call to `tl.Residual()` doesn't call a normalization layer before calling the feed-forward layer. This is because the normalization layer is included in the feed-forward layer." 
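As an aside, the residual wrappers described above simply add a block's input back onto the block's output. Here is a minimal NumPy sketch of that idea; the `block` function below is a stand-in for illustration, not Trax code:

```python
import numpy as np

def block(x, w):
    # stand-in for the wrapped sublayer (e.g. attention or feed-forward)
    return np.dot(w, x)

x = np.ones((4, 1))                    # sublayer input
w = np.random.standard_normal((4, 4))  # toy weights
residual_out = x + block(x, w)         # skip connection: input + sublayer output
print(residual_out.shape)              # (4, 1)
```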
169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 4, 174 | "metadata": {}, 175 | "outputs": [], 176 | "source": [ 177 | "def DecoderBlock(d_model, d_ff, n_heads,\n", 178 | " dropout, mode, ff_activation):\n", 179 | " \"\"\"Returns a list of layers that implements a Transformer decoder block.\n", 180 | "\n", 181 | " The input is an activation tensor.\n", 182 | "\n", 183 | " Args:\n", 184 | " d_model (int): depth of embedding.\n", 185 | " d_ff (int): depth of feed-forward layer.\n", 186 | " n_heads (int): number of attention heads.\n", 187 | " dropout (float): dropout rate (how much to drop out).\n", 188 | " mode (str): 'train' or 'eval'.\n", 189 | " ff_activation (function): the non-linearity in feed-forward layer.\n", 190 | "\n", 191 | " Returns:\n", 192 | " list: list of trax.layers.combinators.Serial that maps an activation tensor to an activation tensor.\n", 193 | " \"\"\"\n", 194 | " \n", 195 | " # Add list of two Residual blocks: the attention with normalization and dropout and feed-forward blocks\n", 196 | " return [\n", 197 | " tl.Residual(\n", 198 | " # Normalize layer input\n", 199 | " tl.LayerNorm(), \n", 200 | " # Add causal attention \n", 201 | " tl.CausalAttention(d_model, n_heads=n_heads, dropout=dropout, mode=mode) \n", 202 | " ),\n", 203 | " tl.Residual(\n", 204 | " # Add feed-forward block\n", 205 | " # We don't need to normalize the layer inputs here. The feed-forward block takes care of that for us.\n", 206 | " FeedForward(d_model, d_ff, dropout, mode, ff_activation)\n", 207 | " ),\n", 208 | " ]" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "## The transformer decoder: putting it all together\n", 216 | "## A.k.a. repeat N times, dense layer and softmax for output" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": 5, 222 | "metadata": {}, 223 | "outputs": [], 224 | "source": [ 225 | "def TransformerLM(vocab_size=33300,\n", 226 | " d_model=512,\n", 227 | " d_ff=2048,\n", 228 | " n_layers=6,\n", 229 | " n_heads=8,\n", 230 | " dropout=0.1,\n", 231 | " max_len=4096,\n", 232 | " mode='train',\n", 233 | " ff_activation=tl.Relu):\n", 234 | " \"\"\"Returns a Transformer language model.\n", 235 | "\n", 236 | " The input to the model is a tensor of tokens. 
(This model uses only the\n", 237 | " decoder part of the overall Transformer.)\n", 238 | "\n", 239 | " Args:\n", 240 | " vocab_size (int): vocab size.\n", 241 | " d_model (int): depth of embedding.\n", 242 | " d_ff (int): depth of feed-forward layer.\n", 243 | " n_layers (int): number of decoder layers.\n", 244 | " n_heads (int): number of attention heads.\n", 245 | " dropout (float): dropout rate (how much to drop out).\n", 246 | " max_len (int): maximum symbol length for positional encoding.\n", 247 | " mode (str): 'train', 'eval' or 'predict', predict mode is for fast inference.\n", 248 | " ff_activation (function): the non-linearity in feed-forward layer.\n", 249 | "\n", 250 | " Returns:\n", 251 | " trax.layers.combinators.Serial: A Transformer language model as a layer that maps from a tensor of tokens\n", 252 | " to activations over a vocab set.\n", 253 | " \"\"\"\n", 254 | " \n", 255 | " # Create stack (list) of decoder blocks with n_layers with necessary parameters\n", 256 | " decoder_blocks = [ \n", 257 | " DecoderBlock(d_model, d_ff, n_heads, dropout, mode, ff_activation) for _ in range(n_layers)] \n", 258 | "\n", 259 | " # Create the complete model as written in the figure\n", 260 | " return tl.Serial(\n", 261 | " # Use teacher forcing (feed output of previous step to current step)\n", 262 | " tl.ShiftRight(mode=mode), \n", 263 | " # Add embedding inputs and positional encoder\n", 264 | " PositionalEncoder(vocab_size, d_model, dropout, max_len, mode),\n", 265 | " # Add decoder blocks\n", 266 | " decoder_blocks, \n", 267 | " # Normalize layer\n", 268 | " tl.LayerNorm(), \n", 269 | "\n", 270 | " # Add dense layer of vocab_size (since need to select a word to translate to)\n", 271 | " # (a.k.a., logits layer. Note: activation already set by ff_activation)\n", 272 | " tl.Dense(vocab_size), \n", 273 | " # Get probabilities with Logsoftmax\n", 274 | " tl.LogSoftmax() \n", 275 | " )" 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": {}, 281 | "source": [ 282 | "## Concluding remarks\n", 283 | "\n", 284 | "In this week's assignment, you'll see how to train a transformer decoder on the [cnn_dailymail](https://www.tensorflow.org/datasets/catalog/cnn_dailymail) dataset, available from TensorFlow Datasets (part of TensorFlow Data Services). Because training such a model from scratch is time-intensive, you'll use a pre-trained model to summarize documents later in the assignment. Due to time and storage concerns, we will also not train the decoder on a different summarization dataset in this lab. If you have the time and space, we encourage you to explore the other [summarization](https://www.tensorflow.org/datasets/catalog/overview#summarization) datasets at TensorFlow Datasets. Which of them might suit your purposes better than the `cnn_dailymail` dataset? Where else can you find datasets for text summarization models?" 
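If you want a quick sanity check of the `TransformerLM` definition above, you could instantiate a small, untrained model and print its layer structure; the hyperparameter values below are illustrative only:

```python
# build a tiny TransformerLM just to inspect its structure
# (untrained, so it cannot produce meaningful text)
model = TransformerLM(vocab_size=1000, d_model=64, d_ff=256,
                      n_layers=2, n_heads=2, max_len=128, mode='eval')
print(model)  # shows the Serial combinator: ShiftRight, embedding, decoder blocks, Dense, LogSoftmax
```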
285 |   ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [] 293 | } 294 | ], 295 | "metadata": { 296 | "kernelspec": { 297 | "display_name": "Python 3", 298 | "language": "python", 299 | "name": "python3" 300 | }, 301 | "language_info": { 302 | "codemirror_mode": { 303 | "name": "ipython", 304 | "version": 3 305 | }, 306 | "file_extension": ".py", 307 | "mimetype": "text/x-python", 308 | "name": "python", 309 | "nbconvert_exporter": "python", 310 | "pygments_lexer": "ipython3", 311 | "version": "3.7.6" 312 | } 313 | }, 314 | "nbformat": 4, 315 | "nbformat_minor": 4 316 | } 317 | -------------------------------------------------------------------------------- /NLP with Probabilistic Models/Week_3/NLP_C2_W3_lecture_nb_03.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "\n", 8 | "# Out of vocabulary words (OOV)\n", 9 | "\n", 10 | "### Vocabulary\n", 11 | "In the video about out of vocabulary words, you saw that the first step in dealing with unknown words is to decide which words belong to the vocabulary. \n", 12 | "\n", 13 | "In the code assignment, you will try the method based on minimum frequency - all words appearing in the training set with frequency >= minimum frequency are added to the vocabulary.\n", 14 | "\n", 15 | "Here is code for the other method, where the target size of the vocabulary is known in advance and the vocabulary is filled with words based on their frequency in the training set." 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 3, 21 | "metadata": {}, 22 | "outputs": [ 23 | { 24 | "name": "stdout", 25 | "output_type": "stream", 26 | "text": [ 27 | "the new vocabulary containing 3 most frequent words: ['happy', 'because', 'learning']\n", 28 | "\n" 29 | ] 30 | } 31 | ], 32 | "source": [ 33 | "# build the vocabulary from M most frequent words\n", 34 | "# use Counter object from the collections library to find M most common words\n", 35 | "from collections import Counter\n", 36 | "\n", 37 | "# the target size of the vocabulary\n", 38 | "M = 3\n", 39 | "\n", 40 | "# pre-calculated word counts\n", 41 | "# Counter could be used to build this dictionary from the source corpus\n", 42 | "word_counts = {'happy': 5, 'because': 3, 'i': 2, 'am': 2, 'learning': 3, '.': 1}\n", 43 | "\n", 44 | "vocabulary = Counter(word_counts).most_common(M)\n", 45 | "\n", 46 | "# remove the frequencies and leave just the words\n", 47 | "vocabulary = [w[0] for w in vocabulary]\n", 48 | "\n", 49 | "print(f\"the new vocabulary containing {M} most frequent words: {vocabulary}\\n\") \n", 50 | " " 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "Now that the vocabulary is ready, you can use it to replace the OOV words with <UNK> as you saw in the lecture." 
58 |   ] 59 | }, 60 | { 61 | "cell_type": "code", 62 | "execution_count": 4, 63 | "metadata": {}, 64 | "outputs": [ 65 | { 66 | "name": "stdout", 67 | "output_type": "stream", 68 | "text": [ 69 | "input sentence: ['am', 'i', 'learning']\n", 70 | "output sentence: ['<UNK>', '<UNK>', 'learning']\n" 71 | ] 72 | } 73 | ], 74 | "source": [ 75 | "# test if words in the input sentences are in the vocabulary, if OOV, print <UNK>\n", 76 | "sentence = ['am', 'i', 'learning']\n", 77 | "output_sentence = []\n", 78 | "print(f\"input sentence: {sentence}\")\n", 79 | "\n", 80 | "for w in sentence:\n", 81 | " # test if word w is in vocabulary\n", 82 | " if w in vocabulary:\n", 83 | " output_sentence.append(w)\n", 84 | " else:\n", 85 | " output_sentence.append('<UNK>')\n", 86 | " \n", 87 | "print(f\"output sentence: {output_sentence}\")" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "When building the vocabulary in the code assignment, you will need to know how to iterate through the word counts dictionary. \n", 95 | "\n", 96 | "Here is an example of a similar task showing how to go through all the word counts and print out only the words with the frequency equal to f. " 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 5, 102 | "metadata": {}, 103 | "outputs": [ 104 | { 105 | "name": "stdout", 106 | "output_type": "stream", 107 | "text": [ 108 | "because\n", 109 | "learning\n" 110 | ] 111 | } 112 | ], 113 | "source": [ 114 | "# iterate through all word counts and print words with given frequency f\n", 115 | "f = 3\n", 116 | "\n", 117 | "word_counts = {'happy': 5, 'because': 3, 'i': 2, 'am': 2, 'learning':3, '.': 1}\n", 118 | "\n", 119 | "for word, freq in word_counts.items():\n", 120 | " if freq == f:\n", 121 | " print(word)" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": {}, 127 | "source": [ 128 | "As mentioned in the videos, if there are many <UNK> replacements in your train and test set, you may get a very low perplexity even though the model itself wouldn't be very helpful. \n", 129 | " \n", 130 | "Here is sample code showing this unwanted effect. 
" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 6, 136 | "metadata": {}, 137 | "outputs": [ 138 | { 139 | "name": "stdout", 140 | "output_type": "stream", 141 | "text": [ 142 | "perplexity for the training set: 1.2599210498948732\n", 143 | "perplexity for the training set with : 1.0\n" 144 | ] 145 | } 146 | ], 147 | "source": [ 148 | "# many low perplexity \n", 149 | "training_set = ['i', 'am', 'happy', 'because','i', 'am', 'learning', '.']\n", 150 | "training_set_unk = ['i', 'am', '', '','i', 'am', '', '']\n", 151 | "\n", 152 | "test_set = ['i', 'am', 'learning']\n", 153 | "test_set_unk = ['i', 'am', '']\n", 154 | "\n", 155 | "M = len(test_set)\n", 156 | "probability = 1\n", 157 | "probability_unk = 1\n", 158 | "\n", 159 | "# pre-calculated probabilities\n", 160 | "bigram_probabilities = {('i', 'am'): 1.0, ('am', 'happy'): 0.5, ('happy', 'because'): 1.0, ('because', 'i'): 1.0, ('am', 'learning'): 0.5, ('learning', '.'): 1.0}\n", 161 | "bigram_probabilities_unk = {('i', 'am'): 1.0, ('am', ''): 1.0, ('', ''): 0.5, ('', 'i'): 0.25}\n", 162 | "\n", 163 | "# got through the test set and calculate its bigram probability\n", 164 | "for i in range(len(test_set) - 2 + 1):\n", 165 | " bigram = tuple(test_set[i: i + 2])\n", 166 | " probability = probability * bigram_probabilities[bigram]\n", 167 | " \n", 168 | " bigram_unk = tuple(test_set_unk[i: i + 2])\n", 169 | " probability_unk = probability_unk * bigram_probabilities_unk[bigram_unk]\n", 170 | "\n", 171 | "# calculate perplexity for both original test set and test set with \n", 172 | "perplexity = probability ** (-1 / M)\n", 173 | "perplexity_unk = probability_unk ** (-1 / M)\n", 174 | "\n", 175 | "print(f\"perplexity for the training set: {perplexity}\")\n", 176 | "print(f\"perplexity for the training set with : {perplexity_unk}\")\n" 177 | ] 178 | }, 179 | { 180 | "cell_type": "markdown", 181 | "metadata": {}, 182 | "source": [ 183 | "\n", 184 | "### Smoothing" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "Add-k smoothing was described as a method for smoothing of the probabilities for previously unseen n-grams. \n", 192 | "\n", 193 | "Here is an example code that shows how to implement add-k smoothing but also highlights a disadvantage of this method. The downside is that n-grams not previously seen in the training dataset get too high probability. \n", 194 | "\n", 195 | "In the code output bellow you'll see that a phrase that is in the training set gets the same probability as an unknown phrase." 
196 |   ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 7, 201 | "metadata": {}, 202 | "outputs": [ 203 | { 204 | "name": "stdout", 205 | "output_type": "stream", 206 | "text": [ 207 | "probability_known_trigram: 0.2\n", 208 | "probability_unknown_trigram: 0.2\n" 209 | ] 210 | } 211 | ], 212 | "source": [ 213 | "def add_k_smoothing_probability(k, vocabulary_size, n_gram_count, n_gram_prefix_count):\n", 214 | " numerator = n_gram_count + k\n", 215 | " denominator = n_gram_prefix_count + k * vocabulary_size\n", 216 | " return numerator / denominator\n", 217 | "\n", 218 | "trigram_counts = {('i', 'am', 'happy') : 2}\n", 219 | "bigram_counts = {( 'am', 'happy') : 10}\n", 220 | "vocabulary_size = 5\n", 221 | "k = 1\n", 222 | "\n", 223 | "probability_known_trigram = add_k_smoothing_probability(k, vocabulary_size, trigram_counts[('i', 'am', 'happy')], \n", 224 | " bigram_counts[( 'am', 'happy')])\n", 225 | "\n", 226 | "probability_unknown_trigram = add_k_smoothing_probability(k, vocabulary_size, 0, 0)\n", 227 | "\n", 228 | "print(f\"probability_known_trigram: {probability_known_trigram}\")\n", 229 | "print(f\"probability_unknown_trigram: {probability_unknown_trigram}\")\n" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "\n", 237 | "### Back-off\n", 238 | "Back-off is a model generalization method that leverages information from lower order n-grams in case information about the higher order n-grams is missing. For example, if the probability of a trigram is missing, use bigram information and so on.\n", 239 | "\n", 240 | "Here you can see an example of a simple back-off technique." 241 | ] 242 | }, 243 | { 244 | "cell_type": "code", 245 | "execution_count": 8, 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "name": "stdout", 250 | "output_type": "stream", 251 | "text": [ 252 | "besides the trigram ('are', 'you', 'happy') we also use bigram ('you', 'happy') and unigram (happy)\n", 253 | "\n", 254 | "probability for trigram ('are', 'you', 'happy') not found\n", 255 | "probability for bigram ('you', 'happy') not found\n", 256 | "probability for unigram happy found\n", 257 | "\n", 258 | "probability for trigram ('are', 'you', 'happy') estimated as 0.06400000000000002\n" 259 | ] 260 | } 261 | ], 262 | "source": [ 263 | "# pre-calculated probabilities of all types of n-grams\n", 264 | "trigram_probabilities = {('i', 'am', 'happy'): 0}\n", 265 | "bigram_probabilities = {( 'am', 'happy'): 0.3}\n", 266 | "unigram_probabilities = {'happy': 0.4}\n", 267 | "\n", 268 | "# this is the input trigram we need to estimate\n", 269 | "trigram = ('are', 'you', 'happy')\n", 270 | "\n", 271 | "# find the last bigram and unigram of the input\n", 272 | "bigram = trigram[1: 3]\n", 273 | "unigram = trigram[2]\n", 274 | "print(f\"besides the trigram {trigram} we also use bigram {bigram} and unigram ({unigram})\\n\")\n", 275 | "\n", 276 | "# 0.4 is used as an example, experimentally found for web-scale corpora when using the \"stupid\" back-off\n", 277 | "lambda_factor = 0.4\n", 278 | "probability_hat_trigram = 0\n", 279 | "\n", 280 | "# search for first non-zero probability starting with trigram\n", 281 | "# to generalize this for any order of n-gram hierarchy, \n", 282 | "# you could loop through the probability dictionaries instead of if/else cascade\n", 283 | "if trigram not in trigram_probabilities or trigram_probabilities[trigram] == 0:\n", 284 | "    print(f\"probability for trigram {trigram} not found\")\n", 285 | 
" \n", 286 | " if bigram not in bigram_probabilities or bigram_probabilities[bigram] == 0:\n", 287 | " print(f\"probability for bigram {bigram} not found\")\n", 288 | " \n", 289 | " if unigram in unigram_probabilities:\n", 290 | " print(f\"probability for unigram {unigram} found\\n\")\n", 291 | " probability_hat_trigram = lambda_factor * lambda_factor * unigram_probabilities[unigram]\n", 292 | " else:\n", 293 | " probability_hat_trigram = 0\n", 294 | " else:\n", 295 | " probability_hat_trigram = lambda_factor * bigram_probabilities[bigram]\n", 296 | "else:\n", 297 | " probability_hat_trigram = trigram_probabilities[trigram]\n", 298 | "\n", 299 | "print(f\"probability for trigram {trigram} estimated as {probability_hat_trigram}\")\n" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "\n", 307 | "### Interpolation\n", 308 | "The other method for using probabilities of lower order n-grams is the interpolation. In this case, you use weighted probabilities of n-grams of all orders every time, not just when high order information is missing. \n", 309 | "\n", 310 | "For example, you always combine trigram, bigram and unigram probability. You can see how this in the following code snippet." 311 | ] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": null, 316 | "metadata": {}, 317 | "outputs": [], 318 | "source": [ 319 | "# pre-calculated probabilities of all types of n-grams\n", 320 | "trigram_probabilities = {('i', 'am', 'happy'): 0.15}\n", 321 | "bigram_probabilities = {( 'am', 'happy'): 0.3}\n", 322 | "unigram_probabilities = {'happy': 0.4}\n", 323 | "\n", 324 | "# the weights come from optimization on a validation set\n", 325 | "lambda_1 = 0.8\n", 326 | "lambda_2 = 0.15\n", 327 | "lambda_3 = 0.05\n", 328 | "\n", 329 | "# this is the input trigram we need to estimate\n", 330 | "trigram = ('i', 'am', 'happy')\n", 331 | "\n", 332 | "# find the last bigram and unigram of the input\n", 333 | "bigram = trigram[1: 3]\n", 334 | "unigram = trigram[2]\n", 335 | "print(f\"besides the trigram {trigram} we also use bigram {bigram} and unigram ({unigram})\\n\")\n", 336 | "\n", 337 | "# in the production code, you would need to check if the probability n-gram dictionary contains the n-gram\n", 338 | "probability_hat_trigram = lambda_1 * trigram_probabilities[trigram] \n", 339 | "+ lambda_2 * bigram_probabilities[bigram]\n", 340 | "+ lambda_3 * unigram_probabilities[unigram]\n", 341 | "\n", 342 | "print(f\"estimated probability of the input trigram {trigram} is {probability_hat_trigram}\")\n" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "That's it for week 3, you should be ready now for the code assignment. 
" 350 | ] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "execution_count": null, 355 | "metadata": {}, 356 | "outputs": [], 357 | "source": [] 358 | } 359 | ], 360 | "metadata": { 361 | "kernelspec": { 362 | "display_name": "Python 3", 363 | "language": "python", 364 | "name": "python3" 365 | }, 366 | "language_info": { 367 | "codemirror_mode": { 368 | "name": "ipython", 369 | "version": 3 370 | }, 371 | "file_extension": ".py", 372 | "mimetype": "text/x-python", 373 | "name": "python", 374 | "nbconvert_exporter": "python", 375 | "pygments_lexer": "ipython3", 376 | "version": "3.7.1" 377 | } 378 | }, 379 | "nbformat": 4, 380 | "nbformat_minor": 4 381 | } 382 | -------------------------------------------------------------------------------- /NLP with Probabilistic Models/Week_1/NLP_C2_W1_lecture_nb_01.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# NLP Course 2 Week 1 Lesson : Building The Model - Lecture Exercise 01\n", 8 | "Estimated Time: 10 minutes\n", 9 | "
\n", 10 | "# Vocabulary Creation \n", 11 | "Create a tiny vocabulary from a tiny corpus\n", 12 | "
\n", 13 | "It's time to start small !\n", 14 | "
\n", 15 | "### Imports and Data" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 1, 21 | "metadata": {}, 22 | "outputs": [], 23 | "source": [ 24 | "# imports\n", 25 | "import re # regular expression library; for tokenization of words\n", 26 | "from collections import Counter # collections library; counter: dict subclass for counting hashable objects\n", 27 | "import matplotlib.pyplot as plt # for data visualization" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 2, 33 | "metadata": {}, 34 | "outputs": [ 35 | { 36 | "name": "stdout", 37 | "output_type": "stream", 38 | "text": [ 39 | "red pink pink blue blue yellow ORANGE BLUE BLUE PINK\n", 40 | "string length : 52\n" 41 | ] 42 | } 43 | ], 44 | "source": [ 45 | "# the tiny corpus of text ! \n", 46 | "text = 'red pink pink blue blue yellow ORANGE BLUE BLUE PINK' # 🌈\n", 47 | "print(text)\n", 48 | "print('string length : ',len(text))" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "### Preprocessing" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 3, 61 | "metadata": {}, 62 | "outputs": [ 63 | { 64 | "name": "stdout", 65 | "output_type": "stream", 66 | "text": [ 67 | "red pink pink blue blue yellow orange blue blue pink\n", 68 | "string length : 52\n" 69 | ] 70 | } 71 | ], 72 | "source": [ 73 | "# convert all letters to lower case\n", 74 | "text_lowercase = text.lower()\n", 75 | "print(text_lowercase)\n", 76 | "print('string length : ',len(text_lowercase))" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 4, 82 | "metadata": {}, 83 | "outputs": [ 84 | { 85 | "name": "stdout", 86 | "output_type": "stream", 87 | "text": [ 88 | "['red', 'pink', 'pink', 'blue', 'blue', 'yellow', 'orange', 'blue', 'blue', 'pink']\n", 89 | "count : 10\n" 90 | ] 91 | } 92 | ], 93 | "source": [ 94 | "# some regex to tokenize the string to words and return them in a list\n", 95 | "words = re.findall(r'\\w+', text_lowercase)\n", 96 | "print(words)\n", 97 | "print('count : ',len(words))" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": {}, 103 | "source": [ 104 | "### Create Vocabulary\n", 105 | "Option 1 : A set of distinct words from the text" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": 5, 111 | "metadata": {}, 112 | "outputs": [ 113 | { 114 | "name": "stdout", 115 | "output_type": "stream", 116 | "text": [ 117 | "{'red', 'blue', 'pink', 'orange', 'yellow'}\n", 118 | "count : 5\n" 119 | ] 120 | } 121 | ], 122 | "source": [ 123 | "# create vocab\n", 124 | "vocab = set(words)\n", 125 | "print(vocab)\n", 126 | "print('count : ',len(vocab))" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "### Add Information with Word Counts\n", 134 | "Option 2 : Two alternatives for including the word count as well" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 6, 140 | "metadata": {}, 141 | "outputs": [ 142 | { 143 | "name": "stdout", 144 | "output_type": "stream", 145 | "text": [ 146 | "{'red': 1, 'pink': 3, 'blue': 4, 'yellow': 1, 'orange': 1}\n", 147 | "count : 5\n" 148 | ] 149 | } 150 | ], 151 | "source": [ 152 | "# create vocab including word count\n", 153 | "counts_a = dict()\n", 154 | "for w in words:\n", 155 | " counts_a[w] = counts_a.get(w,0)+1\n", 156 | "print(counts_a)\n", 157 | "print('count : ',len(counts_a))" 158 | ] 159 | }, 160 | { 161 | "cell_type": "code", 162 | "execution_count": 7, 163 | 
"metadata": {}, 164 | "outputs": [ 165 | { 166 | "name": "stdout", 167 | "output_type": "stream", 168 | "text": [ 169 | "Counter({'blue': 4, 'pink': 3, 'red': 1, 'yellow': 1, 'orange': 1})\n", 170 | "count : 5\n" 171 | ] 172 | } 173 | ], 174 | "source": [ 175 | "# create vocab including word count using collections.Counter\n", 176 | "counts_b = dict()\n", 177 | "counts_b = Counter(words)\n", 178 | "print(counts_b)\n", 179 | "print('count : ',len(counts_b))" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": 8, 185 | "metadata": {}, 186 | "outputs": [ 187 | { 188 | "data": { 189 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAEqZJREFUeJzt3X2sXHd95/H3B8c0bAGlyLdNZPvGaOtCgYUkXEyi0G3KAkrStNnuZrvJtqTN7tYKDQLUJ9GHDYqqqtX+0d2GQFwvpElEgaXlQVZwCtFCNgnCIbZJHBIH1aJEsWIRE6iDSQp1+u0fc7zMTsaec++d62v//H5JR/c8/ObM98zM/cyZ35w5J1WFJKktz1vuAiRJ02e4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhp0ynLd8apVq2rdunXLdfeSdELasWPHN6tqZlK7ZQv3devWsX379uW6e0k6ISV5tE87u2UkqUGGuyQ1yHCXpAYZ7pLUIMNdkhrUO9yTrEjy5SS3jVmWJNcn2ZNkV5JzplumJGk+5rPn/k5g9xGWXQSs74aNwI2LrEuStAi9wj3JGuBngQ8cocmlwK01sA04LckZU6pRkjRPfffc/yfwO8A/HWH5auCxoem93TxJ0jKY+AvVJJcAT1TVjiQXHKnZmHnPufJ2ko0Mum2YnZ2dR5mj61nwTY87Xp9c0lLos+d+PvDzSb4OfBR4Y5IPjbTZC6wdml4DPD66oqraXFVzVTU3MzPx1AiSpAWaGO5V9btVtaaq1gGXA5+rql8eabYFuLI7auZc4EBV7Zt+uZKkPhZ84rAkVwNU1SZgK3AxsAd4GrhqKtVJkhZkXuFeVXcCd3bjm4bmF3DNNAuTJC2cv1CVpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBk0M9ySnJvlSkgeSPJTkujFtLkhyIMn93XDt0pQrSeqjz2X2vge8saoOJlkJ3JPk9qraNtLu7qq6ZPolSpLma2K4d9dHPdhNruyGWsqiJEmL06vPPcmKJPcDTwB3VNW9Y5qd13Xd3J7klVOtUpI0L73CvaqeraqzgDXAhiSvGmmyEzizql4DvBf41Lj1JNmYZHuS7fv3719M3ZKko5jX0TJV9ffAncCFI/OfqqqD3fhWYGWSVWNuv7mq5qpqbmZmZuFVS5KOqs/RMjNJTuvGXwC8CXhkpM3pSdKNb+jW++T0y5Uk9dHnaJkzgFuSrGAQ2h+rqtuSXA1QVZuAy4C3JTkEPANc3n0RK0laBn2OltkFnD1m/qah8RuAG6ZbmiRpofyFqiQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDWozzVUT03ypSQPJHkoyXVj2iTJ9Un2JNmV5JylKVeS1Eefa6h+D3hjVR1MshK4J8ntVbVtqM1FwPpueD1wY/dXkrQMJu6518DBbnJlN4xe/PpS4Nau7TbgtCRnTLdUSVJfffbcSbIC2AH8OPC+qrp3pMlq4LGh6b3dvH0j69kIbASYnZ1dYMni/25f7gqm56fnlrsCqUm9vlCtqmer6ixgDbAhyatGmmTczcasZ3NVzVXV3MzMzPyrlST1Mq+jZarq74E7gQtHFu0F1g5NrwEeX1RlkqQF63O0zEyS07rxFwBvAh4ZabYFuLI7auZc4EBV7UOStCz69LmfAdzS9bs/D/hYVd2W5GqAqtoEbAUuBvYATwNXLVG9kqQeJoZ7Ve0Czh4zf9PQeAHXTLc0SdJC+QtVSWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJalCfa6iuTfL5JLuTPJTknWPaXJDkQJL7u+HapSlXktRHn2uoHgJ+s6p2JnkRsCPJHVX18Ei7u6vqkumXKEmar4l77lW1r6p2duPfAXYDq5e6MEnSws2rzz3JOgYXy753zOLzkjyQ5PYkrzzC7Tcm2Z5k+/79++ddrCSpn97hnuSFwMeBd1XVUyOLdwJnVtVrgPcCnxq3jqraXFVzVTU3MzOz0JolSRP0CvckKxkE+19W1SdGl1fVU1V1sBvfCqxMsmqqlUqSeutztEyADwK7q+pPj9Dm9K4dSTZ0631ymoVKkvrrc7TM+cBbgQeT3N/N+z1gFqCqNgGXAW9Lcgh4Bri8qmoJ6pUk9TAx3KvqHiAT2twA3DCtoiRJi+MvVCWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBfa6hujbJ55PsTvJQkneOaZMk1yfZk2RXknOWplxJUh99rqF6CPjNqtqZ5EXAjiR3VNXDQ20uAtZ3w+uBG7u/kqRlMHHPvar2VdXObvw7wG5g9UizS4Fba2AbcFqSM6ZerSSpl3n1uSdZB5wN3DuyaDXw2ND0Xp77BkCSjUm2J9m+f//++VUqSeqtd7gneSHwceBdVfXU6OIxN6nnzKjaXFVzVTU3MzMzv0olSb31CvckKxkE+19W1SfGNNkLrB2aXgM8vvjyJEkL0edomQAfBHZX1Z8eodkW4MruqJlzgQNVtW+KdUqS5qHP0TLnA28FHkxyfzfv94BZgKraBGwFLgb2AE8DV02/VElSXxPDvaruYXyf+nCbAq6ZVlGSpMXxF6qS1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJek
BhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1yHCXpAb1uczeTUmeSPKVIyy/IMmBJPd3w7XTL1OSNB99LrN3M3ADcOtR2txdVZdMpSJJ0qJN3HOvqruAbx2DWiRJUzKtPvfzkjyQ5PYkr5zSOiVJC9SnW2aSncCZVXUwycXAp4D14xom2QhsBJidnZ3CXUuSxln0nntVPVVVB7vxrcDKJKuO0HZzVc1V1dzMzMxi71qSdASLDvckpydJN76hW+eTi12vJGnhJnbLJPkIcAGwKsle4D3ASoCq2gRcBrwtySHgGeDyqqolq1iSNNHEcK+qKyYsv4HBoZKSpOOEv1CVpAYZ7pLUIMNdkhpkuEtSgwx3SWqQ4S5JDTLcJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBk0M9yQ3JXkiyVeOsDxJrk+yJ8muJOdMv0xJ0nz02XO/GbjwKMsvAtZ3w0bgxsWXJUlajInhXlV3Ad86SpNLgVtrYBtwWpIzplWgJGn+ptHnvhp4bGh6bzdPkrRMTpnCOjJmXo1tmGxk0HXD7OzsFO5aJ52Me7mdoGrsv8kErWz/Arb9w61sO/CfFvLcz8809tz3AmuHptcAj49rWFWbq2ququZmZmamcNeSpHGmEe5bgCu7o2bOBQ5U1b4prFeStEATu2WSfAS4AFiVZC/wHmAlQFVtArYCFwN7gKeBq5aqWElSPxPDvaqumLC8gGumVpEkadH8haokNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUGGuyQ1qFe4J7kwyVeT7Eny7jHLL0hyIMn93XDt9EuVJPXV5xqqK4D3AW8G9gL3JdlSVQ+PNL27qi5ZgholSfPUZ899A7Cnqr5WVd8HPgpcurRlSZIWo0+4rwYeG5re280bdV6SB5LcnuSVU6lOkrQgE7tlgIyZVyPTO4Ezq+pgkouBTwHrn7OiZCOwEWB2dnaepUqS+uqz574XWDs0vQZ4fLhBVT1VVQe78a3AyiSrRldUVZuraq6q5mZmZhZRtiTpaPqE+33A+iQvTfJ84HJgy3CDJKcnSTe+oVvvk9MuVpLUz8Rumao6lOTtwGeAFcBNVfVQkqu75ZuAy4C3JTkEPANcXlWjXTeSpGOkT5/74a6WrSPzNg2N3wDcMN3SJEkL5S9UJalBhrskNchwl6QGGe6S1CDDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQYa7JDXIcJekBhnuktQgw12SGmS4S1KDDHdJapDhLkkNMtwlqUG9wj3JhUm+mmRPknePWZ4k13fLdyU5Z/qlSpL6mhjuSVYA7wMuAl4BXJHkFSPNLgLWd8NG4MYp1ylJmoc+e+4bgD1V9bWq+j7wUeDSkTaXArfWwDbgtCRnTLlWSVJPfcJ9NfDY0PTebt5820iSjpFTerTJmHm1gDYk2cig2wbgYJKv9rj/5bQK+OZS3kHGPXLHhyXf9uPc0m//8fvkH4Pn/mTeduCXFrX9Z/Zp1Cfc9wJrh6bXAI8voA1VtRnY3Kew40GS7VU1t9x1LIeTedvh5N5+t72Nbe/TLXMfsD7JS5M8H7gc2DLSZgtwZXfUzLnAgaraN+VaJUk9Tdxzr6pDSd4OfAZYAdxUVQ8lubpbvgnYClwM7AGeBq5aupIlSZP06ZahqrYyCPDheZuGxgu4ZrqlHRdOmC6kJXAybzuc3Nvvtjcgg1yWJLXE0w9IUoNOynBPsi7JV8bMvzNJE9+Uz1eSD4z55fFom5uTXHasajreJLkgyW3LXce0JTnY/R37f6ETU68+d7Wvqv7rctewXJKEQRflPy13LVo6J9vzfFLuuXdOSXJLd6Kzv07yL4YXHt6b6cYvS3JzNz6T5ONJ7uuG849x3YvS7Z09Mrrtw59akhxM8kdJHkiyLcmPjVnPH3Z78ifka6h7HHYneT+wE3hrki8m2Znkr5K8sGt3Yfd43QP8u2UtuqfuuXnn0PQfJXlHkt/uXrO7klw3YR2nJvmLJA8m+XKSn+nmb03y6m78y0muHbrPZd9BSPIbSb7SDe8a8zyvTXJjku1JHhp+HJJ8Pcl13WvgwSQv7+bPJLmjm//nSR5Nsqpb9stJvpTk/m7ZiuXZ8uc6If8xp+RlwOaqejXwFPDrPW/3Z8D/qKrXAf8e+MAS1beUJm37DwPbquo1wF3Arw0vTPLfgR8FrjrB94JeBtwKvBn4L8CbquocYDvwG0lOBf4X8HPATwGnL1eh8/RB4FcAujffy4FvMDix3wbgLOC1Sf71UdZxDUBV/SvgCuCW7vG4C/ipJC8GDgGHd27eANw9/U3pL8lrGRyG/XrgXAav2x+he56r6uyqehT4/e6HSq8Gfvrwm1Xnm91r4Ebgt7p57wE+183/JDDb3d9PAv8ROL+qzgKeBX5piTezt5O5W+axqvpCN/4h4B09b/cm4BX5wU/HX5zkRVX1nWkXuIQmbfv3gcN9yzsYhN9h/w24t6o2cuJ7tKq2JbmEwRlPv9A9r88Hvgi8HPi7qvpbgCQf4genzzhuVdXXkzyZ5Gzgx4AvA68D3tKNA7yQQdjfdYTVvAF4b7e+R5I8CvwEgwB/B/B3wKeBN3efetdV1XKfTuQNwCer6rsAST7B4E350e6Ehof9YganQjkFOIPBc7+rW/aJ7u8OfvBJ7Q3ALwBU1d8k+XY3/98ArwXu6143LwCeWILtWpCTOdxHjwE92vSpQ+PPA86rqmeWpKpjY9K2/2P94BjZZ/n/Xyf3Mdjre0lVfWupCjxGvtv9DXBHVV0xvDDJWYw5R9IJ4gPArzL4tHETgyD646r68563P9LJT+4D5oCvAXcwOBfLrzEIw+V2pJq/+/8aJC9lsEf+uqr6dtfdOvz//b3u7/Dr/kjrDXBLVf3ugiteQidzt8xskvO68SuAe0aWfyPJT3Yfa39haP5ngbcfnugC4EQzaduP5m+APwE+neRFU69seWwDzk/y4wDddxA/ATwCvDTJv+zaXXGkFRyHPglcyGCP/TPd8J+HvktYneRHj3L7u+i6GLrHYhb4anfa78eAX2TwuN3NICyXtUumcxfwb7vn74cZ/N+O1vViBmF/oPsu6aIe672HwfaS5C0MunoA/g9w2eHHMclLkvQ6qdexcDKH+27gV5LsAl7Ccy8w8m4GXROfA4bPk/MOYK77Uuph4OpjUeyUTdr2o6qqv2LQF70lyQuWoL5jqqr2M9jL/Uj3mGwDXl5V/8CgG+bT3Reqjy5flfPThfDngY9V1bNV9Vngw8AXkzwI/DVwtDfn9wMrurb/G/jVqjq8V3s38I2qerobX8NxEO5VtRO4GfgScC+DTy/fHmnzAIOuqYcYfKL5ApNdB7wlyU4Gbwb7gO9U1cPAHwCf7V43dzDo5jku+AvVk0ySdcBtVfWqZS5FS6j7xLkT+A+HvzPQwiT5IeDZ7jxb5wE3dl+gHtdO5j53qUkZ/BjtNgZ
fLhrsizcLfKx7w/w+I0ePHa/cc5ekBp3Mfe6S1CzDXZIaZLhLUoMMd0lqkOEuSQ0y3CWpQf8MjJQFBhqpBowAAAAASUVORK5CYII=\n", 190 | "text/plain": [ 191 | "
" 192 | ] 193 | }, 194 | "metadata": { 195 | "needs_background": "light" 196 | }, 197 | "output_type": "display_data" 198 | } 199 | ], 200 | "source": [ 201 | "# barchart of sorted word counts\n", 202 | "d = {'blue': counts_b['blue'], 'pink': counts_b['pink'], 'red': counts_b['red'], 'yellow': counts_b['yellow'], 'orange': counts_b['orange']}\n", 203 | "plt.bar(range(len(d)), list(d.values()), align='center', color=d.keys())\n", 204 | "_ = plt.xticks(range(len(d)), list(d.keys()))" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "### Ungraded Exercise\n", 212 | "Note that `counts_b`, above, returned by `collections.Counter` is sorted by word count\n", 213 | "\n", 214 | "Can you modify the tiny corpus of ***text*** so that a new color appears \n", 215 | "between ***pink*** and ***red*** in `counts_b` ?\n", 216 | "\n", 217 | "Do you need to run all the cells again, or just specific ones ? " 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": 9, 223 | "metadata": {}, 224 | "outputs": [ 225 | { 226 | "name": "stdout", 227 | "output_type": "stream", 228 | "text": [ 229 | "counts_b : Counter({'blue': 4, 'pink': 3, 'red': 1, 'yellow': 1, 'orange': 1})\n", 230 | "count : 5\n" 231 | ] 232 | } 233 | ], 234 | "source": [ 235 | "print('counts_b : ', counts_b)\n", 236 | "print('count : ', len(counts_b))" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "Expected Outcome:\n", 244 | "\n", 245 | "counts_b : Counter({'blue': 4, 'pink': 3, **'your_new_color_here': 2**, red': 1, 'yellow': 1, 'orange': 1})\n", 246 | "
\n", 247 | "count : 6" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 10, 253 | "metadata": {}, 254 | "outputs": [ 255 | { 256 | "data": { 257 | "text/plain": [ 258 | "{'red': 1, 'pink': 3, 'blue': 4, 'yellow': 1, 'orange': 1}" 259 | ] 260 | }, 261 | "execution_count": 10, 262 | "metadata": {}, 263 | "output_type": "execute_result" 264 | } 265 | ], 266 | "source": [ 267 | "counts_b.update()" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "### Summary\n", 275 | "\n", 276 | "This is a tiny example but the methodology scales very well.\n", 277 | "
\n", 278 | "In the assignment you will create a large vocabulary of thousands of words, from a corpus\n", 279 | "
\n", 280 | "of tens of thousands or words! But the mechanics are exactly the same. \n", 281 | "
\n", 282 | "The only extra things to pay attention to should be; run time, memory management and the vocab data structure.\n", 283 | "
\n", 284 | "So the choice of approach used in code blocks `counts_a` vs `counts_b`, above, will be important." 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [] 293 | } 294 | ], 295 | "metadata": { 296 | "kernelspec": { 297 | "display_name": "Python 3", 298 | "language": "python", 299 | "name": "python3" 300 | }, 301 | "language_info": { 302 | "codemirror_mode": { 303 | "name": "ipython", 304 | "version": 3 305 | }, 306 | "file_extension": ".py", 307 | "mimetype": "text/x-python", 308 | "name": "python", 309 | "nbconvert_exporter": "python", 310 | "pygments_lexer": "ipython3", 311 | "version": "3.7.1" 312 | } 313 | }, 314 | "nbformat": 4, 315 | "nbformat_minor": 2 316 | } 317 | -------------------------------------------------------------------------------- /NLP with Sequence Models/Week_4/C3_W4_Lecture_Notebook_Modified_Triplet_Loss.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Modified Triplet Loss : Ungraded Lecture Notebook\n", 8 | "In this notebook you'll see how to calculate the full triplet loss, step by step, including the mean negative and the closest negative. You'll also calculate the matrix of similarity scores.\n", 9 | "\n", 10 | "## Background\n", 11 | "This is the original triplet loss function:\n", 12 | "\n", 13 | "$\\mathcal{L_\\mathrm{Original}} = \\max{(\\mathrm{s}(A,N) -\\mathrm{s}(A,P) +\\alpha, 0)}$\n", 14 | "\n", 15 | "It can be improved by including the mean negative and the closest negative, to create a new full loss function. The inputs are the Anchor $\\mathrm{A}$, Positive $\\mathrm{P}$ and Negative $\\mathrm{N}$.\n", 16 | "\n", 17 | "$\\mathcal{L_\\mathrm{1}} = \\max{(mean\\_neg -\\mathrm{s}(A,P) +\\alpha, 0)}$\n", 18 | "\n", 19 | "$\\mathcal{L_\\mathrm{2}} = \\max{(closest\\_neg -\\mathrm{s}(A,P) +\\alpha, 0)}$\n", 20 | "\n", 21 | "$\\mathcal{L_\\mathrm{Full}} = \\mathcal{L_\\mathrm{1}} + \\mathcal{L_\\mathrm{2}}$\n", 22 | "\n", 23 | "Let me show you what that means exactly, and how to calculate each step.\n", 24 | "\n", 25 | "## Imports" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 1, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "import numpy as np" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "## Similarity Scores\n", 42 | "The first step is to calculate the matrix of similarity scores using cosine similarity so that you can look up $\\mathrm{s}(A,P)$, $\\mathrm{s}(A,N)$ as needed for the loss formulas.\n", 43 | "\n", 44 | "### Two Vectors\n", 45 | "First I'll show you how to calculate the similarity score, using cosine similarity, for 2 vectors.\n", 46 | "\n", 47 | "$\\mathrm{s}(v_1,v_2) = \\mathrm{cosine \\ similarity}(v_1,v_2) = \\frac{v_1 \\cdot v_2}{||v_1||~||v_2||}$\n", 48 | "* Try changing the values in the second vector to see how it changes the cosine similarity.\n", 49 | "\n", 50 | "\n" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 2, 56 | "metadata": { 57 | "tags": [] 58 | }, 59 | "outputs": [ 60 | { 61 | "name": "stdout", 62 | "output_type": "stream", 63 | "text": [ 64 | "-- Inputs --\n", 65 | "v1 : [1. 2. 3.]\n", 66 | "v2 : [1. 2. 
3.5] \n", 67 | "\n", 68 | "-- Outputs --\n", 69 | "cosine similarity : 0.9974086507360697\n" 70 | ] 71 | } 72 | ], 73 | "source": [ 74 | "# Two vector example\n", 75 | "# Input data\n", 76 | "print(\"-- Inputs --\")\n", 77 | "v1 = np.array([1, 2, 3], dtype=float)\n", 78 | "v2 = np.array([1, 2, 3.5])  # notice the 3rd element is offset by 0.5\n", 79 | "### START CODE HERE ###\n", 80 | "# Try modifying the vector v2 to see how it impacts the cosine similarity\n", 81 | "# v2 = v1  # identical vector\n", 82 | "# v2 = v1 * -1  # opposite vector\n", 83 | "# v2 = np.array([0,-42,1])  # random example\n", 84 | "### END CODE HERE ###\n", 85 | "print(\"v1 :\", v1)\n", 86 | "print(\"v2 :\", v2, \"\\n\")\n", 87 | "\n", 88 | "# Similarity score\n", 89 | "def cosine_similarity(v1, v2):\n", 90 | "    numerator = np.dot(v1, v2)\n", 91 | "    denominator = np.sqrt(np.dot(v1, v1)) * np.sqrt(np.dot(v2, v2))\n", 92 | "    return numerator / denominator\n", 93 | "\n", 94 | "print(\"-- Outputs --\")\n", 95 | "print(\"cosine similarity :\", cosine_similarity(v1, v2))" 96 | ] 97 | },
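{ "cell_type": "markdown", "metadata": {}, "source": [ "As a sanity check, here is the arithmetic worked out by hand for the default `v1` and `v2` above: $v_1 \cdot v_2 = 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 3.5 = 15.5$, $||v_1|| = \sqrt{14} \approx 3.742$, $||v_2|| = \sqrt{17.25} \approx 4.153$, so $\mathrm{s}(v_1,v_2) \approx 15.5 / (3.742 \times 4.153) \approx 0.9974$, matching the output above." ] },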
98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "### Two Batches of Vectors\n", 103 | "Now I'll show you how to calculate the similarity scores, using cosine similarity, for 2 batches of vectors. These are rows of individual vectors, just like in the example above, but stacked vertically into a matrix. They would look like the image below for a batch size (row count) of 4 and embedding size (column count) of 5.\n", 104 | "\n", 105 | "The data is set up so that $v_{1\_1}$ and $v_{2\_1}$ represent duplicate inputs, but they are not duplicates with any other rows in the batch. This means $v_{1\_1}$ and $v_{2\_1}$ (green and green) have more similar vectors than, say, $v_{1\_1}$ and $v_{2\_2}$ (green and magenta).\n", 106 | "\n", 107 | "I'll show you two different methods for calculating the matrix of similarities from 2 batches of vectors.\n", 108 | "\n", 109 | "" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | "execution_count": 3, 115 | "metadata": { 116 | "lines_to_next_cell": 2, 117 | "tags": [] 118 | }, 119 | "outputs": [ 120 | { 121 | "name": "stdout", 122 | "output_type": "stream", 123 | "text": [ 124 | "-- Inputs --\n", 125 | "v1 :\n", 126 | "[[ 1 2 3]\n", 127 | " [ 9 8 7]\n", 128 | " [-1 -4 -2]\n", 129 | " [ 1 -7 2]] \n", 130 | "\n", 131 | "v2 :\n", 132 | "[[ 2.44637227 8.0875426 3.43794167]\n", 133 | " [ 5.81933419 11.17931746 2.02086736]\n", 134 | " [-2.45693021 -2.33995399 -2.29832268]\n", 135 | " [-3.44095274 -6.79001593 3.26547968]] \n", 136 | "\n", 137 | "batch sizes match : True \n", 138 | "\n", 139 | "-- Outputs --\n", 140 | "option 1 : loop\n", 141 | "[[ 0.84775304 0.71693898 -0.91510344 -0.23310934]\n", 142 | " [ 0.87192477 0.87720801 -0.99721007 -0.54113799]\n", 143 | " [-0.99688437 -0.93307354 0.87399572 0.63413341]\n", 144 | " [-0.70547733 -0.72916751 0.30968768 0.83164778]] \n", 145 | "\n", 146 | "option 2 : vec norm & dot product\n", 147 | "[[ 0.84775304 0.71693898 -0.91510344 -0.23310934]\n", 148 | " [ 0.87192477 0.87720801 -0.99721007 -0.54113799]\n", 149 | " [-0.99688437 -0.93307354 0.87399572 0.63413341]\n", 150 | " [-0.70547733 -0.72916751 0.30968768 0.83164778]] \n", 151 | "\n", 152 | "outputs are the same : True\n" 153 | ] 154 | } 155 | ], 156 | "source": [ 157 | "# Two batches of vectors example\n", 158 | "# Input data\n", 159 | "print(\"-- Inputs --\")\n", 160 | "v1_1 = np.array([1, 2, 3])\n", 161 | "v1_2 = np.array([9, 8, 7])\n", 162 | "v1_3 = np.array([-1, -4, -2])\n", 163 | "v1_4 = np.array([1, -7, 2])\n", 164 | "v1 = np.vstack([v1_1, v1_2, v1_3, v1_4])\n", 165 | "print(\"v1 :\")\n", 166 | "print(v1, \"\\n\")\n", 167 | "v2_1 = v1_1 + np.random.normal(0, 2, 3)  # add some noise to create an approximate duplicate\n", 168 | "v2_2 = v1_2 + np.random.normal(0, 2, 3)\n", 169 | "v2_3 = v1_3 + np.random.normal(0, 2, 3)\n", 170 | "v2_4 = v1_4 + np.random.normal(0, 2, 3)\n", 171 | "v2 = np.vstack([v2_1, v2_2, v2_3, v2_4])\n", 172 | "print(\"v2 :\")\n", 173 | "print(v2, \"\\n\")\n", 174 | "\n", 175 | "# Batch sizes must match\n", 176 | "b = len(v1)\n", 177 | "print(\"batch sizes match :\", b == len(v2), \"\\n\")\n", 178 | "\n", 179 | "# Similarity scores\n", 180 | "print(\"-- Outputs --\")\n", 181 | "# Option 1 : nested loops and the cosine similarity function\n", 182 | "sim_1 = np.zeros([b, b])  # empty array to take similarity scores\n", 183 | "# Loop\n", 184 | "for row in range(0, sim_1.shape[0]):\n", 185 | "    for col in range(0, sim_1.shape[1]):\n", 186 | "        sim_1[row, col] = cosine_similarity(v1[row], v2[col])\n", 187 | "\n", 188 | "print(\"option 1 : loop\")\n", 189 | "print(sim_1, \"\\n\")\n", 190 | "\n", 191 | "# Option 2 : vector normalization and dot product\n", 192 | "def norm(x):\n", 193 | "    return x / np.sqrt(np.sum(x * x, axis=1, keepdims=True))\n", 194 | "\n", 195 | "sim_2 = np.dot(norm(v1), norm(v2).T)\n", 196 | "\n", 197 | "print(\"option 2 : vec norm & dot product\")\n", 198 | "print(sim_2, \"\\n\")\n", 199 | "\n", 200 | "# Check\n", 201 | "print(\"outputs are the same :\", np.allclose(sim_1, sim_2))" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "## Hard Negative Mining\n", 209 | "\n", 210 | "I'll now show you how to calculate the mean negative $mean\_neg$ and the closest negative $closest\_neg$ used in calculating $\mathcal{L_\mathrm{1}}$ and $\mathcal{L_\mathrm{2}}$.\n", 211 | "\n", 212 | "\n", 213 | "$\mathcal{L_\mathrm{1}} = \max{(mean\_neg -\mathrm{s}(A,P) +\alpha, 0)}$\n", 214 | "\n", 215 | "$\mathcal{L_\mathrm{2}} = \max{(closest\_neg -\mathrm{s}(A,P) +\alpha, 0)}$\n", 216 | "\n", 217 | "You'll do this using the matrix of similarity scores you already know how to make, like the example below for a batch size of 4. The diagonal of the matrix contains all the $\mathrm{s}(A,P)$ values, similarities from duplicate question pairs (aka Positives). This is an important attribute for the calculations to follow.\n", 218 | "\n", 219 | "\n", 220 | "\n", 221 | "\n", 222 | "### Mean Negative\n", 223 | "$mean\_neg$ is the average of the off-diagonal $\mathrm{s}(A,N)$ values for each row.\n", 224 | "\n", 225 | "### Closest Negative\n", 226 | "$closest\_neg$ is the largest off-diagonal value, $\mathrm{s}(A,N)$, that is smaller than the diagonal $\mathrm{s}(A,P)$ for each row.\n", 227 | "* Try using a different matrix of similarity scores. " 228 | ] 229 | },
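{ "cell_type": "markdown", "metadata": {}, "source": [ "To make the definitions concrete, here is the first row of the hardcoded similarity matrix used below worked out by hand: the off-diagonal values are $-0.8$, $0.3$ and $-0.5$, so $mean\_neg = (-0.8 + 0.3 - 0.5) / 3 \approx -0.333$, and the largest off-diagonal value that does not exceed the diagonal $\mathrm{s}(A,P) = 0.9$ is $closest\_neg = 0.3$. You can check both against the printed outputs." ] },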
230 | { 231 | "cell_type": "code", 232 | "execution_count": 7, 233 | "metadata": { 234 | "tags": [] 235 | }, 236 | "outputs": [ 237 | { 238 | "name": "stdout", 239 | "output_type": "stream", 240 | "text": [ 241 | "-- Inputs --\n", 242 | "sim :\n", 243 | "[[ 0.9 -0.8 0.3 -0.5]\n", 244 | " [-0.4 0.5 0.1 -0.1]\n", 245 | " [ 0.3 0.1 -0.4 -0.8]\n", 246 | " [-0.5 -0.2 -0.7 0.5]]\n", 247 | "shape : (4, 4) \n", 248 | "\n", 249 | "sim_ap :\n", 250 | "[[ 0.9 0. 0. 0. ]\n", 251 | " [ 0. 0.5 0. 0. ]\n", 252 | " [ 0. 0. -0.4 0. ]\n", 253 | " [ 0. 0. 0. 0.5]] \n", 254 | "\n", 255 | "sim_an :\n", 256 | "[[ 0. -0.8 0.3 -0.5]\n", 257 | " [-0.4 0. 0.1 -0.1]\n", 258 | " [ 0.3 0.1 0. -0.8]\n", 259 | " [-0.5 -0.2 -0.7 0. ]] \n", 260 | "\n", 261 | "-- Outputs --\n", 262 | "mean_neg :\n", 263 | "[[-0.33333333]\n", 264 | " [-0.13333333]\n", 265 | " [-0.13333333]\n", 266 | " [-0.46666667]] \n", 267 | "\n", 268 | "closest_neg :\n", 269 | "[[ 0.3]\n", 270 | " [ 0.1]\n", 271 | " [-0.8]\n", 272 | " [-0.2]] \n", 273 | "\n" 274 | ] 275 | } 276 | ], 277 | "source": [ 278 | "# Hardcoded matrix of similarity scores\n", 279 | "sim_hardcoded = np.array(\n", 280 | "    [\n", 281 | "        [0.9, -0.8, 0.3, -0.5],\n", 282 | "        [-0.4, 0.5, 0.1, -0.1],\n", 283 | "        [0.3, 0.1, -0.4, -0.8],\n", 284 | "        [-0.5, -0.2, -0.7, 0.5],\n", 285 | "    ]\n", 286 | ")\n", 287 | "\n", 288 | "sim = sim_hardcoded\n", 289 | "### START CODE HERE ###\n", 290 | "# Try using different values for the matrix of similarity scores\n", 291 | "# sim = 2 * np.random.random_sample((b,b)) - 1  # random similarity scores between -1 and 1\n", 292 | "# sim = sim_2  # the matrix calculated previously\n", 293 | "### END CODE HERE ###\n", 294 | "\n", 295 | "# Batch size\n", 296 | "b = sim.shape[0]\n", 297 | "\n", 298 | "print(\"-- Inputs --\")\n", 299 | "print(\"sim :\")\n", 300 | "print(sim)\n", 301 | "print(\"shape :\", sim.shape, \"\\n\")\n", 302 | "\n", 303 | "# Positives\n", 304 | "# All the s(A,P) values : similarities from duplicate question pairs (aka Positives)\n", 305 | "# These are along the diagonal\n", 306 | "sim_ap = np.diag(sim)\n", 307 | "print(\"sim_ap :\")\n", 308 | "print(np.diag(sim_ap), \"\\n\")\n", 309 | "\n", 310 | "# Negatives\n", 311 | "# all the s(A,N) values : similarities of the non-duplicate question pairs (aka Negatives)\n", 312 | "# These are in the off-diagonals\n", 313 | "sim_an = sim - np.diag(sim_ap)\n", 314 | "print(\"sim_an :\")\n", 315 | "print(sim_an, \"\\n\")\n", 316 | "\n", 317 | "print(\"-- Outputs --\")\n", 318 | "# Mean negative\n", 319 | "# Average of the s(A,N) values for each row\n", 320 | "mean_neg = np.sum(sim_an, axis=1, keepdims=True) / (b - 1)\n", 321 | "print(\"mean_neg :\")\n", 322 | "print(mean_neg, \"\\n\")\n", 323 | "\n", 324 | "# Closest negative\n", 325 | "# Max s(A,N) that is <= s(A,P) for each row\n", 326 | "mask_1 = np.identity(b) == 1  # mask to exclude the diagonal\n", 327 | "mask_2 = sim_an > sim_ap.reshape(b, 1)  # mask to exclude sim_an > sim_ap\n", 328 | "mask = mask_1 | mask_2\n", 329 | "sim_an_masked = np.copy(sim_an)  # create a copy to preserve sim_an\n", 330 | "sim_an_masked[mask] = -2  # -2 is below the cosine similarity range [-1, 1], so masked entries never win the max\n", 331 | "\n", 332 | "closest_neg = np.max(sim_an_masked, axis=1, keepdims=True)\n", 333 | "print(\"closest_neg :\")\n", 334 | "print(closest_neg, \"\\n\")" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "## The Loss Functions\n", 342 | "\n", 343 | "The last step is to calculate the loss functions.\n", 344 | "\n", 345 | "$\mathcal{L_\mathrm{1}} = \max{(mean\_neg -\mathrm{s}(A,P) +\alpha, 0)}$\n", 346 | "\n", 347 | "$\mathcal{L_\mathrm{2}} = \max{(closest\_neg -\mathrm{s}(A,P) +\alpha, 0)}$\n", 348 | "\n", 349 | "$\mathcal{L_\mathrm{Full}} = \mathcal{L_\mathrm{1}} + \mathcal{L_\mathrm{2}}$" 350 | ] 351 | },
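{ "cell_type": "markdown", "metadata": {}, "source": [ "Before running the code, it is worth verifying one row by hand. For row 3 of the hardcoded matrix, $\mathrm{s}(A,P) = -0.4$, $mean\_neg \approx -0.133$, $closest\_neg = -0.8$ and $\alpha = 0.25$, so $\mathcal{L_\mathrm{1}} = \max{(-0.133 + 0.4 + 0.25, 0)} \approx 0.517$ and $\mathcal{L_\mathrm{2}} = \max{(-0.8 + 0.4 + 0.25, 0)} = 0$. That row contributes all of the cost printed below." ] },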
352 | { 353 | "cell_type": "code", 354 | "execution_count": 8, 355 | "metadata": { 356 | "tags": [] 357 | }, 358 | "outputs": [ 359 | { 360 | "name": "stdout", 361 | "output_type": "stream", 362 | "text": [ 363 | "-- Outputs --\n", 364 | "loss full :\n", 365 | "[[0. ]\n", 366 | " [0. ]\n", 367 | " [0.51666667]\n", 368 | " [0. ]] \n", 369 | "\n", 370 | "cost : 0.517\n" 371 | ] 372 | } 373 | ], 374 | "source": [ 375 | "# Alpha margin\n", 376 | "alpha = 0.25\n", 377 | "\n", 378 | "# Modified triplet loss\n", 379 | "# Loss 1\n", 380 | "l_1 = np.maximum(mean_neg - sim_ap.reshape(b, 1) + alpha, 0)\n", 381 | "# Loss 2\n", 382 | "l_2 = np.maximum(closest_neg - sim_ap.reshape(b, 1) + alpha, 0)\n", 383 | "# Loss full\n", 384 | "l_full = l_1 + l_2\n", 385 | "# Cost\n", 386 | "cost = np.sum(l_full)\n", 387 | "\n", 388 | "print(\"-- Outputs --\")\n", 389 | "print(\"loss full :\")\n", 390 | "print(l_full, \"\\n\")\n", 391 | "print(\"cost :\", \"{:.3f}\".format(cost))" 392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "metadata": {}, 397 | "source": [ 398 | "## Summary\n", 399 | "There were a lot of steps in there, so well done. You now know how to calculate a modified triplet loss, incorporating the mean negative and the closest negative. You also learned how to create a matrix of similarity scores based on cosine similarity." 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": null, 405 | "metadata": {}, 406 | "outputs": [], 407 | "source": [] 408 | } 409 | ], 410 | "metadata": { 411 | "jupytext": { 412 | "formats": "ipynb,py:percent", 413 | "main_language": "python" 414 | }, 415 | "kernelspec": { 416 | "display_name": "Python 3", 417 | "language": "python", 418 | "name": "python3" 419 | }, 420 | "language_info": { 421 | "codemirror_mode": { 422 | "name": "ipython", 423 | "version": 3 424 | }, 425 | "file_extension": ".py", 426 | "mimetype": "text/x-python", 427 | "name": "python", 428 | "nbconvert_exporter": "python", 429 | "pygments_lexer": "ipython3", 430 | "version": "3.7.1" 431 | } 432 | }, 433 | "nbformat": 4, 434 | "nbformat_minor": 2 435 | } 436 | --------------------------------------------------------------------------------