├── LICENSE
├── README.md
├── _config.yml
├── data-science
├── DS - Correlation Coefficients.ipynb
├── DS - Cross-Entropy.ipynb
├── DS - Iris DataViz.ipynb
└── DS - Linear Discriminant Analysis.ipynb
├── deep-learning
├── DL - Feed Forward NN.ipynb
├── DL - Image Classifier.ipynb
└── DL - XOR NN Solution.ipynb
├── docs
├── CODE_OF_CONDUCT.md
└── CONTRIBUTING.md
├── machine-learning
├── supervised-learning
│ ├── ML - Linear Regression.ipynb
│ ├── ML - Logistic Regression.ipynb
│ └── ML - ROC Curve.ipynb
└── unsupervised-learning
│ ├── ML - Clustering Validation.ipynb
│ ├── ML - Clustering with Dendogram.ipynb
│ └── ML - DBSCAN Clustering.ipynb
└── nlp
└── NLP - Text Normalization.ipynb
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2021 Andres Segura
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Machine Learning Course
2 | 
3 | 
4 | 
5 | 
6 | 
7 |
8 | A free, hands-on, interactive course in Python. Starting from Data Science, it offers examples (in Google Colab) and explanations (in Twitter threads) of the concepts and techniques of Machine Learning, Deep Learning and NLP.
9 |
10 | Although it is not intended to have the formal rigor of a book, we tried to be as faithful as possible to the original algorithms and methods, adding variants only when they were necessary for didactic purposes.
11 |
12 | ## Quick Start
13 | The best way to get the most out of this course is to read each selected problem carefully, try to think of a possible (language-independent) solution, and then study the proposed Python code and try to reproduce it in your favorite IDE. If you already know Python, you can go straight to programming your own solution and then compare it with the one proposed in the course.
14 |
15 | If you want to play with these notebooks online without having to install any library or configure hardware, you can use the following service:
16 | -
17 |
18 | ## Contents
19 | 1. Data Science
20 | - Correlation Coefficients: explanation \| code
21 | - Feature Scaling: explanation \| code
22 | - Entropy and Cross-Entropy: explanation \| code
23 | - Data Visualization & Dimensionality Reduction: code
24 | - Linear Discriminant Analysis: code
25 |
26 | 2. Machine Learning
27 | - What is Machine Learning: explanation
28 | - Fundamentals: explanation
29 | - Tips for deploying ML models: explanation
30 | - Unsupervised Learning (UL)
31 | - Clustering: explanation
32 | - Hierarchical Clustering: explanation \| code
33 | - DBSCAN Clustering: explanation \| code
34 | - Clustering Validation: explanation \| code
35 | - Supervised Learning (SL)
36 | - Methodology for regression problems: explanation
37 | - Linear Regression: code
38 | - Logistic Regression: code
39 | - Confusion Matrix: explanation \| resource
40 | - ROC Curve: code
41 | - Overfitting and Regularization: explanation
42 |
43 | 3. Deep Learning
44 | - XOR NN Solution: explanation \| code
45 | - Feed Forward NN: explanation \| code
46 | - CNN to Classify Images: explanation \| code
47 |
48 | 4. Natural Language Processing
49 | - Computational Linguistics vs NLP: explanation
50 | - Top NLP Libraries: explanation
51 | - Text Normalization: explanation \| code
52 | - Split Sentences: explanation \| code
53 | - Embeddings vs one-hot encoding: explanation
54 | - spaCy: explanation
55 | - Hugging Face example: code
56 |
57 | ## Contributing and Feedback
58 | Feedback and suggestions of any kind are greatly appreciated (algorithm design, documentation, improvement ideas, spelling mistakes, etc.). If you want to contribute to the course, you can do so through a PR.
59 |
60 | ## Author
61 | - Created by Andrés Segura-Tinoco
62 | - Created on Mar 06, 2021
63 |
64 | ## License
65 | This project is licensed under the terms of the MIT license.
66 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | title: Machine Learning Course
2 | description: A course that starts from Data Science and offers examples (with Python code) and explanations (in Twitter threads) of the concepts and techniques of Machine Learning, Deep Learning and NLP
3 | show_downloads: false
4 | google_analytics:
5 | theme: jekyll-theme-cayman
6 |
--------------------------------------------------------------------------------
/data-science/DS - Cross-Entropy.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"DS - Cross-Entropy.ipynb","provenance":[],"collapsed_sections":[],"toc_visible":true,"authorship_tag":"ABX9TyMb4ZCAS2f9bNmhHXfh6HYc"},"kernelspec":{"name":"python3","display_name":"Python 3"}},"cells":[{"cell_type":"markdown","metadata":{"id":"9FhFIxkiWhsr"},"source":["# Entropy and Cross-Entropy\n","\n","Created by Andres Segura-Tinoco \n","Created on Mar 15, 2021\n","\n","Source: https://en.wikipedia.org/wiki/Entropy_(information_theory)"]},{"cell_type":"code","metadata":{"id":"F50pPNvTW7O2"},"source":["import math"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"uyfoxcwjXojH"},"source":["**Entropy** is a measure of unpredictability, so its value is inversely related to the compression capacity of a chain of symbols.\n","\\begin{align}\n"," Entropy(X) = H(X) = \\sum_{i=1}^n P(x_i) \\log_{2} \\frac{1}{P(x_i)} \\tag{1}\n","\\end{align}"]},{"cell_type":"markdown","metadata":{"id":"GFQ2DEzHXrZl"},"source":["The **Cross-Entropy** of the distribution $q$ relative to a distribution $p$ over a given set is defined as follows:\n","\\begin{align}\n"," Cross-Entropy(p, q) = H(p, q) = \\sum_{i=1}^n P(x_i) \\log_{2} \\frac{1}{Q(x_i)} \\tag{2}\n","\\end{align}"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"f9QnmARTWc_5","executionInfo":{"status":"ok","timestamp":1615838598764,"user_tz":300,"elapsed":951,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"67361a6e-00af-4430-d9f8-89dab11966f5"},"source":["# Probability distributions P of x set\n","p_x = [0.1, 0.2, 0.15, 0.15, 
0.4]\n","sum(p_x)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["1.0"]},"metadata":{"tags":[]},"execution_count":2}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"0rzI4sSEWgdA","executionInfo":{"status":"ok","timestamp":1615838598765,"user_tz":300,"elapsed":947,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"6c13796d-21b0-44cf-c22a-57861f7adc3b"},"source":["# Calculate entropy of Px\n","h_x = 0\n","for x in p_x:\n"," h_x -= x * math.log2(x)\n","h_x"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["2.1464393446710153"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"QH8NwLrVYrqw","executionInfo":{"status":"ok","timestamp":1615838598765,"user_tz":300,"elapsed":943,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"f19c29ad-618e-4172-d13b-0a656c799484"},"source":["# Probability distributions Q of x set\n","q_x = [0.08, 0.24, 0.13, 0.17, 0.38]\n","sum(q_x)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["1.0"]},"metadata":{"tags":[]},"execution_count":4}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"FNvfKScpXIry","executionInfo":{"status":"ok","timestamp":1615838598766,"user_tz":300,"elapsed":941,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"9ede76f7-4df9-4d51-f0d3-8957aebfb8f5"},"source":["# Calculate cross-entropy of Qx relative to Px\n","ce_x = 0\n","for p, q 
in zip(p_x, q_x):\n"," ce_x += p * math.log2(1/q)\n","ce_x"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["2.1595073003443446"]},"metadata":{"tags":[]},"execution_count":5}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"fycgxlV1eq4s","executionInfo":{"status":"ok","timestamp":1615838598767,"user_tz":300,"elapsed":938,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"91d234c1-58af-4720-dafa-24c891340ff0"},"source":["# Validation\n","h_x < ce_x"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["True"]},"metadata":{"tags":[]},"execution_count":6}]},{"cell_type":"markdown","metadata":{"id":"D_Lvtzkme7wp"},"source":["
\n","You can contact me on Twitter | GitHub | LinkedIn"]}]}
--------------------------------------------------------------------------------
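The Cross-Entropy notebook above computes equations (1) and (2) with inline loops. As a minimal sketch, the same calculation can be wrapped in reusable functions. The names `entropy`, `cross_entropy` and `kl` are hypothetical and do not appear in the notebook, and the handling of zero probabilities is an assumption added here (the usual convention that a term with P(x_i) = 0 contributes nothing). The distributions `p_x` and `q_x` are the ones from the notebook.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in bits, as in equation (1).

    Assumption: terms with p_i == 0 are skipped (0 * log(1/0) is treated as 0).
    """
    return sum(p_i * math.log2(1 / p_i) for p_i in p if p_i > 0)


def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits of q relative to p, as in equation (2)."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # a zero-probability event contributes nothing
        if q_i == 0:
            return math.inf  # q rules out an event that p allows
        total += p_i * math.log2(1 / q_i)
    return total


# Distributions taken from the notebook
p_x = [0.1, 0.2, 0.15, 0.15, 0.4]
q_x = [0.08, 0.24, 0.13, 0.17, 0.38]

h_x = entropy(p_x)
ce_x = cross_entropy(p_x, q_x)
kl = ce_x - h_x  # Kullback-Leibler divergence of q from p

print(h_x, ce_x, kl)
```

The notebook's final "Validation" cell checks `h_x < ce_x`. This is an instance of Gibbs' inequality: H(p, q) is never smaller than H(p), and the two are equal only when q matches p. The gap between them is the Kullback-Leibler divergence.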
/deep-learning/DL - Feed Forward NN.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"DL - Feed Forward NN.ipynb","provenance":[],"collapsed_sections":[],"toc_visible":true,"authorship_tag":"ABX9TyM/F6tbrCHTSqFCrnxgftMs"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"HyeqmF4temTO"},"source":["# Feed Forward Neural Network\n","\n","Created by Andres Segura-Tinoco \n","Created on Mar 23, 2021"]},{"cell_type":"markdown","metadata":{"id":"awFa8dBm_jVt"},"source":["The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. \n","https://archive.ics.uci.edu/ml/datasets/iris"]},{"cell_type":"markdown","metadata":{"id":"AXx9yBN0ilkS"},"source":["## 1. Load libraries"]},{"cell_type":"code","metadata":{"id":"Rglh6Qt3ij14"},"source":["import numpy as np\n","from sklearn.datasets import load_iris\n","from sklearn.model_selection import train_test_split\n","from sklearn.preprocessing import OneHotEncoder\n","from keras.models import Sequential\n","from keras.layers import Dense\n","from keras.optimizers import Adam\n","from keras.utils.vis_utils import plot_model\n","import matplotlib.pyplot as plt"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"t0Hkl-vRipv_"},"source":["## 2. 
Load the Iris dataset"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"eW7cJrZ6iuVt","executionInfo":{"status":"ok","timestamp":1616537451440,"user_tz":300,"elapsed":3118,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"153afd1c-21b8-485d-bacc-b5ac9c6b6c22"},"source":["# Load dataset\n","iris_data = load_iris() \n","\n","# Check our data\n","print('Data Features:')\n","print(iris_data.feature_names)\n","\n","print('Example data:')\n","print(iris_data.data[:5])\n","\n","print('Example labels:')\n","print(iris_data.target[:5])"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Data Features:\n","['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']\n","Example data:\n","[[5.1 3.5 1.4 0.2]\n"," [4.9 3. 1.4 0.2]\n"," [4.7 3.2 1.3 0.2]\n"," [4.6 3.1 1.5 0.2]\n"," [5. 3.6 1.4 0.2]]\n","Example labels:\n","[0 0 0 0 0]\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"id":"1FH-J6bbIW1D"},"source":["# Create X and Y sets\n","x = iris_data.data\n","y = iris_data.target.reshape(-1, 1) # Convert data to a single column"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"3NNziG5-iuvB"},"source":["## 3. 
One Hot encode the class labels"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"d4spSYKaV_nu","executionInfo":{"status":"ok","timestamp":1616537451441,"user_tz":300,"elapsed":3111,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"7611c770-2330-4a51-b5e0-fb8ce05438cc"},"source":["encoder = OneHotEncoder(sparse=False)\n","y = encoder.fit_transform(y)\n","print('OneHotEncoder: ')\n","print(y[[0, 50, 100]])\n","\n","#Encodes the output as:\n","#Setosa,\tVersicolor,\tVirginica\n","#1\t\t0\t\t\t0\n","#0\t\t1 \t\t0\n","#0 \t0 \t\t1"],"execution_count":null,"outputs":[{"output_type":"stream","text":["OneHotEncoder: \n","[[1. 0. 0.]\n"," [0. 1. 0.]\n"," [0. 0. 1.]]\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"j1aQ5hGgi5_f"},"source":["## 4. Split the data for training and testing"]},{"cell_type":"code","metadata":{"id":"npo8xqD4i6QI"},"source":["# Split the data into 80-20%\n","train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.20)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"ETEL5IwvjFiF"},"source":["## 5. Build the feedforward network model"]},{"cell_type":"code","metadata":{"id":"9zRPd3JfjE31"},"source":["# Create FFNN model\n","model = Sequential(name=\"Iris_FFNN\")\n","model.add(Dense(10, input_shape=(4,), activation='relu', name='L1'))\n","model.add(Dense(10, activation='relu', name='L2'))\n","model.add(Dense(3, activation='softmax', name='Output'))\n","\n","# Adam optimizer with learning rate of 0.001\n","optimizer = Adam(lr=0.001)\n","model.compile(optimizer, loss='categorical_crossentropy', metrics=['accuracy'])"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"utGRoG5QkYyY"},"source":["## 6. 
Show the model architecture"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"W-7bzX2DkZRO","executionInfo":{"status":"ok","timestamp":1616537451710,"user_tz":300,"elapsed":3371,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"591e20b5-79ac-4d7a-f9cd-0108e0fe64f5"},"source":["model.summary()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Model: \"Iris_FFNN\"\n","_________________________________________________________________\n","Layer (type) Output Shape Param # \n","=================================================================\n","L1 (Dense) (None, 10) 50 \n","_________________________________________________________________\n","L2 (Dense) (None, 10) 110 \n","_________________________________________________________________\n","Output (Dense) (None, 3) 33 \n","=================================================================\n","Total params: 193\n","Trainable params: 193\n","Non-trainable params: 0\n","_________________________________________________________________\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":422},"id":"xggbou1OjuZX","executionInfo":{"status":"ok","timestamp":1616537451711,"user_tz":300,"elapsed":3368,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"d0af9a3c-b939-43c2-9582-e3149251f2b3"},"source":["plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"image/png":"\n","text/plain":[""]},"metadata":{"tags":[]},"execution_count":8}]},{"cell_type":"markdown","metadata":{"id":"B8S3GioMwy2O"},"source":["## 7. 
Train model"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"aU6hE_3MwzYK","executionInfo":{"status":"ok","timestamp":1616537454779,"user_tz":300,"elapsed":6431,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"d7e4b1a4-cddb-4f6d-9902-0ea0907027e9"},"source":["print(\"Fit model on training data\")\n","results = model.fit(train_x, train_y, batch_size=25, epochs=50, validation_data=(test_x, test_y), verbose=False)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fit model on training data\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"-nDssxaAyQRx"},"source":["## 8. Plot model results"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"phPDQip_xhwE","executionInfo":{"status":"ok","timestamp":1616537454780,"user_tz":300,"elapsed":6428,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"82af0a20-18e5-47ad-ed6c-7e6ddfd3dee9"},"source":["# List all data in history\n","print(results.history.keys())"],"execution_count":null,"outputs":[{"output_type":"stream","text":["dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":295},"id":"Ew_zZWo9yyyl","executionInfo":{"status":"ok","timestamp":1616537454999,"user_tz":300,"elapsed":6643,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"634d5455-35ee-4374-caad-fc7d3c75395c"},"source":["# Summarize history for 
accuracy\n","plt.plot(results.history['accuracy'])\n","plt.plot(results.history['val_accuracy'])\n","plt.title('Model Accuracy')\n","plt.ylabel('accuracy')\n","plt.xlabel('epoch')\n","plt.legend(['train', 'test'], loc='upper left')\n","plt.show()"],"execution_count":null,"outputs":[{"output_type":"display_data","data":{"image/png":"\n","text/plain":[""]},"metadata":{"tags":[],"needs_background":"light"}}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":295},"id":"oDwK8iB2yyoN","executionInfo":{"status":"ok","timestamp":1616537455236,"user_tz":300,"elapsed":6875,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"c4246fba-9b65-4a2f-9d11-b648f765b679"},"source":["# Summarize history for loss\n","plt.plot(results.history['loss'])\n","plt.plot(results.history['val_loss'])\n","plt.title('Model Loss')\n","plt.ylabel('loss')\n","plt.xlabel('epoch')\n","plt.legend(['train', 'test'], loc='upper left')\n","plt.show()"],"execution_count":null,"outputs":[{"output_type":"display_data","data":{"image/png":"\n","text/plain":[""]},"metadata":{"tags":[],"needs_background":"light"}}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"N1I3oubcIaiq","executionInfo":{"status":"ok","timestamp":1616537455237,"user_tz":300,"elapsed":6871,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"e00924b2-01be-454a-a67f-cb31b4efec04"},"source":["# Test on data not used for the training\n","results = model.evaluate(test_x, test_y)\n","\n","print('Final test set loss: {:4f}'.format(results[0]))\n","print('Final test set accuracy: {:4f}'.format(results[1]))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["1/1 
[==============================] - 0s 16ms/step - loss: 0.5509 - accuracy: 0.8667\n","Final test set loss: 0.550854\n","Final test set accuracy: 0.866667\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"U51vdkg7UQeR"},"source":["\n","You can contact me on Twitter | GitHub | LinkedIn"]}]}
--------------------------------------------------------------------------------
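The `model.summary()` output in the Feed Forward NN notebook above reports 50, 110 and 33 parameters for the three `Dense` layers, 193 in total. A short sketch of where those numbers come from: a fully connected layer with bias holds one weight per input-unit pair plus one bias per unit. The helper `dense_params` is a hypothetical name used only for illustration; it is not part of Keras.

```python
def dense_params(n_inputs, n_units, use_bias=True):
    """Number of trainable parameters in a fully connected layer."""
    weights = n_inputs * n_units
    biases = n_units if use_bias else 0
    return weights + biases


# Architecture from the notebook: 4 input features -> 10 -> 10 -> 3 classes
n_inputs = 4
layer_units = [10, 10, 3]

counts = []
for units in layer_units:
    counts.append(dense_params(n_inputs, units))
    n_inputs = units  # the output of one layer is the input of the next

total = sum(counts)
print(counts, total)
```

The same arithmetic can be used as a quick sanity check whenever a layer size or the number of input features is changed.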
/deep-learning/DL - XOR NN Solution.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"DL - XOR NN Solution.ipynb","provenance":[],"collapsed_sections":[],"toc_visible":true,"authorship_tag":"ABX9TyMLSQQfn+xUzItcU/1Oia0U"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"qJF-WuID6OJK"},"source":["# Neural Network that implements the XOR gate\n","### Solving XOR with a NN with a hidden layer\n","\n","Created by Andres Segura-Tinoco \n","Created on Mar 29, 2021"]},{"cell_type":"code","metadata":{"id":"LWjGn3X-6JkR"},"source":["# Importing Keras libraries\n","import numpy as np\n","from keras.models import Sequential\n","from keras.layers.core import Dense\n","from tensorflow.keras import initializers\n","from keras.optimizers import Adam\n","from keras.utils.vis_utils import plot_model\n","import matplotlib.pyplot as plt"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"ScXABjya_Wpm"},"source":["## 1. 
Create NN Model"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"E--4x5lsF89r","executionInfo":{"status":"ok","timestamp":1617023847908,"user_tz":300,"elapsed":2960,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"a349cf61-aeb1-406f-e246-0c12ccf1b64b"},"source":["# Layer weight initializers\n","initializer = initializers.RandomNormal(mean=0.0, stddev=0.5, seed=None)\n","values = initializer(shape=(2, 2))\n","values"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[""]},"metadata":{"tags":[]},"execution_count":2}]},{"cell_type":"code","metadata":{"id":"PirxbEVG61Z9"},"source":["# Create model\n","model = Sequential(name=\"XOR_MLP\")\n","model.add(Dense(units=4, input_dim=2, use_bias=True, activation='relu', kernel_initializer=initializer, name='HL'))\n","model.add(Dense(units=1, use_bias=True, activation='sigmoid', kernel_initializer=initializer, name='Output'))\n","\n","# Adam optimizer with learning rate of 0.02\n","model.compile(optimizer=Adam(lr=0.02), loss='binary_crossentropy', metrics=['accuracy'])"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"hSGKDZLc8sOm","executionInfo":{"status":"ok","timestamp":1617023848331,"user_tz":300,"elapsed":3376,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"f637c3a9-1e66-4714-ba43-8c497d53c0ef"},"source":["model.summary()"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Model: \"XOR_MLP\"\n","_________________________________________________________________\n","Layer (type) Output Shape Param # \n","=================================================================\n","HL 
(Dense) (None, 4) 12 \n","_________________________________________________________________\n","Output (Dense) (None, 1) 5 \n","=================================================================\n","Total params: 17\n","Trainable params: 17\n","Non-trainable params: 0\n","_________________________________________________________________\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":312},"id":"5i9nmdeF9N3h","executionInfo":{"status":"ok","timestamp":1617023848332,"user_tz":300,"elapsed":3372,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"898e708b-4680-4970-b27d-ae97d0df05f0"},"source":["plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"image/png":"\n","text/plain":[""]},"metadata":{"tags":[]},"execution_count":5}]},{"cell_type":"markdown","metadata":{"id":"ZYIcX-1l_eNC"},"source":["## 2. 
Train model with Binary data"]},{"cell_type":"code","metadata":{"id":"6m0_JNaDCBdh"},"source":["# Define binary encoding\n","training_data = np.array([[0,0],[0,1],[1,0],[1,1]], \"float32\")\n","target_data = np.array([[0],[1],[1],[0]], \"float32\")"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"MYCsgbQHASgm","executionInfo":{"status":"ok","timestamp":1617023849064,"user_tz":300,"elapsed":4093,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"53a9ef53-bfd2-46d3-af2d-f6fe4159285d"},"source":["print(\"Fit model on training data\")\n","results = model.fit(training_data, target_data, epochs=100, verbose=False)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fit model on training data\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"q-i8LDhaBSfL","executionInfo":{"status":"ok","timestamp":1617023849065,"user_tz":300,"elapsed":4090,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"c02ea6d5-60ff-4081-ec7f-a74382ee3500"},"source":["print(model.predict(training_data).round())"],"execution_count":null,"outputs":[{"output_type":"stream","text":["[[0.]\n"," [1.]\n"," [1.]\n"," [0.]]\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":295},"id":"oRYZ2U-VDrk0","executionInfo":{"status":"ok","timestamp":1617023849399,"user_tz":300,"elapsed":4419,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"687c0cf6-7bf7-4531-b2f0-8e941502771e"},"source":["# Summarize 
history for accuracy\n","plt.plot(results.history['accuracy'])\n","plt.title('Model Accuracy')\n","plt.ylabel('accuracy')\n","plt.xlabel('epoch')\n","plt.show()"],"execution_count":null,"outputs":[{"output_type":"display_data","data":{"image/png":"\n","text/plain":[""]},"metadata":{"tags":[],"needs_background":"light"}}]},{"cell_type":"markdown","metadata":{"id":"RrGJ7lSDCGsK"},"source":["## 3. Train model with Bipolar input data"]},{"cell_type":"code","metadata":{"id":"mrz-tcbmCPcB"},"source":["# Define bipolar encoding\n","training_data = np.array([[-1,-1],[-1,1],[1,-1],[1,1]], \"float32\")\n","target_data = np.array([[0],[1],[1],[0]], \"float32\")"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Ju4wi84iCP-L","executionInfo":{"status":"ok","timestamp":1617023849678,"user_tz":300,"elapsed":4690,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"19891cce-ed6c-489e-81c8-6723aace280c"},"source":["print(\"Fit model on training data\")\n","results = model.fit(training_data, target_data, epochs=10, verbose=False)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Fit model on training data\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"Ry4CSKLUCPt5","executionInfo":{"status":"ok","timestamp":1617023849679,"user_tz":300,"elapsed":4688,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"8a4a6f2b-b8b1-48df-de37-42e1677cb863"},"source":["print(model.predict(training_data).round())"],"execution_count":null,"outputs":[{"output_type":"stream","text":["[[0.]\n"," [1.]\n"," [1.]\n"," 
[0.]]\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":295},"id":"_tBI4YrCD3mt","executionInfo":{"status":"ok","timestamp":1617023849679,"user_tz":300,"elapsed":4684,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"167699f8-ca85-45a7-ed0d-0b7083170af7"},"source":["# Summarize history for accuracy\n","plt.plot(results.history['accuracy'])\n","plt.title('Model Accuracy')\n","plt.ylabel('accuracy')\n","plt.xlabel('epoch')\n","plt.show()"],"execution_count":null,"outputs":[{"output_type":"display_data","data":{"image/png":"iVBORw0KGgoAAAANSUhEUgAAAYgAAAEWCAYAAAB8LwAVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAWKklEQVR4nO3de7BlZX3m8e8DzVUujXTHYDfSqMTYOAp4xFsMeIkFGiUaR0XlllGmIihkQiWgTuGQWM5M1BgTRyQGtcWAhqBDDHIRFYtRlMNVAS8tit1NE9pwUTCIwG/+2OvI7sPb3Rvo3ev0Od9P1a7a633XWvu3V3Xv57zrXXuvVBWSJE23Rd8FSJJmJgNCktRkQEiSmgwISVKTASFJajIgJElNBoTmvCRLklSSeSOse2SSSzdFXVLfDAhtVpL8OMm9SRZMa7+q+5Bf0k9la9WyQ5K7knyx71qkR8OA0OboR8ChUwtJ/hOwfX/lPMQfAr8Efi/Jb27KFx5lFCSNyoDQ5uhTwOFDy0cAy4ZXSLJzkmVJ1iS5Kcm7kmzR9W2Z5H1JfprkRuDljW3/IcnqJKuS/GWSLR9GfUcApwLXAm+atu/fSfL1JHckWZHkyK59uyTv72q9M8mlXduBSVZO28ePk7yke/7uJGcnOSPJz4Ajk+yf5Bvda6xO8ndJth7afu8kFyW5Lcm/JXlHkt9M8oskuw6tt193/LZ6GO9ds4gBoc3RZcBOSZ7afXC/Hjhj2jp/C+wMPBE4gEGgHNX1vQX4fWBfYAJ4zbRtPwHcBzy5W+elwJtHKSzJHsCBwKe7x+HT+r7Y1bYQ2Ae4uut+H/BM4HnAY4E/Ax4Y5TWBQ4Czgfnda94P/AmwAHgu8GLgrV0NOwJfAs4HHt+9x4ur6hbgq8Brh/Z7GHBWVf1qxDo0yxgQ2lxNjSJ+D7gBWDXVMRQaJ1XVz6vqx8D7GXzgweBD8INVtaKqbgPeO7Tt44CXAcdX1d1VdSvw193+RnEYcG1VXQ+cBeydZN+u7w3Al6rqzKr6VVX9e1Vd3Y1s/gg4rqpWVdX9VfX1qvrliK/5jar6fFU9UFX/UVVXVNVlVXVf994/yiAkYRCMt1TV+6vqnu74fLPr+yTdiKc7hocyOM6aozxfqc3Vp4CvAXsy7fQSg7+ctwJuGmq7CVjUPX88sGJa35Q9um1XJ5lq22La+utzOPD3AFW1KsklDE45XQXsDvywsc0CYNt19I1irdqS/BbwAQajo+0Z/D+/o
\n","text/plain":[""]},"metadata":{"tags":[],"needs_background":"light"}}]},{"cell_type":"markdown","metadata":{"id":"mh3kUIeDUVa3"},"source":["\n","You can contact me on Twitter | GitHub | LinkedIn"]}]}
--------------------------------------------------------------------------------
/docs/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Covenant Code of Conduct
2 |
3 | ## Our Pledge
4 |
5 | We as members, contributors, and leaders pledge to make participation in our
6 | community a harassment-free experience for everyone, regardless of age, body
7 | size, visible or invisible disability, ethnicity, sex characteristics, gender
8 | identity and expression, level of experience, education, socio-economic status,
9 | nationality, personal appearance, race, religion, or sexual identity
10 | and orientation.
11 |
12 | We pledge to act and interact in ways that contribute to an open, welcoming,
13 | diverse, inclusive, and healthy community.
14 |
15 | ## Our Standards
16 |
17 | Examples of behavior that contributes to a positive environment for our
18 | community include:
19 |
20 | * Demonstrating empathy and kindness toward other people
21 | * Being respectful of differing opinions, viewpoints, and experiences
22 | * Giving and gracefully accepting constructive feedback
23 | * Accepting responsibility and apologizing to those affected by our mistakes,
24 | and learning from the experience
25 | * Focusing on what is best not just for us as individuals, but for the
26 | overall community
27 |
28 | Examples of unacceptable behavior include:
29 |
30 | * The use of sexualized language or imagery, and sexual attention or
31 | advances of any kind
32 | * Trolling, insulting or derogatory comments, and personal or political attacks
33 | * Public or private harassment
34 | * Publishing others' private information, such as a physical or email
35 | address, without their explicit permission
36 | * Other conduct which could reasonably be considered inappropriate in a
37 | professional setting
38 |
39 | ## Enforcement Responsibilities
40 |
41 | Community leaders are responsible for clarifying and enforcing our standards of
42 | acceptable behavior and will take appropriate and fair corrective action in
43 | response to any behavior that they deem inappropriate, threatening, offensive,
44 | or harmful.
45 |
46 | Community leaders have the right and responsibility to remove, edit, or reject
47 | comments, commits, code, wiki edits, issues, and other contributions that are
48 | not aligned to this Code of Conduct, and will communicate reasons for moderation
49 | decisions when appropriate.
50 |
51 | ## Scope
52 |
53 | This Code of Conduct applies within all community spaces, and also applies when
54 | an individual is officially representing the community in public spaces.
55 | Examples of representing our community include using an official e-mail address,
56 | posting via an official social media account, or acting as an appointed
57 | representative at an online or offline event.
58 |
59 | ## Enforcement
60 |
61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
62 | reported to the community leaders responsible for enforcement.
63 |
64 | All complaints will be reviewed and investigated promptly and fairly.
65 |
66 | All community leaders are obligated to respect the privacy and security of the
67 | reporter of any incident.
68 |
69 | ## Enforcement Guidelines
70 |
71 | Community leaders will follow these Community Impact Guidelines in determining
72 | the consequences for any action they deem in violation of this Code of Conduct:
73 |
74 | ### 1. Correction
75 |
76 | **Community Impact**: Use of inappropriate language or other behavior deemed
77 | unprofessional or unwelcome in the community.
78 |
79 | **Consequence**: A private, written warning from community leaders, providing
80 | clarity around the nature of the violation and an explanation of why the
81 | behavior was inappropriate. A public apology may be requested.
82 |
83 | ### 2. Warning
84 |
85 | **Community Impact**: A violation through a single incident or series
86 | of actions.
87 |
88 | **Consequence**: A warning with consequences for continued behavior. No
89 | interaction with the people involved, including unsolicited interaction with
90 | those enforcing the Code of Conduct, for a specified period of time. This
91 | includes avoiding interactions in community spaces as well as external channels
92 | like social media. Violating these terms may lead to a temporary or
93 | permanent ban.
94 |
95 | ### 3. Temporary Ban
96 |
97 | **Community Impact**: A serious violation of community standards, including
98 | sustained inappropriate behavior.
99 |
100 | **Consequence**: A temporary ban from any sort of interaction or public
101 | communication with the community for a specified period of time. No public or
102 | private interaction with the people involved, including unsolicited interaction
103 | with those enforcing the Code of Conduct, is allowed during this period.
104 | Violating these terms may lead to a permanent ban.
105 |
106 | ### 4. Permanent Ban
107 |
108 | **Community Impact**: Demonstrating a pattern of violation of community
109 | standards, including sustained inappropriate behavior, harassment of an
110 | individual, or aggression toward or disparagement of classes of individuals.
111 |
112 | **Consequence**: A permanent ban from any sort of public interaction within
113 | the community.
114 |
115 | ## Attribution
116 |
117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage],
118 | version 2.0, available at
119 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
120 |
121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct
122 | enforcement ladder](https://github.com/mozilla/diversity).
123 |
124 | [homepage]: https://www.contributor-covenant.org
125 |
126 | For answers to common questions about this code of conduct, see the FAQ at
127 | https://www.contributor-covenant.org/faq. Translations are available at
128 | https://www.contributor-covenant.org/translations.
129 |
--------------------------------------------------------------------------------
/docs/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing
2 | Welcome, and thank you for your interest in helping to improve and accelerate/ease the adoption of the Python Algorithms Course.
3 |
4 | There are many ways in which you can contribute, beyond writing code. The goal of this document is to provide a high-level overview of how you can get involved, and hopefully not feel intimidated.
5 |
6 | ## Asking Questions and Providing Feedback
7 | Have a question? Rather than emailing the author, open an issue.
8 |
9 | The community will be eager to assist you. Your well-worded question will serve as a resource to others searching for help.
10 |
11 | ## Reporting Issues and Ideas
12 | Have you identified an oversight or problem? Have a feature request? We want to hear about it! Here's how you can make reporting your issue as effective as possible.
13 |
14 | > **Note:** If you already know what you want to change, feel free to just fork/clone the repo, change it, and submit a pull request. No need to add overhead by creating an issue!
15 |
16 | ### Look For an Existing Issue
17 | Before you create a new issue, please do a search in [open issues](https://github.com/ansegura7/Algorithms/issues) to see if the issue or feature request has already been filed.
18 |
19 | If you cannot find an existing issue that describes your bug or feature, create a new issue using the guidelines below.
20 |
21 | ### Writing Good Bug Reports and Feature Requests
22 | File a single issue per problem and feature request. Do not enumerate multiple bugs or feature requests in the same issue.
23 |
24 | Below is some information you can provide. The more you provide, the more likely it is that someone will understand and incorporate it. However, be mindful of the cost/benefit of documenting versus simply implementing the change.
25 |
26 | Please include the following with each issue:
27 | - **Title** - Concise and clear to quickly identify the topic.
28 | - **Problem** - Summary of the issue/idea/feature.
29 | - **Possible Solution** - If a solution seems clear, share it as an option.
30 | - **Examples** - "What was expected" vs "What actually occurred".
31 | - **Context** - External factors that restrict possible solutions. (stuff that can't be changed)
32 | > Example: Please upgrade the vehicle from 30mph to 60mph. Context: budget is $5000.
33 |
34 | # Thank You!
35 | Your contributions to the Python Algorithms Course, large or small, make great projects like this possible. Thank you for taking the time to contribute!
36 |
--------------------------------------------------------------------------------
/machine-learning/supervised-learning/ML - ROC Curve.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"ML - ROC Curve.ipynb","provenance":[],"collapsed_sections":[],"authorship_tag":"ABX9TyMks3MbMYGawx2mWhVMVQLX"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"_GV4XfYNy_f4"},"source":["# ROC Curve\n","\n","Created by Andres Segura-Tinoco \n","Created on Apr 08, 2021"]},{"cell_type":"code","metadata":{"id":"3VJM-P6ptjal"},"source":["# Import libraries\n","from matplotlib import pyplot as mpl\n","from sklearn.metrics import confusion_matrix\n","from sklearn.metrics import roc_curve, roc_auc_score"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"eHBo-ok4s8uF","executionInfo":{"status":"ok","timestamp":1617906847084,"user_tz":300,"elapsed":768,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"395c5ef8-28e3-44a3-e9c7-a0791410d4e5"},"source":["y_real = [0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0]\n","print(len(y_real))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["40\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"KZQzwZi0tH4H","executionInfo":{"status":"ok","timestamp":1617906847084,"user_tz":300,"elapsed":765,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"3a49ba70-9f7a-441c-a038-3faf92f8e28a"},"source":["y_pred = [0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 
0]\n","print(len(y_pred))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["40\n"],"name":"stdout"}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"k8bZZysUtXG7","executionInfo":{"status":"ok","timestamp":1617906847085,"user_tz":300,"elapsed":763,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"2162e541-14aa-432a-ae84-03036d1af1b2"},"source":["matrix = confusion_matrix(y_real, y_pred)\n","matrix"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["array([[14, 7],\n"," [ 7, 12]])"]},"metadata":{"tags":[]},"execution_count":4}]},{"cell_type":"code","metadata":{"id":"1KWGn_IeuTS0"},"source":["fpr, tpr, thresholds = roc_curve(y_real, y_pred)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"01XR73guuoG5","executionInfo":{"status":"ok","timestamp":1617906847086,"user_tz":300,"elapsed":758,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"5cbb502e-4ac3-48f4-e978-4da80fa0088e"},"source":["roc_auc_score(y_real, y_pred)"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":["0.6491228070175439"]},"metadata":{"tags":[]},"execution_count":6}]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":296},"id":"znnFrtO9uzsC","executionInfo":{"status":"ok","timestamp":1617906847758,"user_tz":300,"elapsed":1428,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"973ba94a-4bc5-41d3-cba8-765333e6736e"},"source":["mpl.plot(fpr, tpr, 
marker='.', label='ROC curve')\n","mpl.xlabel('False Positive Rate')\n","mpl.ylabel('True Positive Rate')\n","mpl.legend()\n","mpl.show()"],"execution_count":null,"outputs":[{"output_type":"display_data","data":{"image/png":"\n","text/plain":[""]},"metadata":{"tags":[],"needs_background":"light"}}]},{"cell_type":"markdown","metadata":{"id":"rdkGr59QzRJk"},"source":["\n","You can contact me on Twitter | GitHub | LinkedIn"]}]}
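
Note on the ROC notebook above: `y_pred` holds hard 0/1 labels rather than scores, so the curve has only three points and the reported AUC reduces to the average of sensitivity and specificity. The following is a minimal pure-Python sketch of that arithmetic, not the notebook author's code. It assumes the confusion-matrix layout `[[TN, FP], [FN, TP]]` used by scikit-learn, and the helper names `roc_points_from_counts` and `trapezoid_auc` are hypothetical.

```python
# Sketch: rebuild the three-point ROC curve and its AUC from the
# confusion-matrix counts printed in the notebook ([[14, 7], [7, 12]]).

def roc_points_from_counts(tn, fp, fn, tp):
    """Return the ROC points (FPR, TPR) for a single hard-label classifier."""
    tpr = tp / (tp + fn)  # true positive rate (sensitivity)
    fpr = fp / (fp + tn)  # false positive rate (1 - specificity)
    return [(0.0, 0.0), (fpr, tpr), (1.0, 1.0)]

def trapezoid_auc(points):
    """Area under a piecewise-linear curve given points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

points = roc_points_from_counts(tn=14, fp=7, fn=7, tp=12)
auc = trapezoid_auc(points)
print(round(auc, 4))  # 0.6491
```

To get a curve with more than one interior point, `roc_curve` would need predicted probabilities or decision scores instead of hard labels.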
--------------------------------------------------------------------------------
/nlp/NLP - Text Normalization.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"NLP - Text Normalization.ipynb","provenance":[],"collapsed_sections":[],"toc_visible":true,"authorship_tag":"ABX9TyPjYQeQilkD4xwGHd8eQJD2"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","metadata":{"id":"Frd_wyWebdfF"},"source":["# NLP - Text Normalization\n","### With NLTK and spaCy\n","\n","Created by Andres Segura-Tinoco \n","Created on May 15, 2021"]},{"cell_type":"code","metadata":{"id":"KuEStjoNbQ5s"},"source":["# Import libraries\n","import spacy\n","from spacy.lang.en import English"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":35},"id":"tE9VQGYncOQM","executionInfo":{"status":"ok","timestamp":1621109534846,"user_tz":300,"elapsed":958,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"e40c9e4f-7eed-47c0-cd3b-ac9d0bf09e4b"},"source":["# Verify installed spacy version\n","spacy.__version__"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"application/vnd.google.colaboratory.intrinsic+json":{"type":"string"},"text/plain":["'2.2.4'"]},"metadata":{"tags":[]},"execution_count":2}]},{"cell_type":"markdown","metadata":{"id":"Plh556jfeSR7"},"source":["## 1. 
Create NLP model"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/","height":120},"id":"Avq9JTkLdbHH","executionInfo":{"status":"ok","timestamp":1621109534848,"user_tz":300,"elapsed":956,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"3e27ed86-84e7-4a93-9185-56a51ec36caf"},"source":["# Document composed of a paragraph from the book The Adventures of Sherlock Holmes, by Arthur Conan Doyle\n","book_en = \"\"\"\n"," To Sherlock Holmes she is always _the_ woman. I have seldom heard him\n"," mention her under any other name. In his eyes she eclipses and\n"," predominates the whole of her sex. It was not that he felt any emotion\n"," akin to love for Irene Adler. All emotions, and that one particularly,\n"," were abhorrent to his cold, precise but admirably balanced mind. He\n"," was, I take it, the most perfect reasoning and observing machine that\n"," the world has seen, but as a lover he would have placed himself in a\n"," false position. He never spoke of the softer passions, save with a gibe\n"," and a sneer. They were admirable things for the observer—excellent for\n"," drawing the veil from men’s motives and actions. But for the trained\n"," reasoner to admit such intrusions into his own delicate and finely\n"," adjusted temperament was to introduce a distracting factor which might\n"," throw a doubt upon all his mental results. Grit in a sensitive\n"," instrument, or a crack in one of his own high-power lenses, would not\n"," be more disturbing than a strong emotion in a nature such as his. 
And\n"," yet there was but one woman to him, and that woman was the late Irene\n"," Adler, of dubious and questionable memory.\n"," \"\"\"\n","\n","# Data quality\n","book_en = book_en.replace(\"\\n \", \"\").lower()\n","book_en"],"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"application/vnd.google.colaboratory.intrinsic+json":{"type":"string"},"text/plain":["'to sherlock holmes she is always _the_ woman. i have seldom heard himmention her under any other name. in his eyes she eclipses andpredominates the whole of her sex. it was not that he felt any emotionakin to love for irene adler. all emotions, and that one particularly,were abhorrent to his cold, precise but admirably balanced mind. hewas, i take it, the most perfect reasoning and observing machine thatthe world has seen, but as a lover he would have placed himself in afalse position. he never spoke of the softer passions, save with a gibeand a sneer. they were admirable things for the observer—excellent fordrawing the veil from men’s motives and actions. but for the trainedreasoner to admit such intrusions into his own delicate and finelyadjusted temperament was to introduce a distracting factor which mightthrow a doubt upon all his mental results. grit in a sensitiveinstrument, or a crack in one of his own high-power lenses, would notbe more disturbing than a strong emotion in a nature such as his. andyet there was but one woman to him, and that woman was the late ireneadler, of dubious and questionable memory.'"]},"metadata":{"tags":[]},"execution_count":3}]},{"cell_type":"code","metadata":{"id":"A5_1t1cwcONO"},"source":["# Create NLP model for English language\n","nlp = spacy.load('en_core_web_sm')\n","doc_en = nlp(book_en)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"DjkHGe9Tepf4"},"source":["## 2. 
Remove Stopwords"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"GOlCGSTNcOKQ","executionInfo":{"status":"ok","timestamp":1621109535840,"user_tz":300,"elapsed":1942,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"dd6e3241-a032-4302-eefc-a15cda813d70"},"source":["# Bag of words\n","doc_sw = []\n","doc_words = []\n","\n","for token in doc_en:\n","    word = str(token).replace(',', '').strip()\n","\n","    if not token.is_stop and len(word) > 2:\n","        if word not in doc_words:\n","            doc_words.append(word)\n","    else:\n","        if word not in doc_sw:\n","            doc_sw.append(word)\n","\n","print('Count of Stopwords:', len(doc_sw))\n","print('Count of words:', len(doc_words))"],"execution_count":null,"outputs":[{"output_type":"stream","text":["Count of Stopwords: 54\n","Count of words: 80\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"eCQDOq1xhmTn"},"source":["## 3. Stemming\n","\n","Process in which terms are reduced to their root in order to shrink the vocabulary. It is carried out by applying word reduction rules.\n","\n","Two of the most common stemming algorithms are:\n","- Porter\n","- Snowball"]},{"cell_type":"code","metadata":{"id":"NFg-hmI6fX1s"},"source":["from nltk.stem.porter import PorterStemmer"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"K-JmLWH8hyoD"},"source":["### 3.1. 
Porter Stemmer"]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"nlj46XVtfXzY","executionInfo":{"status":"ok","timestamp":1621109536465,"user_tz":300,"elapsed":2561,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"0bc5e592-91d8-4e78-caba-62b53b13cc20"},"source":["stemmer = PorterStemmer()\n","\n","for word in doc_words:\n","    root_word = stemmer.stem(word)\n","    if word != root_word:\n","        print(word + ' --> ' + root_word)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["holmes --> holm\n","himmention --> himment\n","eyes --> eye\n","eclipses --> eclips\n","andpredominates --> andpredomin\n","irene --> iren\n","emotions --> emot\n","particularly --> particularli\n","abhorrent --> abhorr\n","precise --> precis\n","admirably --> admir\n","balanced --> balanc\n","hewas --> hewa\n","reasoning --> reason\n","observing --> observ\n","machine --> machin\n","thatthe --> thatth\n","placed --> place\n","afalse --> afals\n","position --> posit\n","passions --> passion\n","admirable --> admir\n","things --> thing\n","observer --> observ\n","excellent --> excel\n","fordrawing --> fordraw\n","motives --> motiv\n","actions --> action\n","trainedreasoner --> trainedreason\n","intrusions --> intrus\n","delicate --> delic\n","finelyadjusted --> finelyadjust\n","temperament --> tempera\n","introduce --> introduc\n","distracting --> distract\n","results --> result\n","sensitiveinstrument --> sensitiveinstru\n","lenses --> lens\n","notbe --> notb\n","disturbing --> disturb\n","emotion --> emot\n","nature --> natur\n","ireneadler --> ireneadl\n","dubious --> dubiou\n","questionable --> question\n","memory --> memori\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"DYOOHZbXh13i"},"source":["### 3.2. 
Snowball Stemmer"]},{"cell_type":"code","metadata":{"id":"OCyfXZYBh2Mq"},"source":["from nltk.stem.snowball import SnowballStemmer"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"f1PqTBmih9m6","executionInfo":{"status":"ok","timestamp":1621109536714,"user_tz":300,"elapsed":2803,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"b68b6870-1f15-47e5-a6c5-2ee8c4d7e15a"},"source":["stemmer = SnowballStemmer(language='english')\n","\n","for word in doc_words:\n","    root_word = stemmer.stem(word)\n","    if word != root_word:\n","        print(word + ' --> ' + root_word)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["holmes --> holm\n","himmention --> himment\n","eyes --> eye\n","eclipses --> eclips\n","andpredominates --> andpredomin\n","irene --> iren\n","emotions --> emot\n","particularly --> particular\n","abhorrent --> abhorr\n","precise --> precis\n","admirably --> admir\n","balanced --> balanc\n","hewas --> hewa\n","reasoning --> reason\n","observing --> observ\n","machine --> machin\n","thatthe --> thatth\n","placed --> place\n","afalse --> afals\n","position --> posit\n","passions --> passion\n","admirable --> admir\n","things --> thing\n","observer --> observ\n","excellent --> excel\n","fordrawing --> fordraw\n","motives --> motiv\n","actions --> action\n","trainedreasoner --> trainedreason\n","intrusions --> intrus\n","delicate --> delic\n","finelyadjusted --> finelyadjust\n","temperament --> tempera\n","introduce --> introduc\n","distracting --> distract\n","results --> result\n","sensitiveinstrument --> sensitiveinstru\n","lenses --> lens\n","notbe --> notb\n","disturbing --> disturb\n","emotion --> emot\n","nature --> natur\n","ireneadler --> ireneadl\n","questionable --> question\n","memory --> 
memori\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"hcco2M6ShtNn"},"source":["## 4. Lemmatization\n","\n","Lemmatization performs a morphological analysis using reference dictionaries to create equivalence classes between words.\n","\n","For example, for the token “eclipses”, a stemming rule would return the term “eclips”, while lemmatization would return the term “eclipse”."]},{"cell_type":"code","metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"LgpGSm2thto8","executionInfo":{"status":"ok","timestamp":1621109536715,"user_tz":300,"elapsed":2801,"user":{"displayName":"Andres Segura Tinoco","photoUrl":"https://lh3.googleusercontent.com/a-/AOh14GiweZ_blXJ6DNDPTMRkqLGa414Wso2PBYrLKYCU=s64","userId":"03707731297563483663"}},"outputId":"0a8dc196-ebfd-43ca-cc7c-95e9fb1bfa31"},"source":["for token in doc_en:\n","    word = str(token).replace(',', '').strip()\n","\n","    if not token.is_stop and len(word) > 2:\n","        root_word = token.lemma_\n","        if word != root_word:\n","            print(word + ' --> ' + root_word)"],"execution_count":null,"outputs":[{"output_type":"stream","text":["heard --> hear\n","eyes --> eye\n","eclipses --> eclipse\n","andpredominates --> andpredominate\n","felt --> feel\n","emotions --> emotion\n","observing --> observe\n","seen --> see\n","placed --> place\n","spoke --> speak\n","softer --> soft\n","passions --> passion\n","things --> thing\n","men --> man\n","motives --> motive\n","actions --> action\n","intrusions --> intrusion\n","results --> result\n","lenses --> lense\n"],"name":"stdout"}]},{"cell_type":"markdown","metadata":{"id":"65UO4VrycKPM"},"source":["\n","You can contact me on Twitter | GitHub | LinkedIn"]}]}
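
Note on the text-normalization notebook above: tokens such as `himmention` and `andpredominates` in the recorded outputs appear to come from the data-quality step, where `replace("\n ", "")` deletes each line break without leaving a space between the words on either side. The sketch below shows one possible alternative, under the assumption that collapsing every run of whitespace to a single space is acceptable. The excerpt in `raw` is a shortened, hypothetical stand-in for the notebook's `book_en`.

```python
# Hypothetical short excerpt; each line break is followed by one space of indentation.
raw = "\n I have seldom heard him\n mention her under any other name.\n "

# Behaviour of the notebook's step: the line break vanishes and words are glued together.
glued = raw.replace("\n ", "").lower()

# Alternative: split on any whitespace, then re-join with single spaces.
normalized = " ".join(raw.split()).lower()

print(glued)       # i have seldom heard himmention her under any other name.
print(normalized)  # i have seldom heard him mention her under any other name.
```

With the second form, the stop-word, stemming, and lemmatization steps would operate on real words, so the word counts and stem lists would differ from the outputs recorded in the notebook.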
--------------------------------------------------------------------------------