├── .github
    ├── CODEOWNERS
    ├── ISSUE_TEMPLATE.md
    ├── PULL_REQUEST_TEMPLATE.md
    ├── feedback_data.csv
    └── workflows
    │   └── main.yml
├── .gitignore
├── CONTRIBUTING.md
├── Chapter 2
    ├── Ch2_challenge.ipynb
    ├── Ch2_solution.ipynb
    └── sentiment_examples.txt
├── Chapter 3
    ├── Ch3_challenge.ipynb
    ├── Ch3_solution.ipynb
    └── feedback_data.csv
├── Chapter 4
    ├── Ch4_challenge.ipynb
    ├── Ch4_solution.ipynb
    └── ch4_feedback_data.csv
├── Chapter 5
    ├── Ch5_challenge.ipynb
    └── Ch5_solution.ipynb
├── LICENSE
├── NOTICE
└── README.md

/.github/CODEOWNERS:
--------------------------------------------------------------------------------
1 | # Codeowners for these exercise files:
2 | # * (asterisk) denotes "all files and folders"
3 | # Example: * @producer @instructor
4 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
1 |
7 |
8 | ## Issue Overview
9 |
10 |
11 | ## Describe your environment
12 |
13 |
14 | ## Steps to Reproduce
15 |
16 | 1.
17 | 2.
18 | 3.
19 | 4.
20 |
21 | ## Expected Behavior
22 |
23 |
24 | ## Current Behavior
25 |
26 |
27 | ## Possible Solution
28 |
29 |
30 | ## Screenshots / Video
31 |
32 |
33 | ## Related Issues
34 |
35 |
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/.github/feedback_data.csv:
--------------------------------------------------------------------------------
1 | Your products are excellent. I really love the quality! However, delivery to my location in Los Angeles was a bit slow. abbey@email.com
2 | The customer service team in New York was helpful in resolving my issue. I appreciate the assistance. brian@email.com
3 | The new features in the latest release are fantastic!
They have greatly improved the user experience in San Francisco. 4 | The product didn't meet my expectations, and I'm disappointed. I hope you can address the issues in Chicago. My email address is emailme@email.com 5 | The store was closed but the hours said it should have been open. I wasted gas and tme. helpme@email.com -------------------------------------------------------------------------------- /.github/workflows/main.yml: -------------------------------------------------------------------------------- 1 | name: Copy To Branches 2 | on: 3 | workflow_dispatch: 4 | jobs: 5 | copy-to-branches: 6 | runs-on: ubuntu-latest 7 | steps: 8 | - uses: actions/checkout@v2 9 | with: 10 | fetch-depth: 0 11 | - name: Copy To Branches Action 12 | uses: planetoftheweb/copy-to-branches@v1.2 13 | env: 14 | key: main 15 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | node_modules 3 | .tmp 4 | npm-debug.log 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | 2 | Contribution Agreement 3 | ====================== 4 | 5 | This repository does not accept pull requests (PRs). All pull requests will be closed. 6 | 7 | However, if any contributions (through pull requests, issues, feedback or otherwise) are provided, as a contributor, you represent that the code you submit is your original work or that of your employer (in which case you represent you have the right to bind your employer). By submitting code (or otherwise providing feedback), you (and, if applicable, your employer) are licensing the submitted code (and/or feedback) to LinkedIn and the open source community subject to the BSD 2-Clause license. 
8 | -------------------------------------------------------------------------------- /Chapter 2/Ch2_solution.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "deletable": true, 7 | "editable": true, 8 | "id": "vqKvqP6UJLUo" 9 | }, 10 | "source": [ 11 | "# Build a spaCy Processing Pipeline\n", 12 | "\n" 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "source": [ 18 | "Our use case is to perform a sentiment analysis using spaCy." 19 | ], 20 | "metadata": { 21 | "id": "oL0fn7Up5uyx" 22 | } 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "source": [ 27 | "![image.png]()" 28 | ], 29 | "metadata": { 30 | "id": "_yLfT1b55cx0" 31 | } 32 | }, 33 | { 34 | "cell_type": "markdown", 35 | "source": [ 36 | "# Exercise #1: Load Resources" 37 | ], 38 | "metadata": { 39 | "id": "RGDeqLD9pQS3" 40 | } 41 | }, 42 | { 43 | "cell_type": "code", 44 | "source": [ 45 | "# SOLUTION\n", 46 | "\n", 47 | "# Install spaCy (run this before importing it)\n", 48 | "!pip install spacy\n", 49 | "\n", 50 | "# Download the small English language model for spaCy\n", 51 | "!python -m spacy download en_core_web_sm\n", 52 | "\n", 53 | "# Import pandas as pd\n", 54 | "import pandas as pd\n", 55 | "\n", 56 | "# Import spacy\n", 57 | "import spacy\n", 58 | "\n", 59 | "# Load the English model\n", 60 | "nlp = spacy.load(\"en_core_web_sm\") # spacy.load loads the English model and assigns it to the variable nlp\n", 61 | "\n", 62 | "# spacy.load(\"en_core_web_sm\") does not download anything: it loads the pre-trained English pipeline\n", 63 | "# installed by the download command above into memory and assigns it to the variable nlp.\n", 64 | "# This pipeline provides part-of-speech tags, syntactic dependencies, named entities, and other linguistic annotations needed for various NLP tasks (the small model does not include static word vectors)."
65 | ], 66 | "metadata": { 67 | "id": "JYVceT0cBAK1", 68 | "colab": { 69 | "base_uri": "https://localhost:8080/" 70 | }, 71 | "outputId": "14c40dc8-8c88-416e-9407-4fa87ef01563" 72 | }, 73 | "execution_count": null, 74 | "outputs": [ 75 | { 76 | "output_type": "stream", 77 | "name": "stdout", 78 | "text": [ 79 | "Requirement already satisfied: spacy in /usr/local/lib/python3.10/dist-packages (3.7.4)\n", 80 | "Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.10/dist-packages (from spacy) (3.0.12)\n", 81 | "Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.0.5)\n", 82 | "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.0.10)\n", 83 | "Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy) (2.0.8)\n", 84 | "Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy) (3.0.9)\n", 85 | "Requirement already satisfied: thinc<8.3.0,>=8.2.2 in /usr/local/lib/python3.10/dist-packages (from spacy) (8.2.3)\n", 86 | "Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.1.2)\n", 87 | "Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.10/dist-packages (from spacy) (2.4.8)\n", 88 | "Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.10/dist-packages (from spacy) (2.0.10)\n", 89 | "Requirement already satisfied: weasel<0.4.0,>=0.1.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (0.3.4)\n", 90 | "Requirement already satisfied: typer<0.10.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (0.9.4)\n", 91 | "Requirement already satisfied: smart-open<7.0.0,>=5.2.1 in /usr/local/lib/python3.10/dist-packages (from spacy) (6.4.0)\n", 92 | "Requirement already satisfied: 
tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (4.66.2)\n", 93 | "Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (2.31.0)\n", 94 | "Requirement already satisfied: pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from spacy) (2.6.4)\n", 95 | "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from spacy) (3.1.3)\n", 96 | "Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from spacy) (67.7.2)\n", 97 | "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (24.0)\n", 98 | "Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (3.3.0)\n", 99 | "Requirement already satisfied: numpy>=1.19.0 in /usr/local/lib/python3.10/dist-packages (from spacy) (1.25.2)\n", 100 | "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (0.6.0)\n", 101 | "Requirement already satisfied: pydantic-core==2.16.3 in /usr/local/lib/python3.10/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (2.16.3)\n", 102 | "Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy) (4.10.0)\n", 103 | "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (3.3.2)\n", 104 | "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (3.6)\n", 105 | "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (2.0.7)\n", 106 | "Requirement already satisfied: certifi>=2017.4.17 in 
/usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy) (2024.2.2)\n", 107 | "Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.10/dist-packages (from thinc<8.3.0,>=8.2.2->spacy) (0.7.11)\n", 108 | "Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from thinc<8.3.0,>=8.2.2->spacy) (0.1.4)\n", 109 | "Requirement already satisfied: click<9.0.0,>=7.1.1 in /usr/local/lib/python3.10/dist-packages (from typer<0.10.0,>=0.3.0->spacy) (8.1.7)\n", 110 | "Requirement already satisfied: cloudpathlib<0.17.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from weasel<0.4.0,>=0.1.0->spacy) (0.16.0)\n", 111 | "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->spacy) (2.1.5)\n", 112 | "Collecting en-core-web-sm==3.7.1\n", 113 | " Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)\n", 114 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m12.8/12.8 MB\u001b[0m \u001b[31m59.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 115 | "\u001b[?25hRequirement already satisfied: spacy<3.8.0,>=3.7.2 in /usr/local/lib/python3.10/dist-packages (from en-core-web-sm==3.7.1) (3.7.4)\n", 116 | "Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (3.0.12)\n", 117 | "Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (1.0.5)\n", 118 | "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (1.0.10)\n", 119 | "Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.10/dist-packages (from 
spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (2.0.8)\n", 120 | "Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (3.0.9)\n", 121 | "Requirement already satisfied: thinc<8.3.0,>=8.2.2 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (8.2.3)\n", 122 | "Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (1.1.2)\n", 123 | "Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (2.4.8)\n", 124 | "Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (2.0.10)\n", 125 | "Requirement already satisfied: weasel<0.4.0,>=0.1.0 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (0.3.4)\n", 126 | "Requirement already satisfied: typer<0.10.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (0.9.4)\n", 127 | "Requirement already satisfied: smart-open<7.0.0,>=5.2.1 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (6.4.0)\n", 128 | "Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (4.66.2)\n", 129 | "Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (2.31.0)\n", 130 | "Requirement already satisfied: pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (2.6.4)\n", 131 | "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from 
spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (3.1.3)\n", 132 | "Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (67.7.2)\n", 133 | "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (24.0)\n", 134 | "Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (3.3.0)\n", 135 | "Requirement already satisfied: numpy>=1.19.0 in /usr/local/lib/python3.10/dist-packages (from spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (1.25.2)\n", 136 | "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (0.6.0)\n", 137 | "Requirement already satisfied: pydantic-core==2.16.3 in /usr/local/lib/python3.10/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (2.16.3)\n", 138 | "Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from pydantic!=1.8,!=1.8.1,<3.0.0,>=1.7.4->spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (4.10.0)\n", 139 | "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (3.3.2)\n", 140 | "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (3.6)\n", 141 | "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3.0.0,>=2.13.0->spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (2.0.7)\n", 142 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from 
requests<3.0.0,>=2.13.0->spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (2024.2.2)\n", 143 | "Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.10/dist-packages (from thinc<8.3.0,>=8.2.2->spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (0.7.11)\n", 144 | "Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from thinc<8.3.0,>=8.2.2->spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (0.1.4)\n", 145 | "Requirement already satisfied: click<9.0.0,>=7.1.1 in /usr/local/lib/python3.10/dist-packages (from typer<0.10.0,>=0.3.0->spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (8.1.7)\n", 146 | "Requirement already satisfied: cloudpathlib<0.17.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from weasel<0.4.0,>=0.1.0->spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (0.16.0)\n", 147 | "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->spacy<3.8.0,>=3.7.2->en-core-web-sm==3.7.1) (2.1.5)\n", 148 | "\u001b[38;5;2m✔ Download and installation successful\u001b[0m\n", 149 | "You can now load the package via spacy.load('en_core_web_sm')\n", 150 | "\u001b[38;5;3m⚠ Restart to reload dependencies\u001b[0m\n", 151 | "If you are in a Jupyter or Colab notebook, you may need to restart Python in\n", 152 | "order to load all the package's dependencies. You can do this by selecting the\n", 153 | "'Restart kernel' or 'Restart runtime' option.\n" 154 | ] 155 | } 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "source": [ 161 | "## spaCy Processing Pipeline" 162 | ], 163 | "metadata": { 164 | "id": "CynCZB2eSpk4" 165 | } 166 | }, 167 | { 168 | "cell_type": "markdown", 169 | "source": [ 170 | "In spaCy, the order of tasks in the processing pipeline generally follows a predefined sequence, although it's also customizable.
By default, spaCy's processing pipeline includes the following components in the specified order:\n", 171 | "\n" 172 | ], 173 | "metadata": { 174 | "id": "E3lAMG68F3_F" 175 | } 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "source": [ 180 | "### Order of Tasks in the Processing Pipeline\n", 181 | "\n", 182 | "| Order | Name | Definition |\n", 183 | "| :---- | :---- | :---- |\n", 184 | "| 1 | Tokenization | Input text is split into individual tokens, such as words and punctuation marks. |\n", 185 | "| 2 | Stop Words | Flags stop words (`token.is_stop`) so they can be filtered out of the text; this is a token attribute rather than a separate pipeline component. |\n", 186 | "| 3 | POS Tagging | Assigns grammatical labels (e.g., noun, verb, adjective) to each token in the text based on its syntactic role within the sentence. |\n", 187 | "| 4 | Dependency Parsing | Analyzes the grammatical structure of the text by determining the relationships between tokens. |\n", 188 | "| 5 | Lemmatization | Reduces tokens to their base or root form (lemmas). |\n", 189 | "| 6 | Named Entity Recognition | Identifies and categorizes named entities such as persons, organizations, locations, and dates. |\n", 190 | "| 7 | Other Use Case Tasks | May be added to the pipeline (e.g., sentiment analysis). |\n" 191 | ], 192 | "metadata": { 193 | "id": "9OZqsZCjD_Sh" 194 | } 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "source": [ 199 | "# Exercise #2: Build a Simple Processing Pipeline" 200 | ], 201 | "metadata": { 202 | "id": "01EyDupaWZ-U" 203 | } 204 | }, 205 | { 206 | "cell_type": "code", 207 | "source": [ 208 | "# Example text\n", 209 | "text = \"A customer in New York City wants to give a review.\"\n", 210 | "\n", 211 | "# Process the text using spaCy\n", 212 | "doc = nlp(text)\n", 213 | "\n", 214 | "# The nlp object is typically a loaded spaCy language model,\n", 215 | "# such as the English language model (en_core_web_sm) you loaded.
When you pass\n", 216 | "# the text string to the nlp object, spaCy processes the text\n", 217 | "# through its NLP pipeline.\n", 218 | "\n", 219 | "# After executing doc = nlp(text), the variable doc contains a\n", 220 | "# spaCy Doc object, which represents the processed version of the input text.\n", 221 | "\n", 222 | "#-------------------------------------------------------------------------------\n", 223 | "# SOLUTION\n", 224 | "\n", 225 | "# Example text\n", 226 | "text = \"A customer in New York City wants to give a review.\"\n", 227 | "\n", 228 | "# Process the text using spaCy\n", 229 | "doc = nlp(text) # Recall that the Doc object represents our sentence.\n", 230 | "\n", 231 | "\n", 232 | "# Tokenization\n", 233 | "print(\"Tokenization:\")\n", 234 | "for token in doc:\n", 235 | "    # Iterate through each token in the processed text and print the token text\n", 236 | "    print(token.text)\n", 237 | "\n", 238 | "print(\"\\n\") # Add a newline for separation\n", 239 | "\n", 240 | "# Filter out stop words\n", 241 | "print(\"Filtered Tokens (without stop words):\")\n", 242 | "filtered_tokens = [token.text for token in doc if not token.is_stop]\n", 243 | "# Create a list of tokens excluding stop words using list comprehension\n", 244 | "print(filtered_tokens)\n", 245 | "\n", 246 | "print(\"\\n\") # Add a newline for separation\n", 247 | "\n", 248 | "# Part-of-Speech Tagging (POS)\n", 249 | "print(\"Part-of-Speech Tagging (POS):\")\n", 250 | "for token in doc:\n", 251 | "    # Iterate through each token and print the token text and its POS tag\n", 252 | "    print(token.text, token.pos_)\n", 253 | "\n", 254 | "print(\"\\n\") # Add a newline for separation\n", 255 | "\n", 256 | "# Named Entity Recognition (NER)\n", 257 | "print(\"Named Entity Recognition (NER):\")\n", 258 | "for ent in doc.ents:\n", 259 | "    # Iterate through each named entity in the processed text and print its text and label\n", 260 | "    print(ent.text, ent.label_)\n", 261 | "\n", 262 | "print(\"\\n\")
# Add a newline for separation\n", 263 | "\n", 264 | "# Lemmatization\n", 265 | "print(\"Lemmatization:\")\n", 266 | "lemmatized_tokens = [token.lemma_ for token in doc if not token.is_punct]\n", 267 | "# Create a list of lemmatized tokens excluding punctuation using list comprehension\n", 268 | "print(lemmatized_tokens)\n" 269 | ], 270 | "metadata": { 271 | "id": "55OLJ3mfrAwH", 272 | "colab": { 273 | "base_uri": "https://localhost:8080/" 274 | }, 275 | "outputId": "4cd10a4c-f19b-49e9-9b28-38b2417a1af6" 276 | }, 277 | "execution_count": null, 278 | "outputs": [ 279 | { 280 | "output_type": "stream", 281 | "name": "stdout", 282 | "text": [ 283 | "Tokenization:\n", 284 | "A\n", 285 | "customer\n", 286 | "in\n", 287 | "New\n", 288 | "York\n", 289 | "City\n", 290 | "wants\n", 291 | "to\n", 292 | "give\n", 293 | "a\n", 294 | "review\n", 295 | ".\n", 296 | "\n", 297 | "\n", 298 | "Filtered Tokens (without stop words):\n", 299 | "['customer', 'New', 'York', 'City', 'wants', 'review', '.']\n", 300 | "\n", 301 | "\n", 302 | "Part-of-Speech Tagging (POS):\n", 303 | "A DET\n", 304 | "customer NOUN\n", 305 | "in ADP\n", 306 | "New PROPN\n", 307 | "York PROPN\n", 308 | "City PROPN\n", 309 | "wants VERB\n", 310 | "to PART\n", 311 | "give VERB\n", 312 | "a DET\n", 313 | "review NOUN\n", 314 | ". 
PUNCT\n", 315 | "\n", 316 | "\n", 317 | "Named Entity Recognition (NER):\n", 318 | "New York City GPE\n", 319 | "\n", 320 | "\n", 321 | "Lemmatization:\n", 322 | "['a', 'customer', 'in', 'New', 'York', 'City', 'want', 'to', 'give', 'a', 'review']\n" 323 | ] 324 | } 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "source": [ 330 | "# Exercise #3: Build a Processing Pipeline with a File" 331 | ], 332 | "metadata": { 333 | "id": "5g2IwiXkT4Qj" 334 | } 335 | }, 336 | { 337 | "cell_type": "code", 338 | "source": [ 339 | "# Run this cell.\n", 340 | "\n", 341 | "file_path = '/content/sentiment_examples.txt'\n", 342 | "with open(file_path, 'r', encoding='utf-8') as file:\n", 343 | "    sentiment_texts = file.readlines()\n", 344 | "\n", 345 | "# After this block of code executes, the 'sentiment_texts' list contains\n", 346 | "# all the lines of text read from the 'sentiment_examples.txt' file.\n", 347 | "\n", 348 | "# The 'with' statement opens the file and ensures it is automatically closed once the block of code inside the 'with' statement has executed.\n", 349 | "\n", 350 | "# 'open(file_path, 'r', encoding='utf-8')' opens the file in read mode ('r') with UTF-8 encoding ('utf-8').\n", 351 | "# 'as file' assigns the opened file object to the variable 'file', which can be used to read from the file.\n", 352 | "\n", 353 | "# 'file.readlines()' reads all lines from the file and returns a list of strings, where each string corresponds to a line of text in the file (including its trailing newline).\n", 354 | "# These strings are stored in the 'sentiment_texts' list for further processing."
355 | ], 356 | "metadata": { 357 | "id": "UBSMP0lLrXRi" 358 | }, 359 | "execution_count": null, 360 | "outputs": [] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "source": [ 365 | "# Initialize empty lists to store the results\n", 366 | "token_lists = [] # List to store tokens for each sentiment example\n", 367 | "filtered_token_lists = [] # List to store filtered tokens (after stop word removal) for each sentiment example\n", 368 | "pos_tag_lists = [] # List to store POS tags for each sentiment example\n", 369 | "ner_lists = [] # List to store named entities for each sentiment example\n", 370 | "\n", 371 | "# Process each sentiment example using spaCy and store the results\n", 372 | "for sentiment_text in sentiment_texts:\n", 373 | "    doc = nlp(sentiment_text.strip()) # Strip any leading/trailing whitespace\n", 374 | "    # The .strip() method cleans up sentiment_text\n", 375 | "    # before it is passed to the spaCy nlp pipeline for processing.\n", 376 | "    # This ensures that there are no unwanted spaces or newline characters\n", 377 | "    # that could affect the processing of the text by spaCy.\n", 378 | "\n", 379 | "#-----------------------------------------------------------------------------\n", 380 | "# SOLUTION\n", 381 | "\n", 382 | "    # Tokenization\n", 383 | "    tokens = [token.text for token in doc] # Extract tokens from the processed text\n", 384 | "    token_lists.append(tokens) # Append tokens list to token_lists\n", 385 | "    # Creates a list of tokens and appends it to the token_lists list,\n", 386 | "    # so that the tokens for each sentiment example are stored.\n", 387 | "\n", 388 | "    # Stop Word Removal filter\n", 389 | "    filtered_tokens = [token.text for token in doc if not token.is_stop] # Filter out stop words\n", 390 | "    filtered_token_lists.append(filtered_tokens) # Append filtered tokens list to filtered_token_lists\n", 391 | "\n", 392 | "\n", 393 | "    # Part-of-Speech Tagging (POS tagging)\n", 394 | "    pos_tags = [(token.text, token.pos_) for
token in doc] # Extract token text and POS tags\n", 395 | " pos_tag_lists.append(pos_tags) # Append POS tags list to pos_tag_lists\n", 396 | "\n", 397 | " # Named Entity Recognition (NER)\n", 398 | " ner_entities = [(ent.text, ent.label_) for ent in doc.ents] # Extract named entities and their labels\n", 399 | " ner_lists.append(ner_entities) # Append named entities list to ner_lists\n", 400 | "\n", 401 | "# Create a DataFrame to organize the results\n", 402 | "results_df = pd.DataFrame({\n", 403 | " 'Sentiment Example': sentiment_texts,\n", 404 | " 'Tokens': token_lists,\n", 405 | " 'Filtered Tokens': filtered_token_lists,\n", 406 | " 'POS Tags': pos_tag_lists,\n", 407 | " 'Named Entities': ner_lists\n", 408 | "})\n", 409 | "\n", 410 | "# Display the DataFrame\n", 411 | "print(results_df)\n" 412 | ], 413 | "metadata": { 414 | "id": "BzCjJnodmL_s", 415 | "colab": { 416 | "base_uri": "https://localhost:8080/" 417 | }, 418 | "outputId": "f75c1516-44e4-4d15-fc4c-91aeac1b0d4f" 419 | }, 420 | "execution_count": null, 421 | "outputs": [ 422 | { 423 | "output_type": "stream", 424 | "name": "stdout", 425 | "text": [ 426 | " Sentiment Example \\\n", 427 | "0 \"I love the new features of your product! It h... \n", 428 | "1 \"The customer support was exceptional in New Y... \n", 429 | "2 \"The quality of your service exceeded my expec... \n", 430 | "3 \"I'm extremely satisfied with my purchase. The... \n", 431 | "4 \"The user interface is intuitive and easy to n... \n", 432 | "5 \"I had a positive experience shopping on your ... \n", 433 | "6 \"Your company values customer feedback, and it... \n", 434 | "7 \"The pricing is fair, and the value I get in r... \n", 435 | "8 \"I appreciate the personalized recommendations... \n", 436 | "9 \"The delivery was prompt, and the packaging wa... \n", 437 | "\n", 438 | " Tokens \\\n", 439 | "0 [\", I, love, the, new, features, of, your, pro... \n", 440 | "1 [\", The, customer, support, was, exceptional, ... 
\n", 441 | "2 [\", The, quality, of, your, service, exceeded,... \n", 442 | "3 [\", I, 'm, extremely, satisfied, with, my, pur... \n", 443 | "4 [\", The, user, interface, is, intuitive, and, ... \n", 444 | "5 [\", I, had, a, positive, experience, shopping,... \n", 445 | "6 [\", Your, company, values, customer, feedback,... \n", 446 | "7 [\", The, pricing, is, fair, ,, and, the, value... \n", 447 | "8 [\", I, appreciate, the, personalized, recommen... \n", 448 | "9 [\", The, delivery, was, prompt, ,, and, the, p... \n", 449 | "\n", 450 | " Filtered Tokens \\\n", 451 | "0 [\", love, new, features, product, !, greatly, ... \n", 452 | "1 [\", customer, support, exceptional, New, York,... \n", 453 | "2 [\", quality, service, exceeded, expectations, ... \n", 454 | "3 [\", extremely, satisfied, purchase, ., product... \n", 455 | "4 [\", user, interface, intuitive, easy, navigate... \n", 456 | "5 [\", positive, experience, shopping, website, .... \n", 457 | "6 [\", company, values, customer, feedback, ,, sh... \n", 458 | "7 [\", pricing, fair, ,, value, return, fantastic... \n", 459 | "8 [\", appreciate, personalized, recommendations,... \n", 460 | "9 [\", delivery, prompt, ,, packaging, secure, .,... \n", 461 | "\n", 462 | " POS Tags Named Entities \n", 463 | "0 [(\", PUNCT), (I, PRON), (love, VERB), (the, DE... [] \n", 464 | "1 [(\", PUNCT), (The, DET), (customer, NOUN), (su... [(New York, GPE)] \n", 465 | "2 [(\", PUNCT), (The, DET), (quality, NOUN), (of,... [(Prague, GPE)] \n", 466 | "3 [(\", PUNCT), (I, PRON), ('m, AUX), (extremely,... [] \n", 467 | "4 [(\", PUNCT), (The, DET), (user, NOUN), (interf... [] \n", 468 | "5 [(\", PUNCT), (I, PRON), (had, VERB), (a, DET),... [] \n", 469 | "6 [(\", PUNCT), (Your, PRON), (company, NOUN), (v... [] \n", 470 | "7 [(\", PUNCT), (The, DET), (pricing, NOUN), (is,... [] \n", 471 | "8 [(\", PUNCT), (I, PRON), (appreciate, VERB), (t... [] \n", 472 | "9 [(\", PUNCT), (The, DET), (delivery, NOUN), (wa... 
[] \n" 473 | ] 474 | } 475 | ] 476 | }, 477 | { 478 | "cell_type": "markdown", 479 | "source": [ 480 | "Export the results to processed_data.csv, then read the file back to inspect it" 481 | ], 482 | "metadata": { 483 | "id": "K3E8rVsOkWHr" 484 | } 485 | }, 486 | { 487 | "cell_type": "code", 488 | "source": [ 489 | "\n", 490 | "# Write the DataFrame to a CSV file named 'processed_data.csv' without including the index\n", 491 | "results_df.to_csv('processed_data.csv', index=False)\n", 492 | "\n", 493 | "# Read the same CSV file back into a Pandas DataFrame called processed_df\n", 494 | "# Specify the encoding as 'utf-8' to match what to_csv writes by default\n", 495 | "processed_df = pd.read_csv('processed_data.csv', encoding='utf-8')\n" 496 | ], 497 | "metadata": { 498 | "id": "xyVoHpxui5Za" 499 | }, 500 | "execution_count": null, 501 | "outputs": [] 502 | }, 503 | { 504 | "cell_type": "code", 505 | "source": [ 506 | "processed_df.head()" 507 | ], 508 | "metadata": { 509 | "id": "3YzY2O4MkYeq", 510 | "colab": { 511 | "base_uri": "https://localhost:8080/", 512 | "height": 310 513 | }, 514 | "outputId": "f7f4b7bf-f8e3-4524-b3f9-c2de3fe6d7cd" 515 | }, 516 | "execution_count": null, 517 | "outputs": [ 518 | { 519 | "output_type": "execute_result", 520 | "data": { 521 | "text/plain": [ 522 | " Sentiment Example \\\n", 523 | "0 \"I love the new features of your product! It h... \n", 524 | "1 \"The customer support was exceptional in New Y... \n", 525 | "2 \"The quality of your service exceeded my expec... \n", 526 | "3 \"I'm extremely satisfied with my purchase. The... \n", 527 | "4 \"The user interface is intuitive and easy to n... \n", 528 | "\n", 529 | " Tokens \\\n", 530 | "0 ['\"', 'I', 'love', 'the', 'new', 'features', '... \n", 531 | "1 ['\"', 'The', 'customer', 'support', 'was', 'ex... \n", 532 | "2 ['\"', 'The', 'quality', 'of', 'your', 'service... \n", 533 | "3 ['\"', 'I', \"'m\", 'extremely', 'satisfied', 'wi...
\n", 534 | "4 ['\"', 'The', 'user', 'interface', 'is', 'intui... \n", 535 | "\n", 536 | " Filtered Tokens \\\n", 537 | "0 ['\"', 'love', 'new', 'features', 'product', '!... \n", 538 | "1 ['\"', 'customer', 'support', 'exceptional', 'N... \n", 539 | "2 ['\"', 'quality', 'service', 'exceeded', 'expec... \n", 540 | "3 ['\"', 'extremely', 'satisfied', 'purchase', '.... \n", 541 | "4 ['\"', 'user', 'interface', 'intuitive', 'easy'... \n", 542 | "\n", 543 | " POS Tags Named Entities \n", 544 | "0 [('\"', 'PUNCT'), ('I', 'PRON'), ('love', 'VERB... [] \n", 545 | "1 [('\"', 'PUNCT'), ('The', 'DET'), ('customer', ... [('New York', 'GPE')] \n", 546 | "2 [('\"', 'PUNCT'), ('The', 'DET'), ('quality', '... [('Prague', 'GPE')] \n", 547 | "3 [('\"', 'PUNCT'), ('I', 'PRON'), (\"'m\", 'AUX'),... [] \n", 548 | "4 [('\"', 'PUNCT'), ('The', 'DET'), ('user', 'NOU... [] " 549 | ], 550 | "text/html": [ 551 | "\n", 552 | "
\n" 831 | ], 832 | "application/vnd.google.colaboratory.intrinsic+json": { 833 | "type": "dataframe", 834 | "variable_name": "processed_df", 835 | "summary": "{\n \"name\": \"processed_df\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"Sentiment Example\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"\\\"I appreciate the personalized recommendations. It makes shopping more convenient.\\\"\\n\",\n \"\\\"The customer support was exceptional in New York. They resolved my issue promptly and professionally.\\\"\\n\",\n \"\\\"I had a positive experience shopping on your website. The checkout process was smooth.\\\"\\n\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Tokens\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"['\\\"', 'I', 'appreciate', 'the', 'personalized', 'recommendations', '.', 'It', 'makes', 'shopping', 'more', 'convenient', '.', '\\\"']\",\n \"['\\\"', 'The', 'customer', 'support', 'was', 'exceptional', 'in', 'New', 'York', '.', 'They', 'resolved', 'my', 'issue', 'promptly', 'and', 'professionally', '.', '\\\"']\",\n \"['\\\"', 'I', 'had', 'a', 'positive', 'experience', 'shopping', 'on', 'your', 'website', '.', 'The', 'checkout', 'process', 'was', 'smooth', '.', '\\\"']\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Filtered Tokens\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"['\\\"', 'appreciate', 'personalized', 'recommendations', '.', 'makes', 'shopping', 'convenient', '.', '\\\"']\",\n \"['\\\"', 'customer', 'support', 'exceptional', 'New', 'York', '.', 'resolved', 'issue', 'promptly', 'professionally', '.', '\\\"']\",\n \"['\\\"', 'positive', 'experience', 'shopping', 'website', '.', 'checkout', 'process', 'smooth', '.', '\\\"']\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 
\"POS Tags\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"[('\\\"', 'PUNCT'), ('I', 'PRON'), ('appreciate', 'VERB'), ('the', 'DET'), ('personalized', 'ADJ'), ('recommendations', 'NOUN'), ('.', 'PUNCT'), ('It', 'PRON'), ('makes', 'VERB'), ('shopping', 'VERB'), ('more', 'ADV'), ('convenient', 'ADJ'), ('.', 'PUNCT'), ('\\\"', 'PUNCT')]\",\n \"[('\\\"', 'PUNCT'), ('The', 'DET'), ('customer', 'NOUN'), ('support', 'NOUN'), ('was', 'AUX'), ('exceptional', 'ADJ'), ('in', 'ADP'), ('New', 'PROPN'), ('York', 'PROPN'), ('.', 'PUNCT'), ('They', 'PRON'), ('resolved', 'VERB'), ('my', 'PRON'), ('issue', 'NOUN'), ('promptly', 'ADV'), ('and', 'CCONJ'), ('professionally', 'ADV'), ('.', 'PUNCT'), ('\\\"', 'PUNCT')]\",\n \"[('\\\"', 'PUNCT'), ('I', 'PRON'), ('had', 'VERB'), ('a', 'DET'), ('positive', 'ADJ'), ('experience', 'NOUN'), ('shopping', 'VERB'), ('on', 'ADP'), ('your', 'PRON'), ('website', 'NOUN'), ('.', 'PUNCT'), ('The', 'DET'), ('checkout', 'NOUN'), ('process', 'NOUN'), ('was', 'AUX'), ('smooth', 'ADJ'), ('.', 'PUNCT'), ('\\\"', 'PUNCT')]\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Named Entities\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"[]\",\n \"[('New York', 'GPE')]\",\n \"[('Prague', 'GPE')]\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" 836 | } 837 | }, 838 | "metadata": {}, 839 | "execution_count": 7 840 | } 841 | ] 842 | } 843 | ], 844 | "metadata": { 845 | "kernelspec": { 846 | "display_name": "Python 2", 847 | "language": "python", 848 | "name": "python2" 849 | }, 850 | "language_info": { 851 | "codemirror_mode": { 852 | "name": "ipython", 853 | "version": 2 854 | }, 855 | "file_extension": ".py", 856 | "mimetype": "text/x-python", 857 | "name": "python", 858 | "nbconvert_exporter": "python", 859 | "pygments_lexer": "ipython2", 860 | "version": "2.7.13" 861 | }, 862 | "colab": { 863 | 
"provenance": [] 864 | } 865 | }, 866 | "nbformat": 4, 867 | "nbformat_minor": 0 868 | } -------------------------------------------------------------------------------- /Chapter 2/sentiment_examples.txt: -------------------------------------------------------------------------------- 1 | "I love the new features of your product! It has greatly improved my productivity." 2 | "The customer support was exceptional in New York. They resolved my issue promptly and professionally." 3 | "The quality of your service exceeded my expectations in Prague. I'm impressed!" 4 | "I'm extremely satisfied with my purchase. The product works flawlessly." 5 | "The user interface is intuitive and easy to navigate. Kudos to your design team!" 6 | "I had a positive experience shopping on your website. The checkout process was smooth." 7 | "Your company values customer feedback, and it shows in the improvements you've made." 8 | "The pricing is fair, and the value I get in return is fantastic." 9 | "I appreciate the personalized recommendations. It makes shopping more convenient." 10 | "The delivery was prompt, and the packaging was secure. Everything arrived in perfect condition." -------------------------------------------------------------------------------- /Chapter 3/Ch3_challenge.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "15EatDy5yxEI" 7 | }, 8 | "source": [ 9 | "#Analyze Customer Feedback\n", 10 | "\n", 11 | "In this exercise, we analyze customer feedback using spaCy and TextBlob." 
12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "source": [ 17 | "#Exercise #1: Install Libraries and Modules\n", 18 | "\n" 19 | ], 20 | "metadata": { 21 | "id": "xfkLj86ioXxy" 22 | } 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": null, 27 | "metadata": { 28 | "id": "aQZzfNqQkdfW" 29 | }, 30 | "outputs": [], 31 | "source": [ 32 | "#Install Libraries and Modules\n", 33 | "\n", 34 | "\n", 35 | "\n", 36 | "\n" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "source": [ 42 | "#Exercise #2: Read in Feedback File and Create Output File for Results" 43 | ], 44 | "metadata": { 45 | "id": "TElJA9XWLMrm" 46 | } 47 | }, 48 | { 49 | "cell_type": "code", 50 | "source": [ 51 | "# EXERCISE - SCROLL DOWN TO THE \"START HERE\" SECTION BELOW.\n", 52 | "\n", 53 | "# Load the English NLP model\n", 54 | "nlp = spacy.load(\"en_core_web_sm\")\n", 55 | "\n", 56 | "\n", 57 | "#Define the path to the file containing feedback\n", 58 | "file_path = '/content/feedback_data.csv'\n", 59 | "\n", 60 | "# Read the file using open() with the specified parameters\n", 61 | "with open(file_path, \"r\", encoding=\"utf-8\") as file:\n", 62 | " # Step 2: Read the lines of the file\n", 63 | " feedback_data = file.readlines()\n", 64 | "\n", 65 | "# Add code to output feedback results to a file.\n", 66 | "output_csv_path = \"feedback_analysis_results.csv\"\n", 67 | "\n", 68 | "# Prepare CSV header\n", 69 | "csv_header = [\"Feedback Index\", \"Sentiment Polarity\", \"Sentiment Subjectivity\", \"Named Entities\", \"Preferred Contact Method\"]\n", 70 | "\n", 71 | "# Open CSV file for writing\n", 72 | "with open(output_csv_path, \"w\", newline=\"\", encoding=\"utf-8\") as csv_file:\n", 73 | " # Create CSV writer\n", 74 | " csv_writer = csv.writer(csv_file)\n", 75 | "\n", 76 | " # Write the header\n", 77 | " csv_writer.writerow(csv_header)\n", 78 | "\n", 79 | "#-----------------------------------------------------------------\n", 80 | "# START HERE\n", 81 | "\n", 82 | "# 
Process and analyze each feedback\n", 83 | "\n", 84 | "\n", 85 | "# Process the narrative using spaCy\n", 86 | "\n", 87 | "\n", 88 | "\n", 89 | "# Perform sentiment analysis using TextBlob\n", 90 | "\n", 91 | "\n", 92 | "\n", 93 | "# Extract named entities\n", 94 | "\n", 95 | "\n", 96 | "# Determine the preferred contact method\n", 97 | " preferred_contact_method = \"chat\" if \"email\" not in narrative.lower() else \"email\"\n", 98 | " print(f\"Preferred Contact Method: {preferred_contact_method}\")\n", 99 | "\n", 100 | "# Write the results to the CSV file\n", 101 | " csv_writer.writerow([idx, sentiment_polarity, sentiment_subjectivity, entities, preferred_contact_method])\n", 102 | "\n", 103 | "#-----------------------------------------------------------------" 104 | ], 105 | "metadata": { 106 | "id": "mjkLcZRHQk0t" 107 | }, 108 | "execution_count": null, 109 | "outputs": [] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": { 114 | "id": "l7hiWdrojvK_" 115 | }, 116 | "source": [ 117 | "#Exercise #3: Exploratory Data Analysis on Customer Feedback" 118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": { 124 | "id": "c3VdXX05nCw0" 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "##Load results file into a Pandas Dataframe\n", 129 | "\n", 130 | "\n", 131 | "\n", 132 | "# Display the first rows of the DataFrame\n", 133 | "\n" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "metadata": { 140 | "id": "CP2J1lkwnyYL" 141 | }, 142 | "outputs": [], 143 | "source": [ 144 | "df.info()" 145 | ] 146 | }, 147 | { 148 | "cell_type": "markdown", 149 | "source": [ 150 | "##Plot the Preferred Contact Method" 151 | ], 152 | "metadata": { 153 | "id": "HPcHpaS5nb2k" 154 | } 155 | }, 156 | { 157 | "cell_type": "code", 158 | "source": [ 159 | "# Example plot: Preferred Contact Method Count\n", 160 | "\n", 161 | "\n", 162 | "\n", 163 | "\n" 164 | ], 165 | "metadata": { 166 | "id": 
"1LKkuGxtfDOt" 167 | }, 168 | "execution_count": null, 169 | "outputs": [] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "source": [ 174 | "##Plot the Sentiment Polarity and Sentiment Subjectivity" 175 | ], 176 | "metadata": { 177 | "id": "1ZN_mzufnh0S" 178 | } 179 | }, 180 | { 181 | "cell_type": "code", 182 | "source": [ 183 | "# Pair Plot: Pairwise relationships\n", 184 | "sns.pairplot(data=df[['Sentiment Polarity', 'Sentiment Subjectivity']],\n", 185 | " diag_kind='kde')\n", 186 | "plt.suptitle('Pair Plot: Pairwise relationships')\n", 187 | "plt.show()" 188 | ], 189 | "metadata": { 190 | "id": "WjYn8tEjgivQ" 191 | }, 192 | "execution_count": null, 193 | "outputs": [] 194 | } 195 | ], 196 | "metadata": { 197 | "colab": { 198 | "provenance": [] 199 | }, 200 | "kernelspec": { 201 | "display_name": "Python 3", 202 | "name": "python3" 203 | }, 204 | "language_info": { 205 | "name": "python" 206 | } 207 | }, 208 | "nbformat": 4, 209 | "nbformat_minor": 0 210 | } 211 | -------------------------------------------------------------------------------- /Chapter 3/Ch3_solution.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "15EatDy5yxEI" 7 | }, 8 | "source": [ 9 | "#Analyze Customer Feedback\n", 10 | "\n", 11 | "In this exercise, we analyze customer feedback using spaCy and TextBlob." 
12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "source": [ 17 | "#Exercise #1: Install Libraries and Modules\n", 18 | "\n" 19 | ], 20 | "metadata": { 21 | "id": "xfkLj86ioXxy" 22 | } 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 1, 27 | "metadata": { 28 | "id": "aQZzfNqQkdfW" 29 | }, 30 | "outputs": [], 31 | "source": [ 32 | "# Install Libraries and Modules\n", 33 | "\n", 34 | "\n", 35 | "import spacy\n", 36 | "from textblob import TextBlob\n", 37 | "import pandas as pd\n", 38 | "import csv\n", 39 | "\n", 40 | "import seaborn as sns\n", 41 | "import matplotlib.pyplot as plt\n", 42 | "\n", 43 | "\n", 44 | "\n" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "source": [ 50 | "#Exercise #2: Read in Feedback File and Create Output File for Results" 51 | ], 52 | "metadata": { 53 | "id": "TElJA9XWLMrm" 54 | } 55 | }, 56 | { 57 | "cell_type": "code", 58 | "source": [ 59 | "\n", 60 | "# Load the English NLP model\n", 61 | "nlp = spacy.load(\"en_core_web_sm\")\n", 62 | "\n", 63 | "\n", 64 | "#Define the path to the file containing feedback\n", 65 | "file_path = '/content/feedback_data.csv'\n", 66 | "\n", 67 | "# Read the file using open() with the specified parameters\n", 68 | "with open(file_path, \"r\", encoding=\"utf-8\") as file:\n", 69 | " # Step 2: Read the lines of the file\n", 70 | " feedback_data = file.readlines()\n", 71 | "\n", 72 | "# Add code to output feedback results to a file.\n", 73 | "output_csv_path = \"feedback_analysis_results.csv\"\n", 74 | "\n", 75 | "# Prepare CSV header\n", 76 | "csv_header = [\"Feedback Index\", \"Sentiment Polarity\", \"Sentiment Subjectivity\", \"Named Entities\", \"Preferred Contact Method\"]\n", 77 | "\n", 78 | "# Open CSV file for writing\n", 79 | "with open(output_csv_path, \"w\", newline=\"\", encoding=\"utf-8\") as csv_file:\n", 80 | " # Create CSV writer\n", 81 | " csv_writer = csv.writer(csv_file)\n", 82 | "\n", 83 | " # Write the header\n", 84 | " csv_writer.writerow(csv_header)\n", 
85 | "\n", 86 | "#-----------------------------------------------------------------\n", 87 | "# SOLUTION\n", 88 | "\n", 89 | "# Process and analyze each feedback\n", 90 | " for idx, narrative in enumerate(feedback_data, start=1):\n", 91 | " print(f\"\\nProcessing Feedback {idx}:\")\n", 92 | " print(\"------------------------------\")\n", 93 | " print(narrative.strip())\n", 94 | "\n", 95 | "# Process the narrative using spaCy\n", 96 | " doc = nlp(narrative)\n", 97 | "\n", 98 | "# Perform sentiment analysis using TextBlob\n", 99 | " blob = TextBlob(narrative)\n", 100 | " sentiment_polarity = blob.sentiment.polarity\n", 101 | " sentiment_subjectivity = blob.sentiment.subjectivity\n", 102 | " print(f\"Sentiment: {sentiment_polarity} (Polarity), {sentiment_subjectivity} (Subjectivity)\")\n", 103 | "\n", 104 | "# Extract named entities\n", 105 | " entities = [(ent.text, ent.label_) for ent in doc.ents]\n", 106 | " print(\"Named Entities:\", entities)\n", 107 | "\n", 108 | "# Determine the preferred contact method\n", 109 | " preferred_contact_method = \"chat\" if \"email\" not in narrative.lower() else \"email\"\n", 110 | " print(f\"Preferred Contact Method: {preferred_contact_method}\")\n", 111 | "\n", 112 | "# Write the results to the CSV file\n", 113 | " csv_writer.writerow([idx, sentiment_polarity, sentiment_subjectivity, entities, preferred_contact_method])\n" 114 | ], 115 | "metadata": { 116 | "id": "mjkLcZRHQk0t", 117 | "colab": { 118 | "base_uri": "https://localhost:8080/" 119 | }, 120 | "outputId": "a718c770-f86d-4a13-e904-27f573393eb3" 121 | }, 122 | "execution_count": 2, 123 | "outputs": [ 124 | { 125 | "output_type": "stream", 126 | "name": "stdout", 127 | "text": [ 128 | "\n", 129 | "Processing Feedback 1:\n", 130 | "------------------------------\n", 131 | "Your products are excellent. I really love the quality! However, delivery to my location in Los Angeles was a bit slow. 
abbey@email.com\n", 132 | "Sentiment: 0.44166666666666665 (Polarity), 0.6666666666666666 (Subjectivity)\n", 133 | "Named Entities: [('Los Angeles', 'GPE')]\n", 134 | "Preferred Contact Method: email\n", 135 | "\n", 136 | "Processing Feedback 2:\n", 137 | "------------------------------\n", 138 | "The customer service team in New York was helpful in resolving my issue. I appreciate the assistance. brian@email.com\n", 139 | "Sentiment: 0.13636363636363635 (Polarity), 0.45454545454545453 (Subjectivity)\n", 140 | "Named Entities: [('New York', 'GPE')]\n", 141 | "Preferred Contact Method: email\n", 142 | "\n", 143 | "Processing Feedback 3:\n", 144 | "------------------------------\n", 145 | "The new features in the latest release are fantastic! They have greatly improved the user experience in San Francisco.\n", 146 | "Sentiment: 0.48409090909090907 (Polarity), 0.7511363636363636 (Subjectivity)\n", 147 | "Named Entities: [('San Francisco', 'GPE')]\n", 148 | "Preferred Contact Method: chat\n", 149 | "\n", 150 | "Processing Feedback 4:\n", 151 | "------------------------------\n", 152 | "The product didn't meet my expectations, and I'm disappointed. I hope you can address the issues in Chicago. 
My email address is emailme@email.com\n", 153 | "Sentiment: -0.75 (Polarity), 0.75 (Subjectivity)\n", 154 | "Named Entities: [('Chicago', 'GPE')]\n", 155 | "Preferred Contact Method: email\n" 156 | ] 157 | } 158 | ] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": { 163 | "id": "l7hiWdrojvK_" 164 | }, 165 | "source": [ 166 | "#Exercise #3: Exploratory Data Analysis on Customer Feedback" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": 3, 172 | "metadata": { 173 | "id": "c3VdXX05nCw0", 174 | "colab": { 175 | "base_uri": "https://localhost:8080/", 176 | "height": 174 177 | }, 178 | "outputId": "7a630617-7e22-48ac-de31-514f2cd6a8cf" 179 | }, 180 | "outputs": [ 181 | { 182 | "output_type": "execute_result", 183 | "data": { 184 | "text/plain": [ 185 | " Feedback Index Sentiment Polarity Sentiment Subjectivity \\\n", 186 | "0 1 0.441667 0.666667 \n", 187 | "1 2 0.136364 0.454545 \n", 188 | "2 3 0.484091 0.751136 \n", 189 | "3 4 -0.750000 0.750000 \n", 190 | "\n", 191 | " Named Entities Preferred Contact Method \n", 192 | "0 [('Los Angeles', 'GPE')] email \n", 193 | "1 [('New York', 'GPE')] email \n", 194 | "2 [('San Francisco', 'GPE')] chat \n", 195 | "3 [('Chicago', 'GPE')] email " 196 | ], 197 | "text/html": [ 198 | "\n", 199 | "
\n" 470 | ], 471 | "application/vnd.google.colaboratory.intrinsic+json": { 472 | "type": "dataframe", 473 | "variable_name": "df", 474 | "summary": "{\n \"name\": \"df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"Feedback Index\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n \"max\": 4,\n \"num_unique_values\": 4,\n \"samples\": [\n 2,\n 4,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentiment Polarity\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.5733392193304117,\n \"min\": -0.75,\n \"max\": 0.484090909090909,\n \"num_unique_values\": 4,\n \"samples\": [\n 0.1363636363636363,\n -0.75,\n 0.4416666666666666\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Sentiment Subjectivity\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.13974256983521738,\n \"min\": 0.4545454545454545,\n \"max\": 0.7511363636363636,\n \"num_unique_values\": 4,\n \"samples\": [\n 0.4545454545454545,\n 0.75,\n 0.6666666666666666\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Named Entities\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"[('New York', 'GPE')]\",\n \"[('Chicago', 'GPE')]\",\n \"[('Los Angeles', 'GPE')]\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Preferred Contact Method\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"chat\",\n \"email\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" 475 | } 476 | }, 477 | "metadata": {}, 478 | "execution_count": 3 479 | } 480 | ], 481 | "source": [ 482 | "##Load results file into a Pandas Dataframe\n", 483 | "\n", 484 | "# Load data into a Pandas DataFrame\n", 485 | "df = pd.read_csv('/content/feedback_analysis_results.csv')\n", 486 | "\n", 487 | "# Display the DataFrame\n", 488 | "df.head()\n" 489 | ] 
490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": 4, 494 | "metadata": { 495 | "id": "CP2J1lkwnyYL", 496 | "colab": { 497 | "base_uri": "https://localhost:8080/" 498 | }, 499 | "outputId": "934609e1-2538-49bf-b586-b03f4684688e" 500 | }, 501 | "outputs": [ 502 | { 503 | "output_type": "stream", 504 | "name": "stdout", 505 | "text": [ 506 | "<class 'pandas.core.frame.DataFrame'>\n", 507 | "RangeIndex: 4 entries, 0 to 3\n", 508 | "Data columns (total 5 columns):\n", 509 | " # Column Non-Null Count Dtype \n", 510 | "--- ------ -------------- ----- \n", 511 | " 0 Feedback Index 4 non-null int64 \n", 512 | " 1 Sentiment Polarity 4 non-null float64\n", 513 | " 2 Sentiment Subjectivity 4 non-null float64\n", 514 | " 3 Named Entities 4 non-null object \n", 515 | " 4 Preferred Contact Method 4 non-null object \n", 516 | "dtypes: float64(2), int64(1), object(2)\n", 517 | "memory usage: 288.0+ bytes\n" 518 | ] 519 | } 520 | ], 521 | "source": [ 522 | "df.info()" 523 | ] 524 | }, 525 | { 526 | "cell_type": "markdown", 527 | "source": [ 528 | "##Plot the Preferred Contact Method" 529 | ], 530 | "metadata": { 531 | "id": "HPcHpaS5nb2k" 532 | } 533 | }, 534 | { 535 | "cell_type": "code", 536 | "source": [ 537 | "# Example plot: Preferred Contact Method Count (hue mirrors x so the palette applies per category)\n", 538 | "plt.figure(figsize=(10, 6))\n", 539 | "sns.countplot(data=df, x='Preferred Contact Method', hue='Preferred Contact Method', palette='muted', legend=False)\n", 540 | "plt.title('Preferred Contact Method Count')\n", 541 | "plt.xlabel('Preferred Contact Method')\n", 542 | "plt.ylabel('Count')\n", 543 | "plt.show()" 544 | ], 545 | "metadata": { 546 | "id": "1LKkuGxtfDOt", 547 | "colab": { 548 | "base_uri": "https://localhost:8080/", 549 | "height": 671 550 | }, 551 | "outputId": "f637ee60-99eb-4d6d-9ff7-ca6e7f7f3929" 552 | }, 553 | "execution_count": 6, 554 | "outputs": [ 555 | { 567 | "output_type": "display_data", 568 | "data": { 569 | "text/plain": [ 570 | "
" 571 | ], 572 | "image/png": "\n" 573 | }, 574 | "metadata": {} 575 | } 576 | ] 577 | }, 578 | { 579 | "cell_type": "markdown", 580 | "source": [ 581 | "##Plot the Sentiment Polarity and Sentiment Subjectivity" 582 | ], 583 | "metadata": { 584 | "id": "1ZN_mzufnh0S" 585 | } 586 | }, 587 | { 588 | "cell_type": "code", 589 | "source": [ 590 | "# Pair Plot: Pairwise relationships\n", 591 | "sns.pairplot(data=df[['Sentiment Polarity', 'Sentiment Subjectivity']],\n", 592 | " diag_kind='kde')\n", 593 | "plt.suptitle('Pair Plot: Pairwise relationships')\n", 594 | "plt.show()" 595 | ], 596 | "metadata": { 597 | "id": "WjYn8tEjgivQ", 598 | "colab": { 599 | "base_uri": "https://localhost:8080/", 600 | "height": 515 601 | }, 602 | "outputId": "4b506cdc-4ce0-4f16-bf15-a5234f39da97" 603 | }, 604 | "execution_count": 7, 605 | "outputs": [ 606 | { 607 | "output_type": "display_data", 608 | "data": { 609 | "text/plain": [ 610 | "
" 611 | ], 612 | "image/png": "\n" 613 | }, 614 | "metadata": {} 615 | } 616 | ] 617 | } 618 | ], 619 | "metadata": { 620 | "colab": { 621 | "provenance": [] 622 | }, 623 | "kernelspec": { 624 | "display_name": "Python 3", 625 | "name": "python3" 626 | }, 627 | "language_info": { 628 | "name": "python" 629 | } 630 | }, 631 | "nbformat": 4, 632 | "nbformat_minor": 0 633 | } 634 | -------------------------------------------------------------------------------- /Chapter 3/feedback_data.csv: -------------------------------------------------------------------------------- 1 | Your products are excellent. I really love the quality! However, delivery to my location in Los Angeles was a bit slow. abbey@email.com 2 | The customer service team in New York was helpful in resolving my issue. I appreciate the assistance. brian@email.com 3 | The new features in the latest release are fantastic! They have greatly improved the user experience in San Francisco. 4 | The product didn't meet my expectations, and I'm disappointed. I hope you can address the issues in Chicago. My email address is emailme@email.com 5 | -------------------------------------------------------------------------------- /Chapter 4/ch4_feedback_data.csv: -------------------------------------------------------------------------------- 1 | Your products are excellent. I really love the quality! However, delivery to my location in Los Angeles was a bit slow. abbey@email.com 2 | The customer service team in New York was helpful in resolving my issue. I appreciate the assistance. brian@email.com 3 | The new features in the latest release are fantastic! They have greatly improved the user experience in San Francisco. 4 | The product didn't meet my expectations, and I'm disappointed. I hope you can address the issues in Chicago. My email address is emailme@email.com 5 | Your products are not good. I really don't like the quality! However, delivery to my location in Los Angeles was a bit slow. 
abbey@email.com 6 | The customer service team in San Francisco was not helpful in resolving my issue. But, I appreciate the assistance. brian@email.com 7 | The new features in the latest release are not fantastic! They have not greatly improved the user experience in San Francisco. 8 | The product didn't meet my expectations, and I'm disappointed. I hope you can address the issues in London. My email address is emailme@email.com 9 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | LinkedIn Learning Exercise Files License Agreement 2 | ================================================== 3 | 4 | This License Agreement (the "Agreement") is a binding legal agreement 5 | between you (as an individual or entity, as applicable) and LinkedIn 6 | Corporation (“LinkedIn”). By downloading or using the LinkedIn Learning 7 | exercise files in this repository (“Licensed Materials”), you agree to 8 | be bound by the terms of this Agreement. If you do not agree to these 9 | terms, do not download or use the Licensed Materials. 10 | 11 | 1. License. 12 | - a. Subject to the terms of this Agreement, LinkedIn hereby grants LinkedIn 13 | members during their LinkedIn Learning subscription a non-exclusive, 14 | non-transferable copyright license, for internal use only, to 1) make a 15 | reasonable number of copies of the Licensed Materials, and 2) make 16 | derivative works of the Licensed Materials for the sole purpose of 17 | practicing skills taught in LinkedIn Learning courses. 18 | - b. Distribution. 
Unless otherwise noted in the Licensed Materials, subject 19 | to the terms of this Agreement, LinkedIn hereby grants LinkedIn members 20 | with a LinkedIn Learning subscription a non-exclusive, non-transferable 21 | copyright license to distribute the Licensed Materials, except the 22 | Licensed Materials may not be included in any product or service (or 23 | otherwise used) to instruct or educate others. 24 | 25 | 2. Restrictions and Intellectual Property. 26 | - a. You may not use, modify, copy, make derivative works of, publish, 27 | distribute, rent, lease, sell, sublicense, assign or otherwise transfer the 28 | Licensed Materials, except as expressly set forth above in Section 1. 29 | - b. LinkedIn (and its licensors) retains its intellectual property rights 30 | in the Licensed Materials. Except as expressly set forth in Section 1, 31 | LinkedIn grants no licenses. 32 | - c. You indemnify LinkedIn and its licensors and affiliates for i) any 33 | alleged infringement or misappropriation of any intellectual property rights 34 | of any third party based on modifications you make to the Licensed Materials, 35 | ii) any claims arising from your use or distribution of all or part of the 36 | Licensed Materials and iii) a breach of this Agreement. You will defend, hold 37 | harmless, and indemnify LinkedIn and its affiliates (and our and their 38 | respective employees, shareholders, and directors) from any claim or action 39 | brought by a third party, including all damages, liabilities, costs and 40 | expenses, including reasonable attorneys’ fees, to the extent resulting from, 41 | alleged to have resulted from, or in connection with: (a) your breach of your 42 | obligations herein; or (b) your use or distribution of any Licensed Materials. 43 | 44 | 3. Open source. This code may include open source software, which may be 45 | subject to other license terms as provided in the files. 46 | 47 | 4. Warranty Disclaimer. 
LINKEDIN PROVIDES THE LICENSED MATERIALS ON AN “AS IS” 48 | AND “AS AVAILABLE” BASIS. LINKEDIN MAKES NO REPRESENTATION OR WARRANTY, 49 | WHETHER EXPRESS OR IMPLIED, ABOUT THE LICENSED MATERIALS, INCLUDING ANY 50 | REPRESENTATION THAT THE LICENSED MATERIALS WILL BE FREE OF ERRORS, BUGS OR 51 | INTERRUPTIONS, OR THAT THE LICENSED MATERIALS ARE ACCURATE, COMPLETE OR 52 | OTHERWISE VALID. TO THE FULLEST EXTENT PERMITTED BY LAW, LINKEDIN AND ITS 53 | AFFILIATES DISCLAIM ANY IMPLIED OR STATUTORY WARRANTY OR CONDITION, INCLUDING 54 | ANY IMPLIED WARRANTY OR CONDITION OF MERCHANTABILITY OR FITNESS FOR A 55 | PARTICULAR PURPOSE, AVAILABILITY, SECURITY, TITLE AND/OR NON-INFRINGEMENT. 56 | YOUR USE OF THE LICENSED MATERIALS IS AT YOUR OWN DISCRETION AND RISK, AND 57 | YOU WILL BE SOLELY RESPONSIBLE FOR ANY DAMAGE THAT RESULTS FROM USE OF THE 58 | LICENSED MATERIALS TO YOUR COMPUTER SYSTEM OR LOSS OF DATA. NO ADVICE OR 59 | INFORMATION, WHETHER ORAL OR WRITTEN, OBTAINED BY YOU FROM US OR THROUGH OR 60 | FROM THE LICENSED MATERIALS WILL CREATE ANY WARRANTY OR CONDITION NOT 61 | EXPRESSLY STATED IN THESE TERMS. 62 | 63 | 5. Limitation of Liability. LINKEDIN SHALL NOT BE LIABLE FOR ANY INDIRECT, 64 | INCIDENTAL, SPECIAL, PUNITIVE, CONSEQUENTIAL OR EXEMPLARY DAMAGES, INCLUDING 65 | BUT NOT LIMITED TO, DAMAGES FOR LOSS OF PROFITS, GOODWILL, USE, DATA OR OTHER 66 | INTANGIBLE LOSSES . IN NO EVENT WILL LINKEDIN'S AGGREGATE LIABILITY TO YOU 67 | EXCEED $100. THIS LIMITATION OF LIABILITY SHALL: 68 | - i. APPLY REGARDLESS OF WHETHER (A) YOU BASE YOUR CLAIM ON CONTRACT, TORT, 69 | STATUTE, OR ANY OTHER LEGAL THEORY, (B) WE KNEW OR SHOULD HAVE KNOWN ABOUT 70 | THE POSSIBILITY OF SUCH DAMAGES, OR (C) THE LIMITED REMEDIES PROVIDED IN THIS 71 | SECTION FAIL OF THEIR ESSENTIAL PURPOSE; AND 72 | - ii. 
NOT APPLY TO ANY DAMAGE THAT LINKEDIN MAY CAUSE YOU INTENTIONALLY OR 73 | KNOWINGLY IN VIOLATION OF THESE TERMS OR APPLICABLE LAW, OR AS OTHERWISE 74 | MANDATED BY APPLICABLE LAW THAT CANNOT BE DISCLAIMED IN THESE TERMS. 75 | 76 | 6. Termination. This Agreement automatically terminates upon your breach of 77 | this Agreement or termination of your LinkedIn Learning subscription. On 78 | termination, all licenses granted under this Agreement will terminate 79 | immediately and you will delete the Licensed Materials. Sections 2-7 of this 80 | Agreement survive any termination of this Agreement. LinkedIn may discontinue 81 | the availability of some or all of the Licensed Materials at any time for any 82 | reason. 83 | 84 | 7. Miscellaneous. This Agreement will be governed by and construed in 85 | accordance with the laws of the State of California without regard to conflict 86 | of laws principles. The exclusive forum for any disputes arising out of or 87 | relating to this Agreement shall be an appropriate federal or state court 88 | sitting in the County of Santa Clara, State of California. If LinkedIn does 89 | not act to enforce a breach of this Agreement, that does not mean that 90 | LinkedIn has waived its right to enforce this Agreement. The Agreement does 91 | not create a partnership, agency relationship, or joint venture between the 92 | parties. Neither party has the power or authority to bind the other or to 93 | create any obligation or responsibility on behalf of the other. You may not, 94 | without LinkedIn’s prior written consent, assign or delegate any rights or 95 | obligations under these terms, including in connection with a change of 96 | control. Any purported assignment and delegation shall be ineffective. The 97 | Agreement shall bind and inure to the benefit of the parties, their respective 98 | successors and permitted assigns. 
If any provision of the Agreement is 99 | unenforceable, that provision will be modified to render it enforceable to the 100 | extent possible to give effect to the parties’ intentions and the remaining 101 | provisions will not be affected. This Agreement is the only agreement between 102 | you and LinkedIn regarding the Licensed Materials, and supersedes all prior 103 | agreements relating to the Licensed Materials. 104 | 105 | Last Updated: March 2019 106 | -------------------------------------------------------------------------------- /NOTICE: -------------------------------------------------------------------------------- 1 | Copyright 2024 LinkedIn Corporation 2 | All Rights Reserved. 3 | 4 | Licensed under the LinkedIn Learning Exercise File License (the "License"). 5 | See LICENSE in the project root for license information. 6 | 7 | Please note, this project may automatically load third party code from external 8 | repositories (for example, NPM modules, Composer packages, or other dependencies). 9 | If so, such third party code may be subject to other license terms than as set 10 | forth above. In addition, such third party code may also depend on and load 11 | multiple tiers of dependencies. Please review the applicable licenses of the 12 | additional dependencies. 13 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Advanced NLP with Python for Machine Learning 2 | This is the repository for the LinkedIn Learning course `Advanced NLP with Python for Machine Learning`. The full course is available from [LinkedIn Learning][lil-course-url]. 3 | 4 | ![lil-thumbnail-url] 5 | 6 | This course is for anyone who wants to learn more advanced NLP methods. Instructor Gwendolyn Stripling, PhD, begins with a look at the fundamental concepts and principles of NLP, including the evolution and significance of natural language processing. 
She then reviews some NLP and Python basics and introduces the NLP library spaCy before jumping into more modern techniques and advancements in natural language processing using Transformer Models like GPT and BERT. Methods such as supervised fine-tuning, parameter-efficient fine-tuning (PEFT), and retrieval-augmented generation (RAG) give you the foundational knowledge you need to improve large language model (LLM) performance. Learn the ways you can apply NLP in your applications and day-to-day, including how to analyze customer sentiments. Each chapter ends with a challenge and solution, so you can test your knowledge as you go. 7 | 8 | 9 | ### Instructor 10 | 11 | ![avatar] 12 | 13 | Gwendolyn Stripling 14 | 15 | Machine Learning and Artificial Intelligence Content Developer 16 | 17 | 18 | Check out my other courses on [LinkedIn Learning](https://www.linkedin.com/learning/instructors/gwendolyn-stripling?u=104). 19 | 20 | 21 | [0]: # (Replace these placeholder URLs with actual course URLs) 22 | 23 | [lil-course-url]: https://www.linkedin.com/learning/advanced-nlp-with-python-for-machine-learning-revision-2024-q2 24 | [lil-thumbnail-url]: https://media.licdn.com/dms/image/D560DAQGTubKeur8sMA/learning-public-crop_675_1200/0/1716585791557?e=2147483647&v=beta&t=jYbEsmvdD9sYaBUmc_dtWym36Qn6-YkngoU68wEvvIc 25 | [avatar]: https://media.licdn.com/dms/image/D560DAQGkBuohyKFspw/learning-author-crop_200_200/0/1694723104809?e=1717192800&v=beta&t=UnBwX0YOnSKGAnsdvvkTlVRzdlSLPLTczeA8JacqFd0 26 | --------------------------------------------------------------------------------