├── README.md ├── LICENSE ├── pvr.txt ├── pvr_custom_ner_training.ipynb ├── pvr_training_data.json └── pvr_custom_ner_training2.ipynb /README.md: -------------------------------------------------------------------------------- 1 | # custom-ner-with-spacy 2 | Custom Named Entity Recognition annotated using NER Annotated by tecoholic and Spacy for training the model 3 | 4 | ## Credits 5 | 6 | * NER Annotator - https://github.com/tecoholic/ner-annotator 7 | * `pvr.txt` - https://www.moneycontrol.com/ 8 | * `spacy` - https://github.com/explosion/spaCy 9 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 amrrs 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /pvr.txt: -------------------------------------------------------------------------------- 1 | The share price of PVR rose over 7 percent on Wednesday after the multiplex chain said that it has reduced losses in Q2 despite nil revenue from the core movie exhibition business. 2 | 3 | The company managed to get rent waivers from most landlords, CFO Nitin Sood said in an interview to CNBC-TV18. “The big focus for us right now as revenues have been nil is to really reduce our fixed cost and we have managed to do that,” he added. 4 | 5 | Sood further said that they have brought down the fixed cost down by almost 75-80 percent. 6 | 7 | The stock rose as much as 7.6 percent to the day's high of Rs 1,186.85 per share on the BSE. 8 | 9 | Meanwhile, for the September quarter, the company reported a consolidated net loss of Rs 184.06 crore versus a net profit of Rs 47.67 crore in the year ago quarter. 10 | 11 | Its total income was at Rs 110.61 crore during the quarter under review against Rs 979.40 crore in the corresponding quarter last fiscal. PVR's total expenses were at Rs 389.37 crore in July-September 2020-21. 12 | 13 | "Financial performance of the company for Q2, FY 21 was impacted by the continued lockdown announced due to COVID-19 outbreak, which disrupted the company's operations," said PVR in a post-earnings statement. 14 | 15 | PVR said its results were not comparable as business was impacted due to temporary closures of cinemas. 16 | 17 | Under Unlock 5.0 guidelines, the government has permitted cinemas to reopen October 15 onwards with 50 percent capacity. So far, 16 states and UTs, where PVR has a presence, have permitted cinemas to restart operations. Out of total of 831 screens of the company, over 575 have received permission to reopen, it said. 18 | 19 | PVR Ltd Chairman cum Managing Director Ajay Bijli said: "We are eagerly waiting for the reopening of other states, specifically Maharashtra and Telangana so that business can gradually get back to normal. We are taking all possible precautions so that both our customers and employees feel safe while visiting their favourite cinema." -------------------------------------------------------------------------------- /pvr_custom_ner_training.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language_info": { 4 | "codemirror_mode": { 5 | "name": "ipython", 6 | "version": 3 7 | }, 8 | "file_extension": ".py", 9 | "mimetype": "text/x-python", 10 | "name": "python", 11 | "nbconvert_exporter": "python", 12 | "pygments_lexer": "ipython3", 13 | "version": "3.7.7-final" 14 | }, 15 | "orig_nbformat": 2, 16 | "kernelspec": { 17 | "name": "python37764bita3da3130c1a84ed9abf6da54fdeb0dc3", 18 | "display_name": "Python 3.7.7 64-bit" 19 | } 20 | }, 21 | "nbformat": 4, 22 | "nbformat_minor": 2, 23 | "cells": [ 24 | { 25 | "cell_type": "code", 26 | "execution_count": 2, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "import json\n", 31 | "import spacy" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 6, 37 | "metadata": {}, 38 | "outputs": [ 39 | { 40 | "output_type": "error", 41 | "ename": "FileNotFoundError", 42 | "evalue": "[Errno 2] No such file or directory: '~/Documents/pvr_training_data.json'", 43 | "traceback": [ 44 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 45 | "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", 46 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# load the training data\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mwith\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'~/Documents/pvr_training_data.json'\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mfp\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0mtraining_data\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mjson\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mload\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfp\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 47 | "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: '~/Documents/pvr_training_data.json'" 48 | ] 49 | } 50 | ], 51 | "source": [ 52 | "# load the training data\n", 53 | "with open('~/Documents/pvr_training_data.json') as fp:\n", 54 | " training_data = json.load(fp)" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": null, 60 | "metadata": {}, 61 | "outputs": [], 62 | "source": [] 63 | } 64 | ] 65 | } -------------------------------------------------------------------------------- /pvr_training_data.json: -------------------------------------------------------------------------------- 1 | {"classes":["NAME","COMPANY","ROLE","PERCENTAGE","WEEKDAY","MONEY","TIME PERIOD","STATES"],"annotations":[["The share price of PVR rose over 7 percent on Wednesday after the multiplex chain said that it has reduced losses in Q2 despite nil revenue from the core movie exhibition business.",{"entities":[[19,22,"COMPANY"],[33,42,"PERCENTAGE"],[46,55,"WEEKDAY"]]}],["The company managed to get rent waivers from most landlords, CFO Nitin Sood said in an interview to CNBC-TV18. “The big focus for us right now as revenues have been nil is to really reduce our fixed cost and we have managed to do that, ” he added.",{"entities":[[61,64,"ROLE"],[65,75,"NAME"]]}],["Sood further said that they have brought down the fixed cost down by almost 75-80 percent.",{"entities":[[0,4,"NAME"],[76,89,"PERCENTAGE"]]}],["The stock rose as much as 7.6 percent to the day's high of Rs 1,186.85 per share on the BSE.",{"entities":[[26,37,"PERCENTAGE"],[59,70,"PERCENTAGE"],[88,91,"COMPANY"]]}],["Meanwhile, for the September quarter, the company reported a consolidated net loss of Rs 184.06 crore versus a net profit of Rs 47.67 crore in the year ago quarter.",{"entities":[[86,101,"MONEY"],[125,139,"MONEY"]]}],["",{"entities":[]}],["Its total income was at Rs 110.61 crore during the quarter under review against Rs 979.40 crore in the corresponding quarter last fiscal. PVR's total expenses were at Rs 389.37 crore in July-September 2020-21.",{"entities":[[24,39,"MONEY"],[80,95,"MONEY"],[117,137,"TIME PERIOD"],[138,141,"COMPANY"],[167,182,"MONEY"],[186,208,"TIME PERIOD"]]}],["\" Financial performance of the company for Q2, FY 21 was impacted by the continued lockdown announced due to COVID-19 outbreak, which disrupted the company's operations, \" said PVR in a post-earnings statement.",{"entities":[[42,44,"TIME PERIOD"],[46,51,"TIME PERIOD"]]}],["PVR said its results were not comparable as business was impacted due to temporary closures of cinemas.",{"entities":[]}],["",{"entities":[]}],["Under Unlock 5.0 guidelines, the government has permitted cinemas to reopen October 15 onwards with 50 percent capacity. So far , 16 states and UTs, where PVR has a presence, have permitted cinemas to restart operations. Out of total of 831 screens of the company, over 575 have received permission to reopen, it said.",{"entities":[[76,86,"TIME PERIOD"],[100,110,"PERCENTAGE"],[154,157,"COMPANY"]]}],["PVR Ltd Chairman cum Managing Director Ajay Bijli said: \" We are eagerly waiting for the reopening of other states, specifically Maharashtra and Telangana so that business can gradually get back to normal. We are taking all possible precautions so that both our customers and employees feel safe while visiting their favourite cinema . \"",{"entities":[[8,16,"ROLE"],[21,38,"ROLE"],[39,49,"NAME"],[128,139,"ROLE"],[144,153,"ROLE"]]}]]} -------------------------------------------------------------------------------- /pvr_custom_ner_training2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "metadata": { 3 | "language_info": { 4 | "codemirror_mode": { 5 | "name": "ipython", 6 | "version": 3 7 | }, 8 | "file_extension": ".py", 9 | "mimetype": "text/x-python", 10 | "name": "python", 11 | "nbconvert_exporter": "python", 12 | "pygments_lexer": "ipython3", 13 | "version": "3.7.7-final" 14 | }, 15 | "orig_nbformat": 2, 16 | "kernelspec": { 17 | "name": "python37764bita3da3130c1a84ed9abf6da54fdeb0dc3", 18 | "display_name": "Python 3.7.7 64-bit" 19 | } 20 | }, 21 | "nbformat": 4, 22 | "nbformat_minor": 2, 23 | "cells": [ 24 | { 25 | "cell_type": "code", 26 | "execution_count": 3, 27 | "metadata": {}, 28 | "outputs": [], 29 | "source": [ 30 | "import json\n", 31 | "import spacy" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 38, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "# load the training data\n", 41 | "with open('pvr_training_data.json') as fp:\n", 42 | " training_data = json.load(fp)" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 39, 48 | "metadata": {}, 49 | "outputs": [ 50 | { 51 | "output_type": "execute_result", 52 | "data": { 53 | "text/plain": [ 54 | "{'classes': ['NAME',\n", 55 | " 'COMPANY',\n", 56 | " 'ROLE',\n", 57 | " 'PERCENTAGE',\n", 58 | " 'WEEKDAY',\n", 59 | " 'MONEY',\n", 60 | " 'TIME PERIOD',\n", 61 | " 'STATES'],\n", 62 | " 'annotations': [['The share price of PVR rose over 7 percent on Wednesday after the multiplex chain said that it has reduced losses in Q2 despite nil revenue from the core movie exhibition business.',\n", 63 | " {'entities': [[19, 22, 'COMPANY'],\n", 64 | " [33, 42, 'PERCENTAGE'],\n", 65 | " [46, 55, 'WEEKDAY']]}],\n", 66 | " ['The company managed to get rent waivers from most landlords, CFO Nitin Sood said in an interview to CNBC-TV18. “The big focus for us right now as revenues have been nil is to really reduce our fixed cost and we have managed to do that, ” he added.',\n", 67 | " {'entities': [[61, 64, 'ROLE'], [65, 75, 'NAME']]}],\n", 68 | " ['Sood further said that they have brought down the fixed cost down by almost 75-80 percent.',\n", 69 | " {'entities': [[0, 4, 'NAME'], [76, 89, 'PERCENTAGE']]}],\n", 70 | " [\"The stock rose as much as 7.6 percent to the day's high of Rs 1,186.85 per share on the BSE.\",\n", 71 | " {'entities': [[26, 37, 'PERCENTAGE'],\n", 72 | " [59, 70, 'PERCENTAGE'],\n", 73 | " [88, 91, 'COMPANY']]}],\n", 74 | " ['Meanwhile, for the September quarter, the company reported a consolidated net loss of Rs 184.06 crore versus a net profit of Rs 47.67 crore in the year ago quarter.',\n", 75 | " {'entities': [[86, 101, 'MONEY'], [125, 139, 'MONEY']]}],\n", 76 | " ['', {'entities': []}],\n", 77 | " [\"Its total income was at Rs 110.61 crore during the quarter under review against Rs 979.40 crore in the corresponding quarter last fiscal. PVR's total expenses were at Rs 389.37 crore in July-September 2020-21.\",\n", 78 | " {'entities': [[24, 39, 'MONEY'],\n", 79 | " [80, 95, 'MONEY'],\n", 80 | " [117, 137, 'TIME PERIOD'],\n", 81 | " [138, 141, 'COMPANY'],\n", 82 | " [167, 182, 'MONEY'],\n", 83 | " [186, 208, 'TIME PERIOD']]}],\n", 84 | " ['\" Financial performance of the company for Q2, FY 21 was impacted by the continued lockdown announced due to COVID-19 outbreak, which disrupted the company\\'s operations, \" said PVR in a post-earnings statement.',\n", 85 | " {'entities': [[42, 44, 'TIME PERIOD'], [46, 51, 'TIME PERIOD']]}],\n", 86 | " ['PVR said its results were not comparable as business was impacted due to temporary closures of cinemas.',\n", 87 | " {'entities': []}],\n", 88 | " ['', {'entities': []}],\n", 89 | " ['Under Unlock 5.0 guidelines, the government has permitted cinemas to reopen October 15 onwards with 50 percent capacity. So far , 16 states and UTs, where PVR has a presence, have permitted cinemas to restart operations. Out of total of 831 screens of the company, over 575 have received permission to reopen, it said.',\n", 90 | " {'entities': [[76, 86, 'TIME PERIOD'],\n", 91 | " [100, 110, 'PERCENTAGE'],\n", 92 | " [154, 157, 'COMPANY']]}],\n", 93 | " ['PVR Ltd Chairman cum Managing Director Ajay Bijli said: \" We are eagerly waiting for the reopening of other states, specifically Maharashtra and Telangana so that business can gradually get back to normal. We are taking all possible precautions so that both our customers and employees feel safe while visiting their favourite cinema . \"',\n", 94 | " {'entities': [[8, 16, 'ROLE'],\n", 95 | " [21, 38, 'ROLE'],\n", 96 | " [39, 49, 'NAME'],\n", 97 | " [128, 139, 'ROLE'],\n", 98 | " [144, 153, 'ROLE']]}]]}" 99 | ] 100 | }, 101 | "metadata": {}, 102 | "execution_count": 39 103 | } 104 | ], 105 | "source": [ 106 | "training_data" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 40, 112 | "metadata": {}, 113 | "outputs": [], 114 | "source": [ 115 | "# prepare an empty model to train\n", 116 | "nlp = spacy.blank('en')\n", 117 | "nlp.vocab.vectors.name = 'demo'\n", 118 | "ner = nlp.create_pipe('ner')\n", 119 | "nlp.add_pipe(ner, last = True)" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 51, 125 | "metadata": {}, 126 | "outputs": [], 127 | "source": [ 128 | "# Add the custome NER Tags as entities into the model\n", 129 | "for label in training_data[\"classes\"]:\n", 130 | " nlp.entity.add_label(label)" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 52, 136 | "metadata": {}, 137 | "outputs": [], 138 | "source": [ 139 | "# Train the model\n", 140 | "optimizer = nlp.begin_training()\n" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": 26, 146 | "metadata": {}, 147 | "outputs": [ 148 | { 149 | "output_type": "execute_result", 150 | "data": { 151 | "text/plain": [ 152 | "[['The share price of PVR rose over 7 percent on Wednesday after the multiplex chain said that it has reduced losses in Q2 despite nil revenue from the core movie exhibition business.',\n", 153 | " {'entities': [[19, 22, 'COMPANY'],\n", 154 | " [33, 42, 'PERCENTAGE'],\n", 155 | " [46, 55, 'WEEKDAY']]}],\n", 156 | " ['The company managed to get rent waivers from most landlords, CFO Nitin Sood said in an interview to CNBC-TV18. “The big focus for us right now as revenues have been nil is to really reduce our fixed cost and we have managed to do that, ” he added.',\n", 157 | " {'entities': [[61, 64, 'ROLE'], [65, 75, 'NAME']]}],\n", 158 | " ['Sood further said that they have brought down the fixed cost down by almost 75-80 percent.',\n", 159 | " {'entities': [[0, 4, 'NAME'], [76, 89, 'PERCENTAGE']]}],\n", 160 | " [\"The stock rose as much as 7.6 percent to the day's high of Rs 1,186.85 per share on the BSE.\",\n", 161 | " {'entities': [[26, 37, 'PERCENTAGE'],\n", 162 | " [59, 70, 'PERCENTAGE'],\n", 163 | " [88, 91, 'COMPANY']]}],\n", 164 | " ['Meanwhile, for the September quarter, the company reported a consolidated net loss of Rs 184.06 crore versus a net profit of Rs 47.67 crore in the year ago quarter.',\n", 165 | " {'entities': [[86, 101, 'MONEY'], [125, 139, 'MONEY']]}],\n", 166 | " ['', {'entities': []}],\n", 167 | " [\"Its total income was at Rs 110.61 crore during the quarter under review against Rs 979.40 crore in the corresponding quarter last fiscal. PVR's total expenses were at Rs 389.37 crore in July-September 2020-21.\",\n", 168 | " {'entities': [[24, 39, 'MONEY'],\n", 169 | " [80, 95, 'MONEY'],\n", 170 | " [117, 137, 'TIME PERIOD'],\n", 171 | " [138, 141, 'COMPANY'],\n", 172 | " [167, 182, 'MONEY'],\n", 173 | " [186, 208, 'TIME PERIOD']]}],\n", 174 | " ['\" Financial performance of the company for Q2, FY 21 was impacted by the continued lockdown announced due to COVID-19 outbreak, which disrupted the company\\'s operations, \" said PVR in a post-earnings statement.',\n", 175 | " {'entities': [[42, 44, 'TIME PERIOD'], [46, 51, 'TIME PERIOD']]}],\n", 176 | " ['PVR said its results were not comparable as business was impacted due to temporary closures of cinemas.',\n", 177 | " {'entities': []}],\n", 178 | " ['', {'entities': []}],\n", 179 | " ['Under Unlock 5.0 guidelines, the government has permitted cinemas to reopen October 15 onwards with 50 percent capacity. So far , 16 states and UTs, where PVR has a presence, have permitted cinemas to restart operations. Out of total of 831 screens of the company, over 575 have received permission to reopen, it said.',\n", 180 | " {'entities': [[76, 86, 'TIME PERIOD'],\n", 181 | " [100, 110, 'PERCENTAGE'],\n", 182 | " [154, 157, 'COMPANY']]}],\n", 183 | " ['PVR Ltd Chairman cum Managing Director Ajay Bijli said: \" We are eagerly waiting for the reopening of other states, specifically Maharashtra and Telangana so that business can gradually get back to normal. We are taking all possible precautions so that both our customers and employees feel safe while visiting their favourite cinema . \"',\n", 184 | " {'entities': [[8, 16, 'ROLE'],\n", 185 | " [21, 38, 'ROLE'],\n", 186 | " [39, 49, 'NAME'],\n", 187 | " [128, 139, 'ROLE'],\n", 188 | " [144, 153, 'ROLE']]}]]" 189 | ] 190 | }, 191 | "metadata": {}, 192 | "execution_count": 26 193 | } 194 | ], 195 | "source": [ 196 | "training_data[\"annotations\"]" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": 57, 202 | "metadata": {}, 203 | "outputs": [], 204 | "source": [ 205 | "for text, annotations in training_data[\"annotations\"]:\n", 206 | " if len(text) > 0:\n", 207 | " nlp.update([text], [annotations], sgd=optimizer)" 208 | ] 209 | }, 210 | { 211 | "cell_type": "code", 212 | "execution_count": 31, 213 | "metadata": {}, 214 | "outputs": [ 215 | { 216 | "output_type": "stream", 217 | "name": "stdout", 218 | "text": [ 219 | "Collecting en_core_web_sm==2.3.1 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz#egg=en_core_web_sm==2.3.1\n", 220 | "\u001b[?25l Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.3.1/en_core_web_sm-2.3.1.tar.gz (12.0MB)\n", 221 | "\u001b[K |████████████████████████████████| 12.1MB 727kB/s \n", 222 | "\u001b[?25hRequirement already satisfied: spacy<2.4.0,>=2.3.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from en_core_web_sm==2.3.1) (2.3.4)\n", 223 | "Requirement already satisfied: thinc<7.5.0,>=7.4.1 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (7.4.3)\n", 224 | "Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (2.0.4)\n", 225 | "Requirement already satisfied: setuptools in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (41.2.0)\n", 226 | "Requirement already satisfied: blis<0.8.0,>=0.4.0; python_version >= \"3.6\" in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (0.7.3)\n", 227 | "Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (4.46.0)\n", 228 | "Requirement already satisfied: requests<3.0.0,>=2.13.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (2.24.0)\n", 229 | "Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.0)\n", 230 | "Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (3.0.4)\n", 231 | "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.4)\n", 232 | "Requirement already satisfied: plac<1.2.0,>=0.9.6 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.1.3)\n", 233 | "Requirement already satisfied: numpy>=1.15.0 in /Users/ars/Library/Python/3.7/lib/python/site-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.18.4)\n", 234 | "Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.0.4)\n", 235 | "Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (0.8.0)\n", 236 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /Users/ars/Library/Python/3.7/lib/python/site-packages (from requests<3.0.0,>=2.13.0->spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.25.9)\n", 237 | "Requirement already satisfied: idna<3,>=2.5 in /Users/ars/Library/Python/3.7/lib/python/site-packages (from requests<3.0.0,>=2.13.0->spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (2.9)\n", 238 | "Requirement already satisfied: chardet<4,>=3.0.2 in /Users/ars/Library/Python/3.7/lib/python/site-packages (from requests<3.0.0,>=2.13.0->spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (3.0.4)\n", 239 | "Requirement already satisfied: certifi>=2017.4.17 in /Users/ars/Library/Python/3.7/lib/python/site-packages (from requests<3.0.0,>=2.13.0->spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (2020.4.5.1)\n", 240 | "Requirement already satisfied: importlib-metadata>=0.20; python_version < \"3.8\" in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from catalogue<1.1.0,>=0.0.7->spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (1.1.3)\n", 241 | "Requirement already satisfied: zipp>=0.5 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from importlib-metadata>=0.20; python_version < \"3.8\"->catalogue<1.1.0,>=0.0.7->spacy<2.4.0,>=2.3.0->en_core_web_sm==2.3.1) (3.1.0)\n", 242 | "Building wheels for collected packages: en-core-web-sm\n", 243 | " Building wheel for en-core-web-sm (setup.py) ... \u001b[?25ldone\n", 244 | "\u001b[?25h Created wheel for en-core-web-sm: filename=en_core_web_sm-2.3.1-cp37-none-any.whl size=12047110 sha256=09300e1ce2738e32b12a1e25fcc9381fa864c7832799b9f08e9bfe31f3ceba30\n", 245 | " Stored in directory: /private/var/folders/rf/vq7cvscj7ks7pnn874v8tg3m0000gn/T/pip-ephem-wheel-cache-wmyxmlyg/wheels/2b/3f/41/f0b92863355c3ba34bb32b37d8a0c662959da0058202094f46\n", 246 | "Successfully built en-core-web-sm\n", 247 | "Installing collected packages: en-core-web-sm\n", 248 | "Successfully installed en-core-web-sm-2.3.1\n", 249 | "\u001b[33mWARNING: You are using pip version 19.2.3, however version 20.2.4 is available.\n", 250 | "You should consider upgrading via the 'pip install --upgrade pip' command.\u001b[0m\n", 251 | "\u001b[38;5;2m✔ Download and installation successful\u001b[0m\n", 252 | "You can now load the model via spacy.load('en_core_web_sm')\n" 253 | ] 254 | } 255 | ], 256 | "source": [ 257 | "! python3 -m spacy download en_core_web_sm\n" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 36, 263 | "metadata": {}, 264 | "outputs": [], 265 | "source": [ 266 | "import en_core_web_sm\n", 267 | "\n", 268 | "nlp = en_core_web_sm.load()" 269 | ] 270 | }, 271 | { 272 | "cell_type": "code", 273 | "execution_count": 35, 274 | "metadata": {}, 275 | "outputs": [ 276 | { 277 | "output_type": "error", 278 | "ename": "OSError", 279 | "evalue": "[E050] Can't find model 'en_core_web_sm-2.3.1'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.", 280 | "traceback": [ 281 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 282 | "\u001b[0;31mOSError\u001b[0m Traceback (most recent call last)", 283 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mspacy\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mrandom\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m \u001b[0mnlp\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mspacy\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mload\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'en_core_web_sm-2.3.1'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 284 | "\u001b[0;32m/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/spacy/__init__.py\u001b[0m in \u001b[0;36mload\u001b[0;34m(name, **overrides)\u001b[0m\n\u001b[1;32m 28\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdepr_path\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 29\u001b[0m \u001b[0mwarnings\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwarn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mWarnings\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mW001\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdepr_path\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mDeprecationWarning\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 30\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mutil\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mload_model\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0moverrides\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 31\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 32\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 285 | "\u001b[0;32m/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/spacy/util.py\u001b[0m in \u001b[0;36mload_model\u001b[0;34m(name, **overrides)\u001b[0m\n\u001b[1;32m 173\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0mhasattr\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"exists\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;31m# Path or Path-like to model data\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 174\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mload_model_from_path\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0moverrides\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 175\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mIOError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mErrors\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mE050\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 176\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 177\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 286 | "\u001b[0;31mOSError\u001b[0m: [E050] Can't find model 'en_core_web_sm-2.3.1'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory." 287 | ] 288 | } 289 | ], 290 | "source": [ 291 | "import spacy \n", 292 | "import random\n", 293 | "nlp = spacy.load('en_core_web_sm-2.3.1')" 294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 37, 299 | "metadata": {}, 300 | "outputs": [ 301 | { 302 | "output_type": "stream", 303 | "name": "stdout", 304 | "text": [ 305 | "Modi ORG\nIndia GPE\nApple ORG\nGoogle NORP\niPhones WORK_OF_ART\nAmazon ORG\nAlexa ORG\nDot FAC\n" 306 | ] 307 | } 308 | ], 309 | "source": [ 310 | "nlp.entity.add_label('CELEBRITY')\n", 311 | "TRAIN_DATA = [\n", 312 | " (u\"Modi\", {\"entities\": [(0, 4, \"CELEBRITY\")]})]\n", 313 | "\n", 314 | "optimizer = nlp.begin_training()\n", 315 | "for i in range(20):\n", 316 | " random.shuffle(TRAIN_DATA)\n", 317 | " for text, annotations in TRAIN_DATA:\n", 318 | " nlp.update([text], [annotations],drop=0.3, sgd=optimizer)\n", 319 | "\n", 320 | "\n", 321 | "text = \"But Modi is starting India. The company made a late push\\ninto hardware, and Apple’s Siri and Google available on iPhones, and Amazon’s Alexa\\nsoftware, which runs on its Echo and Dot devices, have clear leads in\\nconsumer adoption.\"\n", 322 | "doc = nlp(text)\n", 323 | "for ent in doc.ents:\n", 324 | " print(ent.text,ent.label_)" 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": 46, 330 | "metadata": {}, 331 | "outputs": [ 332 | { 333 | "output_type": "stream", 334 | "name": "stdout", 335 | "text": [ 336 | "Modi {'entities': [(0, 4, 'CELEBRITY')]}\n" 337 | ] 338 | } 339 | ], 340 | "source": [ 341 | "for text, annotations in TRAIN_DATA:\n", 342 | " print(text,annotations)" 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": 45, 348 | "metadata": {}, 349 | "outputs": [ 350 | { 351 | "output_type": "execute_result", 352 | "data": { 353 | "text/plain": [ 354 | "[['Meanwhile, for the September quarter, the company reported a consolidated net loss of Rs 184.06 crore versus a net profit of Rs 47.67 crore in the year ago quarter.',\n", 355 | " {'entities': [[86, 101, 'MONEY'], [125, 139, 'MONEY']]}],\n", 356 | " ['Sood further said that they have brought down the fixed cost down by almost 75-80 percent.',\n", 357 | " {'entities': [[0, 4, 'NAME'], [76, 89, 'PERCENTAGE']]}],\n", 358 | " ['The company managed to get rent waivers from most landlords, CFO Nitin Sood said in an interview to CNBC-TV18. “The big focus for us right now as revenues have been nil is to really reduce our fixed cost and we have managed to do that, ” he added.',\n", 359 | " {'entities': [[61, 64, 'ROLE'], [65, 75, 'NAME']]}],\n", 360 | " ['PVR Ltd Chairman cum Managing Director Ajay Bijli said: \" We are eagerly waiting for the reopening of other states, specifically Maharashtra and Telangana so that business can gradually get back to normal. We are taking all possible precautions so that both our customers and employees feel safe while visiting their favourite cinema . \"',\n", 361 | " {'entities': [[8, 16, 'ROLE'],\n", 362 | " [21, 38, 'ROLE'],\n", 363 | " [39, 49, 'NAME'],\n", 364 | " [128, 139, 'ROLE'],\n", 365 | " [144, 153, 'ROLE']]}],\n", 366 | " ['', {'entities': []}],\n", 367 | " ['Under Unlock 5.0 guidelines, the government has permitted cinemas to reopen October 15 onwards with 50 percent capacity. So far , 16 states and UTs, where PVR has a presence, have permitted cinemas to restart operations. Out of total of 831 screens of the company, over 575 have received permission to reopen, it said.',\n", 368 | " {'entities': [[76, 86, 'TIME PERIOD'],\n", 369 | " [100, 110, 'PERCENTAGE'],\n", 370 | " [154, 157, 'COMPANY']]}],\n", 371 | " ['', {'entities': []}],\n", 372 | " ['The share price of PVR rose over 7 percent on Wednesday after the multiplex chain said that it has reduced losses in Q2 despite nil revenue from the core movie exhibition business.',\n", 373 | " {'entities': [[19, 22, 'COMPANY'],\n", 374 | " [33, 42, 'PERCENTAGE'],\n", 375 | " [46, 55, 'WEEKDAY']]}],\n", 376 | " ['\" Financial performance of the company for Q2, FY 21 was impacted by the continued lockdown announced due to COVID-19 outbreak, which disrupted the company\\'s operations, \" said PVR in a post-earnings statement.',\n", 377 | " {'entities': [[42, 44, 'TIME PERIOD'], [46, 51, 'TIME PERIOD']]}],\n", 378 | " ['PVR said its results were not comparable as business was impacted due to temporary closures of cinemas.',\n", 379 | " {'entities': []}],\n", 380 | " [\"Its total income was at Rs 110.61 crore during the quarter under review against Rs 979.40 crore in the corresponding quarter last fiscal. PVR's total expenses were at Rs 389.37 crore in July-September 2020-21.\",\n", 381 | " {'entities': [[24, 39, 'MONEY'],\n", 382 | " [80, 95, 'MONEY'],\n", 383 | " [117, 137, 'TIME PERIOD'],\n", 384 | " [138, 141, 'COMPANY'],\n", 385 | " [167, 182, 'MONEY'],\n", 386 | " [186, 208, 'TIME PERIOD']]}],\n", 387 | " [\"The stock rose as much as 7.6 percent to the day's high of Rs 1,186.85 per share on the BSE.\",\n", 388 | " {'entities': [[26, 37, 'PERCENTAGE'],\n", 389 | " [59, 70, 'PERCENTAGE'],\n", 390 | " [88, 91, 'COMPANY']]}]]" 391 | ] 392 | }, 393 | "metadata": {}, 394 | "execution_count": 45 395 | } 396 | ], 397 | "source": [ 398 | "training_data[\"annotations\"]" 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": 44, 404 | "metadata": {}, 405 | "outputs": [ 406 | { 407 | "output_type": "error", 408 | "ename": "ValueError", 409 | "evalue": "need at least one array to concatenate", 410 | "traceback": [ 411 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 412 | "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", 413 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mrandom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshuffle\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtraining_data\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"annotations\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mtext\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mannotations\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mtraining_data\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m\"annotations\"\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0mnlp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mupdate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mtext\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mannotations\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0mdrop\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0.3\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msgd\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0moptimizer\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 414 | "\u001b[0;32m/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/spacy/language.py\u001b[0m in \u001b[0;36mupdate\u001b[0;34m(self, docs, golds, drop, sgd, losses, component_cfg)\u001b[0m\n\u001b[1;32m 527\u001b[0m \u001b[0mkwargs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcomponent_cfg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 528\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msetdefault\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"drop\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdrop\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 529\u001b[0;31m \u001b[0mproc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mupdate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdocs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgolds\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msgd\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mget_grads\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlosses\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlosses\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 530\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mW\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdW\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mgrads\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 531\u001b[0m \u001b[0msgd\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mW\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdW\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkey\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 415 | "\u001b[0;32mnn_parser.pyx\u001b[0m in \u001b[0;36mspacy.syntax.nn_parser.Parser.update\u001b[0;34m()\u001b[0m\n", 416 | "\u001b[0;32m_parser_model.pyx\u001b[0m in \u001b[0;36mspacy.syntax._parser_model.ParserModel.begin_update\u001b[0;34m()\u001b[0m\n", 417 | "\u001b[0;32m_parser_model.pyx\u001b[0m in \u001b[0;36mspacy.syntax._parser_model.ParserStepModel.__init__\u001b[0;34m()\u001b[0m\n", 418 | "\u001b[0;32m/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/thinc/neural/_classes/feed_forward.py\u001b[0m in \u001b[0;36mbegin_update\u001b[0;34m(self, X, drop)\u001b[0m\n\u001b[1;32m 44\u001b[0m \u001b[0mcallbacks\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 45\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mlayer\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_layers\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 46\u001b[0;31m \u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minc_layer_grad\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlayer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbegin_update\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdrop\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdrop\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 47\u001b[0m \u001b[0mcallbacks\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mappend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0minc_layer_grad\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 48\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 419 | "\u001b[0;32m/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/spacy/_ml.py\u001b[0m in \u001b[0;36mflatten\u001b[0;34m(seqs, drop)\u001b[0m\n\u001b[1;32m 798\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mops\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munflatten\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0md_X\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlengths\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpad\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 799\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 800\u001b[0;31m \u001b[0mX\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mops\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mflatten\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mseqs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpad\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 801\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfinish_update\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 802\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 420 | "\u001b[0;32mops.pyx\u001b[0m in \u001b[0;36mthinc.neural.ops.Ops.flatten\u001b[0;34m()\u001b[0m\n", 421 | "\u001b[0;32m<__array_function__ internals>\u001b[0m in \u001b[0;36mconcatenate\u001b[0;34m(*args, **kwargs)\u001b[0m\n", 422 | "\u001b[0;31mValueError\u001b[0m: need at least one array to concatenate" 423 | ] 424 | } 425 | ], 426 | "source": [ 427 | "optimizer = nlp.begin_training()\n", 428 | "for i in range(20):\n", 429 | " random.shuffle(training_data[\"annotations\"])\n", 430 | " for text, annotations in training_data[\"annotations\"]:\n", 431 | " nlp.update([text], [annotations],drop=0.3, sgd=optimizer)" 432 | ] 433 | }, 434 | { 435 | "cell_type": "code", 436 | "execution_count": 56, 437 | "metadata": {}, 438 | "outputs": [ 439 | { 440 | "output_type": "stream", 441 | "name": "stdout", 442 | "text": [ 443 | "164 1\n90 1\n247 1\n337 1\n0 1\n318 1\n0 1\n180 1\n210 1\n103 1\n209 1\n92 1\n" 444 | ] 445 | } 446 | ], 447 | "source": [ 448 | "for text, annotations in training_data[\"annotations\"]:\n", 449 | " print(len(text),len([annotations]))" 450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": 59, 455 | "metadata": {}, 456 | "outputs": [], 457 | "source": [ 458 | "text = \"The share price of PVR rose over 7 percent on Wednesday after the multiplex chain said that it has reduced losses in Q2 despite nil revenue from the core movie exhibition business. The company managed to get rent waivers from most landlords, CFO Nitin Sood said in an interview to CNBC-TV18. “The big focus for us right now as revenues have been nil is to really reduce our fixed cost and we have managed to do that,” he added. Sood further said that they have brought down the fixed cost down by almost 75-80 percent. The stock rose as much as 7.6 percent to the day's high of Rs 1,186.85 per share on the BSE.Meanwhile, for the September quarter, the company reported a consolidated net loss of Rs 184.06 crore versus a net profit of Rs 47.67 crore in the year ago quarter. Its total income was at Rs 110.61 crore during the quarter under review against Rs 979.40 crore in the corresponding quarter last fiscal. PVR's total expenses were at Rs 389.37 crore in July-September 2020-21.Financial performance of the company for Q2, FY 21 was impacted by the continued lockdown announced due to COVID-19 outbreak, which disrupted the company's operations, said PVR in a post-earnings statement. PVR said its results were not comparable as business was impacted due to temporary closures of cinemas.Under Unlock 5.0 guidelines, the government has permitted cinemas to reopen October 15 onwards with 50 percent capacity. So far, 16 states and UTs, where PVR has a presence, have permitted cinemas to restart operations. Out of total of 831 screens of the company, over 575 have received permission to reopen, it said.PVR Ltd Chairman cum Managing Director Ajay Bijli said: We are eagerly waiting for the reopening of other states, specifically Maharashtra and Telangana so that business can gradually get back to normal. We are taking all possible precautions so that both our customers and employees feel safe while visiting their favourite cinema.\"" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": 60, 464 | "metadata": {}, 465 | "outputs": [ 466 | { 467 | "output_type": "stream", 468 | "name": "stdout", 469 | "text": [ 470 | "7 percent PERCENT\nWednesday DATE\nas much as 7.6 percent PERCENT\nday DATE\nRs 1,186.85 MONEY\nRs 184.06 MONEY\nRs 47.67 crore MONEY\nRs 110.61 MONEY\nRs 979.40 MONEY\nRs 389.37 MONEY\nOctober 15 DATE\n50 percent PERCENT\nManaging Director Ajay Bijli ORG\nMaharashtra ORG\nTelangana PERSON\n" 471 | ] 472 | } 473 | ], 474 | "source": [ 475 | "doc = nlp(text)\n", 476 | "for ent in doc.ents:\n", 477 | " print(ent.text,ent.label_)" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "execution_count": null, 483 | "metadata": {}, 484 | "outputs": [], 485 | "source": [] 486 | } 487 | ] 488 | } --------------------------------------------------------------------------------