├── .github
│   └── dependabot.yml
├── Dockerfile
├── LICENSE
├── README.md
├── api
│   ├── __pycache__
│   │   ├── main.cpython-38.pyc
│   │   └── summary.cpython-38.pyc
│   ├── main.py
│   └── summary.py
├── docker-compose.yaml
├── prototype
│   └── app.py
├── requirements
│   └── requirements.txt
└── src
    ├── abstractive
    │   ├── BART_Text_Summarization.ipynb
    │   ├── LED_Text_Summarization.ipynb
    │   ├── Longformer2RoBerta_Text_Summarization.ipynb
    │   ├── Pegasus_Text_Summarization.ipynb
    │   ├── README.md
    │   └── T5_Text_Summarization.ipynb
    └── extractive
        ├── BERT_Extractive_Text_Summarization.ipynb
        ├── GPT2_Extractive_Text_Summarization.ipynb
        ├── README.md
        └── XLNet_Extractive_Text_Summarization.ipynb
/.github/dependabot.yml:
--------------------------------------------------------------------------------
# To get started with Dependabot version updates, you'll need to specify which
# package ecosystems to update and where the package manifests are located.
# Please see the documentation for all configuration options:
# https://help.github.com/github/administering-a-repository/configuration-options-for-dependency-updates

version: 2
updates:
  - package-ecosystem: "pip" # See documentation for possible values
    directory: "/" # Location of package manifests
    schedule:
      interval: "daily"
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
FROM python:3.8

COPY ./requirements/requirements.txt ./requirements/requirements.txt

RUN pip3 install -r requirements/requirements.txt

COPY ./api /api/api

ENV PYTHONPATH=/api
WORKDIR /api

EXPOSE 8000

ENTRYPOINT ["uvicorn"]
CMD ["api.main:app", "--host", "0.0.0.0", "--port", "8000"]
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2021 Ajinkya Naik
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Text-Summarization
2 | Abstractive and extractive text summarization with Transformer models, plus an API.
3 |
4 | # Project History
5 | I wanted to create an abstractive text summarization app as a tool to help with university studies. I researched and tried various models for text summarization, including LSTMs and other RNNs. The output was acceptable for a project but not good enough for an actual use case.
6 | Hence I decided to go with Transformers, which produce summaries good enough for real-world use. I tried T5, Pegasus, Longformer2RoBerta, BART and LED. In my tests, Pegasus surprisingly produced better output than the other models. Longformer2RoBerta should have been the best model, as it is meant for summarizing long documents, but its output wasn't up to the mark. BART and LED also gave decent outputs. Overall, Pegasus provided a good abstractive summary.
7 |
8 | I also tried a few extractive Transformer-based models: BERT, GPT2 and XLNet. The output was almost indistinguishable from a human summary.
9 |
10 | # Project
11 | 1. 'src' directory contains 2 sub directories:
12 |    - 'abstractive' which contains notebooks for T5, Pegasus, Longformer2RoBerta, BART and LED abstractive summarization models.
13 |    - 'extractive' which contains notebooks for BERT, GPT2 and XLNet extractive summarization models.
14 | 2. 'prototype' directory contains a web app prototype created using the Streamlit framework (uses T5) for testing purposes. To run it locally:
15 |    - Clone the repo
16 |    - Go to the 'prototype' directory, open a command prompt there and run 'streamlit run app.py'
17 | 3. 'api' directory contains an API created for both Abstractive and Extractive (Pegasus and XLNet) summaries. To test the API locally:
18 |    - Run pip install -r requirements/requirements.txt to install all dependencies
19 |    - Open a terminal in the project directory and run uvicorn api.main:app --reload
20 |    - After the application startup is completed, go to localhost:8000/docs to try it out
21 |
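The local test in step 3 can also be driven from a script instead of the /docs page. The following is a minimal sketch, assuming the API is listening on localhost:8000 as in the steps above. `API_BASE`, `build_request` and `summarize` are hypothetical names introduced here for illustration; only the `/abs-summary/` and `/ext-summary/` routes and the `input` and `output` fields come from api/main.py.

```python
import json
import urllib.request

# Assumption: the API was started locally with uvicorn on its default port.
API_BASE = "http://localhost:8000"


def build_request(text: str, endpoint: str = "abs-summary") -> urllib.request.Request:
    """Build a POST request whose JSON body matches the Message model."""
    payload = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{endpoint}/",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(text: str, endpoint: str = "abs-summary", timeout: float = 120.0) -> str:
    """Send the text to the running API and return the 'output' field."""
    with urllib.request.urlopen(build_request(text, endpoint), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["output"]
```

With the server running, `summarize("some long text")` requests an abstractive summary and `summarize("some long text", endpoint="ext-summary")` requests the other route. The generous timeout is a guess, since model inference on CPU can be slow.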
22 | Note:
23 | - The API will soon be deployed to the cloud for inference and then integrated into a Flask application, as direct usage of the Transformer models leads to timeouts.
24 | - Don't copy-paste two paragraphs directly while testing. Remove all newlines so that the text becomes one continuous paragraph; otherwise the request fails with Error 422.
25 |
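The newline advice in the note can be automated before the text is sent. The 422 most likely arises because a raw line break typed inside a JSON string is not valid JSON, so the request body fails to parse; that reading is an assumption, not something the repository states. The sketch below collapses all whitespace into single spaces and serializes the body with `json.dumps`. `flatten_text` and `to_request_body` are hypothetical helper names.

```python
import json


def flatten_text(text: str) -> str:
    """Collapse every run of whitespace (newlines, tabs, spaces) into one space."""
    return " ".join(text.split())


def to_request_body(text: str) -> str:
    """Return a JSON body for the API with the text flattened to one paragraph."""
    return json.dumps({"input": flatten_text(text)})
```

Joining on a single space keeps words from adjacent paragraphs apart, which deleting the newline characters outright would not.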
26 | # Tech Used
27 | These are the libraries and technologies used, or planned, in the project.
28 | 1. PyTorch
29 | 2. Transformers Library
30 | 3. Streamlit
31 | 4. Flask (Work in Progress)
32 | 5. FastAPI
33 |
34 | # To Do
35 | 1. ~~Create a web app using Flask and host on cloud platforms for easy usage.~~ (Done)
36 | 2. Build a Chrome extension for use on web sites (more portable and faster than a web app). (WIP)
37 |
--------------------------------------------------------------------------------
/api/__pycache__/main.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aj-naik/Text-Summarization/70787a0d12d022ad931dd2efe99ee9f7108e5c1a/api/__pycache__/main.cpython-38.pyc
--------------------------------------------------------------------------------
/api/__pycache__/summary.cpython-38.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/aj-naik/Text-Summarization/70787a0d12d022ad931dd2efe99ee9f7108e5c1a/api/__pycache__/summary.cpython-38.pyc
--------------------------------------------------------------------------------
/api/main.py:
--------------------------------------------------------------------------------
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel

from api.summary import Summary


class Message(BaseModel):
    """Request body: `input` is the text to summarize."""
    input: str
    output: Optional[str] = None


app = FastAPI()
# Load the models once at startup so every request reuses them.
summary = Summary()


@app.get('/')
def get_root():
    return {'message': 'Welcome to the text summarization API'}


# Plain `def` endpoints run in FastAPI's thread pool, so the blocking
# model calls do not stall the event loop.
@app.post("/abs-summary/")
def abssummary(message: Message):
    message.output = summary.abs_summary(text=message.input)
    return {"output": message.output}


@app.post("/ext-summary/")
def extsummary(message: Message):
    message.output = str(summary.ext_summary(message.input))
    return {"output": message.output}
--------------------------------------------------------------------------------
/api/summary.py:
--------------------------------------------------------------------------------
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
from transformers import pipeline

# Use 'google/pegasus-large' on a local machine and 'google/pegasus-xsum' in the cloud.
MODEL_NAME = 'google/pegasus-xsum'


class Summary:
    def __init__(self):
        self.sum_model = PegasusForConditionalGeneration.from_pretrained(MODEL_NAME)
        self.sum_tokenizer = PegasusTokenizer.from_pretrained(MODEL_NAME)
        # Build the pipeline once; creating it per request reloads the model every time.
        # Without a model argument it loads the library's default summarization checkpoint.
        self.summarizer = pipeline("summarization")

    def abs_summary(self, text):
        # Strip leading and trailing whitespace (newlines, tabs) before tokenizing.
        text = text.strip()
        batch = self.sum_tokenizer(text, truncation=True, padding='longest', return_tensors="pt")
        translated = self.sum_model.generate(**batch)
        tgt_text = self.sum_tokenizer.batch_decode(translated, skip_special_tokens=True)
        return tgt_text[0]

    def ext_summary(self, text: str):
        # Replace newlines and tabs with spaces so adjacent words are not fused together.
        text = text.replace('\n', ' ').replace('\t', ' ')
        summarized = self.summarizer(text, min_length=60)
        # The pipeline returns a list of dicts; read the text from 'summary_text'
        # instead of regex-matching quotes in the dict's string form.
        return summarized[0]['summary_text']
--------------------------------------------------------------------------------
/docker-compose.yaml:
--------------------------------------------------------------------------------
version: "3"

services:
  summarization-api:
    build: .
    ports:
      - 8000:8000
--------------------------------------------------------------------------------
/prototype/app.py:
--------------------------------------------------------------------------------
import streamlit as st
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


def main():
    st.title("Text Summarization")
    st.markdown("AI web app to summarize text")

    text = st.text_area('Enter Text to Summarise', height=275)

    # AutoModelWithLMHead is deprecated; T5 is a sequence-to-sequence model.
    tokenizer = AutoTokenizer.from_pretrained('t5-base')
    model = AutoModelForSeq2SeqLM.from_pretrained('t5-base')

    if st.button("Summarize"):
        # T5 expects the lowercase task prefix "summarize: ".
        inputs = tokenizer.encode("summarize: " + text, return_tensors='pt', truncation=True)

        summary_ids = model.generate(inputs, min_length=20, length_penalty=5., num_beams=2)

        # Drop special tokens such as <pad> and </s> from the decoded text.
        summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

        st.write(summary)


if __name__ == '__main__':
    main()
--------------------------------------------------------------------------------
/requirements/requirements.txt:
--------------------------------------------------------------------------------
transformers==4.26.1
torch

pydantic==1.10.6
fastapi==0.93.0
summarizer==0.0.7
uvicorn
gunicorn>=20.0.0
sentencepiece
--------------------------------------------------------------------------------
/src/abstractive/Pegasus_Text_Summarization.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "Pegasus-Text-Summarization.ipynb",
7 | "provenance": []
8 | },
9 | "kernelspec": {
10 | "name": "python3",
11 | "display_name": "Python 3"
12 | },
13 | "language_info": {
14 | "name": "python"
15 | },
16 | "accelerator": "GPU"
17 | },
18 | "cells": [
19 | {
20 | "cell_type": "code",
21 | "metadata": {
22 | "colab": {
23 | "base_uri": "https://localhost:8080/"
24 | },
25 | "id": "kIjeCW1oEtd_",
26 | "outputId": "e5285b46-db29-4dfe-a500-6822a41ae79d"
27 | },
28 | "source": [
29 | "!pip install transformers\n",
30 | "!pip install sentencepiece\n",
31 | "from transformers import PegasusForConditionalGeneration, PegasusTokenizer\n",
32 | "import torch"
33 | ],
34 | "execution_count": null,
35 | "outputs": [
36 | {
37 | "output_type": "stream",
38 | "text": [
39 | "Collecting transformers\n",
40 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/d8/b2/57495b5309f09fa501866e225c84532d1fd89536ea62406b2181933fb418/transformers-4.5.1-py3-none-any.whl (2.1MB)\n",
41 | "\u001b[K |████████████████████████████████| 2.1MB 5.7MB/s \n",
42 | "\u001b[?25hRequirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (1.19.5)\n",
43 | "Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers) (3.0.12)\n",
44 | "Collecting sacremoses\n",
45 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/75/ee/67241dc87f266093c533a2d4d3d69438e57d7a90abb216fa076e7d475d4a/sacremoses-0.0.45-py3-none-any.whl (895kB)\n",
46 | "\u001b[K |████████████████████████████████| 901kB 36.7MB/s \n",
47 | "\u001b[?25hRequirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (2019.12.20)\n",
48 | "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers) (4.41.1)\n",
49 | "Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers) (2.23.0)\n",
50 | "Requirement already satisfied: importlib-metadata; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from transformers) (3.10.1)\n",
51 | "Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from transformers) (20.9)\n",
52 | "Collecting tokenizers<0.11,>=0.10.1\n",
53 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/ae/04/5b870f26a858552025a62f1649c20d29d2672c02ff3c3fb4c688ca46467a/tokenizers-0.10.2-cp37-cp37m-manylinux2010_x86_64.whl (3.3MB)\n",
54 | "\u001b[K |████████████████████████████████| 3.3MB 36.0MB/s \n",
55 | "\u001b[?25hRequirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (7.1.2)\n",
56 | "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (1.15.0)\n",
57 | "Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (1.0.1)\n",
58 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (1.24.3)\n",
59 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2020.12.5)\n",
60 | "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (3.0.4)\n",
61 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2.10)\n",
62 | "Requirement already satisfied: typing-extensions>=3.6.4; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < \"3.8\"->transformers) (3.7.4.3)\n",
63 | "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < \"3.8\"->transformers) (3.4.1)\n",
64 | "Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->transformers) (2.4.7)\n",
65 | "Installing collected packages: sacremoses, tokenizers, transformers\n",
66 | "Successfully installed sacremoses-0.0.45 tokenizers-0.10.2 transformers-4.5.1\n",
67 | "Collecting sentencepiece\n",
68 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/f5/99/e0808cb947ba10f575839c43e8fafc9cc44e4a7a2c8f79c60db48220a577/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl (1.2MB)\n",
69 | "\u001b[K |████████████████████████████████| 1.2MB 5.6MB/s \n",
70 | "\u001b[?25hInstalling collected packages: sentencepiece\n",
71 | "Successfully installed sentencepiece-0.1.95\n"
72 | ],
73 | "name": "stdout"
74 | }
75 | ]
76 | },
77 | {
78 | "cell_type": "code",
79 | "metadata": {
80 | "id": "xpgrdhXOFLia"
81 | },
82 | "source": [
83 | "src_text = [ \"In May, Churchill was still generally unpopular with many Conservatives and probably most of the Labour Party. Chamberlain \"\n",
84 | " \"remained Conservative Party leader until October when ill health forced his resignation. By that time, Churchill had won the \"\n",
85 | " \"doubters over and his succession as party leader was a formality.\"\n",
86 | " \" \"\n",
87 | " \"He began his premiership by forming a five-man war cabinet which included Chamberlain as Lord President of the Council, \"\n",
88 | " \"Labour leader Clement Attlee as Lord Privy Seal (later as Deputy Prime Minister), Halifax as Foreign Secretary and Labour's \"\n",
89 | " \"Arthur Greenwood as a minister without portfolio. In practice, these five were augmented by the service chiefs and ministers \"\n",
90 | " \"who attended the majority of meetings. The cabinet changed in size and membership as the war progressed, one of the key \"\n",
91 | " \"appointments being the leading trades unionist Ernest Bevin as Minister of Labour and National Service. In response to \"\n",
92 | " \"previous criticisms that there had been no clear single minister in charge of the prosecution of the war, Churchill created \"\n",
93 | " \"and took the additional position of Minister of Defence, making him the most powerful wartime Prime Minister in British \"\n",
94 | " \"history. He drafted outside experts into government to fulfil vital functions, especially on the Home Front. These included \"\n",
95 | " \"personal friends like Lord Beaverbrook and Frederick Lindemann, who became the government's scientific advisor.\"\n",
96 | " \" \"\n",
97 | " \"At the end of May, with the British Expeditionary Force in retreat to Dunkirk and the Fall of France seemingly imminent, \"\n",
98 | " \"Halifax proposed that the government should explore the possibility of a negotiated peace settlement using the still-neutral \"\n",
99 | " \"Mussolini as an intermediary. There were several high-level meetings from 26 to 28 May, including two with the French \"\n",
100 | " \"premier Paul Reynaud. Churchill's resolve was to fight on, even if France capitulated, but his position remained precarious \"\n",
101 | " \"until Chamberlain resolved to support him. Churchill had the full support of the two Labour members but knew he could not \"\n",
102 | " \"survive as Prime Minister if both Chamberlain and Halifax were against him. In the end, by gaining the support of his outer \"\n",
103 | " \"cabinet, Churchill outmanoeuvred Halifax and won Chamberlain over. Churchill believed that the only option was to fight on \"\n",
104 | " \"and his use of rhetoric hardened public opinion against a peaceful resolution and prepared the British people for a long war \"\n",
105 | " \"– Jenkins says Churchill's speeches were 'an inspiration for the nation, and a catharsis for Churchill himself'.\"\n",
106 | " \" \"\n",
107 | " \"His first speech as Prime Minister, delivered to the Commons on 13 May was the 'blood, toil, tears and sweat' speech. It was \"\n",
108 | " \"little more than a short statement but, Jenkins says, 'it included phrases which have reverberated down the decades'.\" ]"
109 | ],
110 | "execution_count": null,
111 | "outputs": []
112 | },
113 | {
114 | "cell_type": "code",
115 | "metadata": {
116 | "id": "xFwRLFkXFkI7"
117 | },
118 | "source": [
119 | "model_name = 'google/pegasus-large'\n",
120 | "# device = 'cuda' if torch.cuda.is_available() else 'cpu'"
121 | ],
122 | "execution_count": null,
123 | "outputs": []
124 | },
125 | {
126 | "cell_type": "code",
127 | "metadata": {
128 | "id": "jGoYmXVfFvzz"
129 | },
130 | "source": [
131 | "tokenizer = PegasusTokenizer.from_pretrained(model_name)\n",
132 | "model = PegasusForConditionalGeneration.from_pretrained(model_name)\n"
133 | ],
134 | "execution_count": null,
135 | "outputs": []
136 | },
137 | {
138 | "cell_type": "code",
139 | "metadata": {
140 | "id": "jKi4LHzeHADM"
141 | },
142 | "source": [
143 | "batch = tokenizer(src_text, truncation=True, padding='longest', return_tensors=\"pt\")\n",
144 | "translated = model.generate(**batch)\n",
145 | "tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)"
146 | ],
147 | "execution_count": null,
148 | "outputs": []
149 | },
150 | {
151 | "cell_type": "code",
152 | "metadata": {
153 | "colab": {
154 | "base_uri": "https://localhost:8080/",
155 | "height": 86
156 | },
157 | "id": "G_JTeEzDILZ1",
158 | "outputId": "d26d8761-33d3-40b9-b245-ca8b5b4e5efd"
159 | },
160 | "source": [
161 | "tgt_text[0]"
162 | ],
163 | "execution_count": null,
164 | "outputs": [
165 | {
166 | "output_type": "execute_result",
167 | "data": {
168 | "application/vnd.google.colaboratory.intrinsic+json": {
169 | "type": "string"
170 | },
171 | "text/plain": [
172 | "\"He began his premiership by forming a five-man war cabinet which included Chamberlain as Lord President of the Council, Labour leader Clement Attlee as Lord Privy Seal (later as Deputy Prime Minister), Halifax as Foreign Secretary and Labour's Arthur Greenwood as a minister without portfolio. Churchill believed that the only option was to fight on and his use of rhetoric hardened public opinion against a peaceful resolution and prepared the British people for a long war – Jenkins says Churchill's speeches were 'an inspiration for the nation, and a catharsis for Churchill himself'.\""
173 | ]
174 | },
175 | "metadata": {
176 | "tags": []
177 | },
178 | "execution_count": 16
179 | }
180 | ]
181 | },
182 | {
183 | "cell_type": "code",
184 | "metadata": {
185 | "id": "DuspwrP9LuIY"
186 | },
187 | "source": [
188 | ""
189 | ],
190 | "execution_count": null,
191 | "outputs": []
192 | }
193 | ]
194 | }
--------------------------------------------------------------------------------
/src/abstractive/README.md:
--------------------------------------------------------------------------------
1 | # Text-Summarization (T5, Pegasus, Longformer2RoBerta, BART and LED)
2 | Jupyter Notebooks containing use cases of pretrained models of T5, Pegasus, Longformer2RoBerta, BART and LED Transformers for Abstractive Text Summarization.
3 |
4 | Pegasus gives better output than the other models by a reasonable margin.
--------------------------------------------------------------------------------
/src/abstractive/T5_Text_Summarization.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "T5-Text-Summarization.ipynb",
7 | "provenance": []
8 | },
9 | "kernelspec": {
10 | "name": "python3",
11 | "display_name": "Python 3"
12 | },
13 | "language_info": {
14 | "name": "python"
15 | },
16 | "widgets": {
17 | "application/vnd.jupyter.widget-state+json": {
18 | "e000043c0da4449b9df33f833ccef865": {
19 | "model_module": "@jupyter-widgets/controls",
20 | "model_name": "HBoxModel",
21 | "state": {
22 | "_view_name": "HBoxView",
23 | "_dom_classes": [],
24 | "_model_name": "HBoxModel",
25 | "_view_module": "@jupyter-widgets/controls",
26 | "_model_module_version": "1.5.0",
27 | "_view_count": null,
28 | "_view_module_version": "1.5.0",
29 | "box_style": "",
30 | "layout": "IPY_MODEL_a14ce6e10acf4d2b8d6ba6bee597c389",
31 | "_model_module": "@jupyter-widgets/controls",
32 | "children": [
33 | "IPY_MODEL_28e05434e4c940cca6bc5e7b1f296226",
34 | "IPY_MODEL_72f1e0fa775043dba4542cd93b5a5470"
35 | ]
36 | }
37 | },
38 | "a14ce6e10acf4d2b8d6ba6bee597c389": {
39 | "model_module": "@jupyter-widgets/base",
40 | "model_name": "LayoutModel",
41 | "state": {
42 | "_view_name": "LayoutView",
43 | "grid_template_rows": null,
44 | "right": null,
45 | "justify_content": null,
46 | "_view_module": "@jupyter-widgets/base",
47 | "overflow": null,
48 | "_model_module_version": "1.2.0",
49 | "_view_count": null,
50 | "flex_flow": null,
51 | "width": null,
52 | "min_width": null,
53 | "border": null,
54 | "align_items": null,
55 | "bottom": null,
56 | "_model_module": "@jupyter-widgets/base",
57 | "top": null,
58 | "grid_column": null,
59 | "overflow_y": null,
60 | "overflow_x": null,
61 | "grid_auto_flow": null,
62 | "grid_area": null,
63 | "grid_template_columns": null,
64 | "flex": null,
65 | "_model_name": "LayoutModel",
66 | "justify_items": null,
67 | "grid_row": null,
68 | "max_height": null,
69 | "align_content": null,
70 | "visibility": null,
71 | "align_self": null,
72 | "height": null,
73 | "min_height": null,
74 | "padding": null,
75 | "grid_auto_rows": null,
76 | "grid_gap": null,
77 | "max_width": null,
78 | "order": null,
79 | "_view_module_version": "1.2.0",
80 | "grid_template_areas": null,
81 | "object_position": null,
82 | "object_fit": null,
83 | "grid_auto_columns": null,
84 | "margin": null,
85 | "display": null,
86 | "left": null
87 | }
88 | },
89 | "28e05434e4c940cca6bc5e7b1f296226": {
90 | "model_module": "@jupyter-widgets/controls",
91 | "model_name": "FloatProgressModel",
92 | "state": {
93 | "_view_name": "ProgressView",
94 | "style": "IPY_MODEL_6dc7555fa342465bb6a0afcabcd35537",
95 | "_dom_classes": [],
96 | "description": "Downloading: 100%",
97 | "_model_name": "FloatProgressModel",
98 | "bar_style": "success",
99 | "max": 1199,
100 | "_view_module": "@jupyter-widgets/controls",
101 | "_model_module_version": "1.5.0",
102 | "value": 1199,
103 | "_view_count": null,
104 | "_view_module_version": "1.5.0",
105 | "orientation": "horizontal",
106 | "min": 0,
107 | "description_tooltip": null,
108 | "_model_module": "@jupyter-widgets/controls",
109 | "layout": "IPY_MODEL_04e937bc336e4823a883a1645ff2cd76"
110 | }
111 | },
112 | "72f1e0fa775043dba4542cd93b5a5470": {
113 | "model_module": "@jupyter-widgets/controls",
114 | "model_name": "HTMLModel",
115 | "state": {
116 | "_view_name": "HTMLView",
117 | "style": "IPY_MODEL_18f7a33467aa4d64a32c9de0e86a1b65",
118 | "_dom_classes": [],
119 | "description": "",
120 | "_model_name": "HTMLModel",
121 | "placeholder": "",
122 | "_view_module": "@jupyter-widgets/controls",
123 | "_model_module_version": "1.5.0",
124 | "value": " 1.20k/1.20k [00:00<00:00, 1.65kB/s]",
125 | "_view_count": null,
126 | "_view_module_version": "1.5.0",
127 | "description_tooltip": null,
128 | "_model_module": "@jupyter-widgets/controls",
129 | "layout": "IPY_MODEL_c33aaa6e71d24bf48968bab3729c3bd5"
130 | }
131 | },
132 | "6dc7555fa342465bb6a0afcabcd35537": {
133 | "model_module": "@jupyter-widgets/controls",
134 | "model_name": "ProgressStyleModel",
135 | "state": {
136 | "_view_name": "StyleView",
137 | "_model_name": "ProgressStyleModel",
138 | "description_width": "initial",
139 | "_view_module": "@jupyter-widgets/base",
140 | "_model_module_version": "1.5.0",
141 | "_view_count": null,
142 | "_view_module_version": "1.2.0",
143 | "bar_color": null,
144 | "_model_module": "@jupyter-widgets/controls"
145 | }
146 | },
147 | "04e937bc336e4823a883a1645ff2cd76": {
148 | "model_module": "@jupyter-widgets/base",
149 | "model_name": "LayoutModel",
150 | "state": {
151 | "_view_name": "LayoutView",
152 | "grid_template_rows": null,
153 | "right": null,
154 | "justify_content": null,
155 | "_view_module": "@jupyter-widgets/base",
156 | "overflow": null,
157 | "_model_module_version": "1.2.0",
158 | "_view_count": null,
159 | "flex_flow": null,
160 | "width": null,
161 | "min_width": null,
162 | "border": null,
163 | "align_items": null,
164 | "bottom": null,
165 | "_model_module": "@jupyter-widgets/base",
166 | "top": null,
167 | "grid_column": null,
168 | "overflow_y": null,
169 | "overflow_x": null,
170 | "grid_auto_flow": null,
171 | "grid_area": null,
172 | "grid_template_columns": null,
173 | "flex": null,
174 | "_model_name": "LayoutModel",
175 | "justify_items": null,
176 | "grid_row": null,
177 | "max_height": null,
178 | "align_content": null,
179 | "visibility": null,
180 | "align_self": null,
181 | "height": null,
182 | "min_height": null,
183 | "padding": null,
184 | "grid_auto_rows": null,
185 | "grid_gap": null,
186 | "max_width": null,
187 | "order": null,
188 | "_view_module_version": "1.2.0",
189 | "grid_template_areas": null,
190 | "object_position": null,
191 | "object_fit": null,
192 | "grid_auto_columns": null,
193 | "margin": null,
194 | "display": null,
195 | "left": null
196 | }
197 | },
198 | "18f7a33467aa4d64a32c9de0e86a1b65": {
199 | "model_module": "@jupyter-widgets/controls",
200 | "model_name": "DescriptionStyleModel",
201 | "state": {
202 | "_view_name": "StyleView",
203 | "_model_name": "DescriptionStyleModel",
204 | "description_width": "",
205 | "_view_module": "@jupyter-widgets/base",
206 | "_model_module_version": "1.5.0",
207 | "_view_count": null,
208 | "_view_module_version": "1.2.0",
209 | "_model_module": "@jupyter-widgets/controls"
210 | }
211 | },
212 | "c33aaa6e71d24bf48968bab3729c3bd5": {
213 | "model_module": "@jupyter-widgets/base",
214 | "model_name": "LayoutModel",
215 | "state": {
216 | "_view_name": "LayoutView",
217 | "grid_template_rows": null,
218 | "right": null,
219 | "justify_content": null,
220 | "_view_module": "@jupyter-widgets/base",
221 | "overflow": null,
222 | "_model_module_version": "1.2.0",
223 | "_view_count": null,
224 | "flex_flow": null,
225 | "width": null,
226 | "min_width": null,
227 | "border": null,
228 | "align_items": null,
229 | "bottom": null,
230 | "_model_module": "@jupyter-widgets/base",
231 | "top": null,
232 | "grid_column": null,
233 | "overflow_y": null,
234 | "overflow_x": null,
235 | "grid_auto_flow": null,
236 | "grid_area": null,
237 | "grid_template_columns": null,
238 | "flex": null,
239 | "_model_name": "LayoutModel",
240 | "justify_items": null,
241 | "grid_row": null,
242 | "max_height": null,
243 | "align_content": null,
244 | "visibility": null,
245 | "align_self": null,
246 | "height": null,
247 | "min_height": null,
248 | "padding": null,
249 | "grid_auto_rows": null,
250 | "grid_gap": null,
251 | "max_width": null,
252 | "order": null,
253 | "_view_module_version": "1.2.0",
254 | "grid_template_areas": null,
255 | "object_position": null,
256 | "object_fit": null,
257 | "grid_auto_columns": null,
258 | "margin": null,
259 | "display": null,
260 | "left": null
261 | }
262 | },
263 | "6def94fb658f47e59663e5ef80ca3b8d": {
264 | "model_module": "@jupyter-widgets/controls",
265 | "model_name": "HBoxModel",
266 | "state": {
267 | "_view_name": "HBoxView",
268 | "_dom_classes": [],
269 | "_model_name": "HBoxModel",
270 | "_view_module": "@jupyter-widgets/controls",
271 | "_model_module_version": "1.5.0",
272 | "_view_count": null,
273 | "_view_module_version": "1.5.0",
274 | "box_style": "",
275 | "layout": "IPY_MODEL_177f493826d3486aa640b5c6acba89f0",
276 | "_model_module": "@jupyter-widgets/controls",
277 | "children": [
278 | "IPY_MODEL_408845522cd9452c8da2c545f38f6b7f",
279 | "IPY_MODEL_837c2f05b7b24f5684d0ad1e4651cdd3"
280 | ]
281 | }
282 | },
283 | "177f493826d3486aa640b5c6acba89f0": {
284 | "model_module": "@jupyter-widgets/base",
285 | "model_name": "LayoutModel",
286 | "state": {
287 | "_view_name": "LayoutView",
288 | "grid_template_rows": null,
289 | "right": null,
290 | "justify_content": null,
291 | "_view_module": "@jupyter-widgets/base",
292 | "overflow": null,
293 | "_model_module_version": "1.2.0",
294 | "_view_count": null,
295 | "flex_flow": null,
296 | "width": null,
297 | "min_width": null,
298 | "border": null,
299 | "align_items": null,
300 | "bottom": null,
301 | "_model_module": "@jupyter-widgets/base",
302 | "top": null,
303 | "grid_column": null,
304 | "overflow_y": null,
305 | "overflow_x": null,
306 | "grid_auto_flow": null,
307 | "grid_area": null,
308 | "grid_template_columns": null,
309 | "flex": null,
310 | "_model_name": "LayoutModel",
311 | "justify_items": null,
312 | "grid_row": null,
313 | "max_height": null,
314 | "align_content": null,
315 | "visibility": null,
316 | "align_self": null,
317 | "height": null,
318 | "min_height": null,
319 | "padding": null,
320 | "grid_auto_rows": null,
321 | "grid_gap": null,
322 | "max_width": null,
323 | "order": null,
324 | "_view_module_version": "1.2.0",
325 | "grid_template_areas": null,
326 | "object_position": null,
327 | "object_fit": null,
328 | "grid_auto_columns": null,
329 | "margin": null,
330 | "display": null,
331 | "left": null
332 | }
333 | },
334 | "408845522cd9452c8da2c545f38f6b7f": {
335 | "model_module": "@jupyter-widgets/controls",
336 | "model_name": "FloatProgressModel",
337 | "state": {
338 | "_view_name": "ProgressView",
339 | "style": "IPY_MODEL_650303a960ec4610ac2463167d0f88ad",
340 | "_dom_classes": [],
341 | "description": "Downloading: 100%",
342 | "_model_name": "FloatProgressModel",
343 | "bar_style": "success",
344 | "max": 791656,
345 | "_view_module": "@jupyter-widgets/controls",
346 | "_model_module_version": "1.5.0",
347 | "value": 791656,
348 | "_view_count": null,
349 | "_view_module_version": "1.5.0",
350 | "orientation": "horizontal",
351 | "min": 0,
352 | "description_tooltip": null,
353 | "_model_module": "@jupyter-widgets/controls",
354 | "layout": "IPY_MODEL_708987ca3048462ba07c405879bb7f5f"
355 | }
356 | },
357 | "837c2f05b7b24f5684d0ad1e4651cdd3": {
358 | "model_module": "@jupyter-widgets/controls",
359 | "model_name": "HTMLModel",
360 | "state": {
361 | "_view_name": "HTMLView",
362 | "style": "IPY_MODEL_bc3fe2af97a04e829a5c35aba4fa89f0",
363 | "_dom_classes": [],
364 | "description": "",
365 | "_model_name": "HTMLModel",
366 | "placeholder": "",
367 | "_view_module": "@jupyter-widgets/controls",
368 | "_model_module_version": "1.5.0",
369 | "value": " 792k/792k [00:00<00:00, 1.80MB/s]",
370 | "_view_count": null,
371 | "_view_module_version": "1.5.0",
372 | "description_tooltip": null,
373 | "_model_module": "@jupyter-widgets/controls",
374 | "layout": "IPY_MODEL_e4b5af1b74b84caf8cf962f2c2d88f4a"
375 | }
376 | },
377 | "650303a960ec4610ac2463167d0f88ad": {
378 | "model_module": "@jupyter-widgets/controls",
379 | "model_name": "ProgressStyleModel",
380 | "state": {
381 | "_view_name": "StyleView",
382 | "_model_name": "ProgressStyleModel",
383 | "description_width": "initial",
384 | "_view_module": "@jupyter-widgets/base",
385 | "_model_module_version": "1.5.0",
386 | "_view_count": null,
387 | "_view_module_version": "1.2.0",
388 | "bar_color": null,
389 | "_model_module": "@jupyter-widgets/controls"
390 | }
391 | },
392 | "708987ca3048462ba07c405879bb7f5f": {
393 | "model_module": "@jupyter-widgets/base",
394 | "model_name": "LayoutModel",
395 | "state": {
396 | "_view_name": "LayoutView",
397 | "grid_template_rows": null,
398 | "right": null,
399 | "justify_content": null,
400 | "_view_module": "@jupyter-widgets/base",
401 | "overflow": null,
402 | "_model_module_version": "1.2.0",
403 | "_view_count": null,
404 | "flex_flow": null,
405 | "width": null,
406 | "min_width": null,
407 | "border": null,
408 | "align_items": null,
409 | "bottom": null,
410 | "_model_module": "@jupyter-widgets/base",
411 | "top": null,
412 | "grid_column": null,
413 | "overflow_y": null,
414 | "overflow_x": null,
415 | "grid_auto_flow": null,
416 | "grid_area": null,
417 | "grid_template_columns": null,
418 | "flex": null,
419 | "_model_name": "LayoutModel",
420 | "justify_items": null,
421 | "grid_row": null,
422 | "max_height": null,
423 | "align_content": null,
424 | "visibility": null,
425 | "align_self": null,
426 | "height": null,
427 | "min_height": null,
428 | "padding": null,
429 | "grid_auto_rows": null,
430 | "grid_gap": null,
431 | "max_width": null,
432 | "order": null,
433 | "_view_module_version": "1.2.0",
434 | "grid_template_areas": null,
435 | "object_position": null,
436 | "object_fit": null,
437 | "grid_auto_columns": null,
438 | "margin": null,
439 | "display": null,
440 | "left": null
441 | }
442 | },
443 | "bc3fe2af97a04e829a5c35aba4fa89f0": {
444 | "model_module": "@jupyter-widgets/controls",
445 | "model_name": "DescriptionStyleModel",
446 | "state": {
447 | "_view_name": "StyleView",
448 | "_model_name": "DescriptionStyleModel",
449 | "description_width": "",
450 | "_view_module": "@jupyter-widgets/base",
451 | "_model_module_version": "1.5.0",
452 | "_view_count": null,
453 | "_view_module_version": "1.2.0",
454 | "_model_module": "@jupyter-widgets/controls"
455 | }
456 | },
457 | "e4b5af1b74b84caf8cf962f2c2d88f4a": {
458 | "model_module": "@jupyter-widgets/base",
459 | "model_name": "LayoutModel",
460 | "state": {
461 | "_view_name": "LayoutView",
462 | "grid_template_rows": null,
463 | "right": null,
464 | "justify_content": null,
465 | "_view_module": "@jupyter-widgets/base",
466 | "overflow": null,
467 | "_model_module_version": "1.2.0",
468 | "_view_count": null,
469 | "flex_flow": null,
470 | "width": null,
471 | "min_width": null,
472 | "border": null,
473 | "align_items": null,
474 | "bottom": null,
475 | "_model_module": "@jupyter-widgets/base",
476 | "top": null,
477 | "grid_column": null,
478 | "overflow_y": null,
479 | "overflow_x": null,
480 | "grid_auto_flow": null,
481 | "grid_area": null,
482 | "grid_template_columns": null,
483 | "flex": null,
484 | "_model_name": "LayoutModel",
485 | "justify_items": null,
486 | "grid_row": null,
487 | "max_height": null,
488 | "align_content": null,
489 | "visibility": null,
490 | "align_self": null,
491 | "height": null,
492 | "min_height": null,
493 | "padding": null,
494 | "grid_auto_rows": null,
495 | "grid_gap": null,
496 | "max_width": null,
497 | "order": null,
498 | "_view_module_version": "1.2.0",
499 | "grid_template_areas": null,
500 | "object_position": null,
501 | "object_fit": null,
502 | "grid_auto_columns": null,
503 | "margin": null,
504 | "display": null,
505 | "left": null
506 | }
507 | },
508 | "4e3aedb9a7a3462194c640ed52370db4": {
509 | "model_module": "@jupyter-widgets/controls",
510 | "model_name": "HBoxModel",
511 | "state": {
512 | "_view_name": "HBoxView",
513 | "_dom_classes": [],
514 | "_model_name": "HBoxModel",
515 | "_view_module": "@jupyter-widgets/controls",
516 | "_model_module_version": "1.5.0",
517 | "_view_count": null,
518 | "_view_module_version": "1.5.0",
519 | "box_style": "",
520 | "layout": "IPY_MODEL_06049fe5455d4ca8a25d65b3b50b64e4",
521 | "_model_module": "@jupyter-widgets/controls",
522 | "children": [
523 | "IPY_MODEL_5fba47af158a4382914e76284402939d",
524 | "IPY_MODEL_426c9236a2fe41ecae28f48a915769d6"
525 | ]
526 | }
527 | },
528 | "06049fe5455d4ca8a25d65b3b50b64e4": {
529 | "model_module": "@jupyter-widgets/base",
530 | "model_name": "LayoutModel",
531 | "state": {
532 | "_view_name": "LayoutView",
533 | "grid_template_rows": null,
534 | "right": null,
535 | "justify_content": null,
536 | "_view_module": "@jupyter-widgets/base",
537 | "overflow": null,
538 | "_model_module_version": "1.2.0",
539 | "_view_count": null,
540 | "flex_flow": null,
541 | "width": null,
542 | "min_width": null,
543 | "border": null,
544 | "align_items": null,
545 | "bottom": null,
546 | "_model_module": "@jupyter-widgets/base",
547 | "top": null,
548 | "grid_column": null,
549 | "overflow_y": null,
550 | "overflow_x": null,
551 | "grid_auto_flow": null,
552 | "grid_area": null,
553 | "grid_template_columns": null,
554 | "flex": null,
555 | "_model_name": "LayoutModel",
556 | "justify_items": null,
557 | "grid_row": null,
558 | "max_height": null,
559 | "align_content": null,
560 | "visibility": null,
561 | "align_self": null,
562 | "height": null,
563 | "min_height": null,
564 | "padding": null,
565 | "grid_auto_rows": null,
566 | "grid_gap": null,
567 | "max_width": null,
568 | "order": null,
569 | "_view_module_version": "1.2.0",
570 | "grid_template_areas": null,
571 | "object_position": null,
572 | "object_fit": null,
573 | "grid_auto_columns": null,
574 | "margin": null,
575 | "display": null,
576 | "left": null
577 | }
578 | },
579 | "5fba47af158a4382914e76284402939d": {
580 | "model_module": "@jupyter-widgets/controls",
581 | "model_name": "FloatProgressModel",
582 | "state": {
583 | "_view_name": "ProgressView",
584 | "style": "IPY_MODEL_d6312e9620ff4397a56e9aa1bef17c1c",
585 | "_dom_classes": [],
586 | "description": "Downloading: 100%",
587 | "_model_name": "FloatProgressModel",
588 | "bar_style": "success",
589 | "max": 1389353,
590 | "_view_module": "@jupyter-widgets/controls",
591 | "_model_module_version": "1.5.0",
592 | "value": 1389353,
593 | "_view_count": null,
594 | "_view_module_version": "1.5.0",
595 | "orientation": "horizontal",
596 | "min": 0,
597 | "description_tooltip": null,
598 | "_model_module": "@jupyter-widgets/controls",
599 | "layout": "IPY_MODEL_282d0d86aa1e45c48f85951a5dcd9079"
600 | }
601 | },
602 | "426c9236a2fe41ecae28f48a915769d6": {
603 | "model_module": "@jupyter-widgets/controls",
604 | "model_name": "HTMLModel",
605 | "state": {
606 | "_view_name": "HTMLView",
607 | "style": "IPY_MODEL_9e83dbf68ab94874a0a60d95ff9cd27c",
608 | "_dom_classes": [],
609 | "description": "",
610 | "_model_name": "HTMLModel",
611 | "placeholder": "",
612 | "_view_module": "@jupyter-widgets/controls",
613 | "_model_module_version": "1.5.0",
614 | "value": " 1.39M/1.39M [00:56<00:00, 24.8kB/s]",
615 | "_view_count": null,
616 | "_view_module_version": "1.5.0",
617 | "description_tooltip": null,
618 | "_model_module": "@jupyter-widgets/controls",
619 | "layout": "IPY_MODEL_36afbf787e3845d7a8fe0898b5c87591"
620 | }
621 | },
622 | "d6312e9620ff4397a56e9aa1bef17c1c": {
623 | "model_module": "@jupyter-widgets/controls",
624 | "model_name": "ProgressStyleModel",
625 | "state": {
626 | "_view_name": "StyleView",
627 | "_model_name": "ProgressStyleModel",
628 | "description_width": "initial",
629 | "_view_module": "@jupyter-widgets/base",
630 | "_model_module_version": "1.5.0",
631 | "_view_count": null,
632 | "_view_module_version": "1.2.0",
633 | "bar_color": null,
634 | "_model_module": "@jupyter-widgets/controls"
635 | }
636 | },
637 | "282d0d86aa1e45c48f85951a5dcd9079": {
638 | "model_module": "@jupyter-widgets/base",
639 | "model_name": "LayoutModel",
640 | "state": {
641 | "_view_name": "LayoutView",
642 | "grid_template_rows": null,
643 | "right": null,
644 | "justify_content": null,
645 | "_view_module": "@jupyter-widgets/base",
646 | "overflow": null,
647 | "_model_module_version": "1.2.0",
648 | "_view_count": null,
649 | "flex_flow": null,
650 | "width": null,
651 | "min_width": null,
652 | "border": null,
653 | "align_items": null,
654 | "bottom": null,
655 | "_model_module": "@jupyter-widgets/base",
656 | "top": null,
657 | "grid_column": null,
658 | "overflow_y": null,
659 | "overflow_x": null,
660 | "grid_auto_flow": null,
661 | "grid_area": null,
662 | "grid_template_columns": null,
663 | "flex": null,
664 | "_model_name": "LayoutModel",
665 | "justify_items": null,
666 | "grid_row": null,
667 | "max_height": null,
668 | "align_content": null,
669 | "visibility": null,
670 | "align_self": null,
671 | "height": null,
672 | "min_height": null,
673 | "padding": null,
674 | "grid_auto_rows": null,
675 | "grid_gap": null,
676 | "max_width": null,
677 | "order": null,
678 | "_view_module_version": "1.2.0",
679 | "grid_template_areas": null,
680 | "object_position": null,
681 | "object_fit": null,
682 | "grid_auto_columns": null,
683 | "margin": null,
684 | "display": null,
685 | "left": null
686 | }
687 | },
688 | "9e83dbf68ab94874a0a60d95ff9cd27c": {
689 | "model_module": "@jupyter-widgets/controls",
690 | "model_name": "DescriptionStyleModel",
691 | "state": {
692 | "_view_name": "StyleView",
693 | "_model_name": "DescriptionStyleModel",
694 | "description_width": "",
695 | "_view_module": "@jupyter-widgets/base",
696 | "_model_module_version": "1.5.0",
697 | "_view_count": null,
698 | "_view_module_version": "1.2.0",
699 | "_model_module": "@jupyter-widgets/controls"
700 | }
701 | },
702 | "36afbf787e3845d7a8fe0898b5c87591": {
703 | "model_module": "@jupyter-widgets/base",
704 | "model_name": "LayoutModel",
705 | "state": {
706 | "_view_name": "LayoutView",
707 | "grid_template_rows": null,
708 | "right": null,
709 | "justify_content": null,
710 | "_view_module": "@jupyter-widgets/base",
711 | "overflow": null,
712 | "_model_module_version": "1.2.0",
713 | "_view_count": null,
714 | "flex_flow": null,
715 | "width": null,
716 | "min_width": null,
717 | "border": null,
718 | "align_items": null,
719 | "bottom": null,
720 | "_model_module": "@jupyter-widgets/base",
721 | "top": null,
722 | "grid_column": null,
723 | "overflow_y": null,
724 | "overflow_x": null,
725 | "grid_auto_flow": null,
726 | "grid_area": null,
727 | "grid_template_columns": null,
728 | "flex": null,
729 | "_model_name": "LayoutModel",
730 | "justify_items": null,
731 | "grid_row": null,
732 | "max_height": null,
733 | "align_content": null,
734 | "visibility": null,
735 | "align_self": null,
736 | "height": null,
737 | "min_height": null,
738 | "padding": null,
739 | "grid_auto_rows": null,
740 | "grid_gap": null,
741 | "max_width": null,
742 | "order": null,
743 | "_view_module_version": "1.2.0",
744 | "grid_template_areas": null,
745 | "object_position": null,
746 | "object_fit": null,
747 | "grid_auto_columns": null,
748 | "margin": null,
749 | "display": null,
750 | "left": null
751 | }
752 | },
753 | "7e5a9b5191df4eb8ba5938ec194f5c73": {
754 | "model_module": "@jupyter-widgets/controls",
755 | "model_name": "HBoxModel",
756 | "state": {
757 | "_view_name": "HBoxView",
758 | "_dom_classes": [],
759 | "_model_name": "HBoxModel",
760 | "_view_module": "@jupyter-widgets/controls",
761 | "_model_module_version": "1.5.0",
762 | "_view_count": null,
763 | "_view_module_version": "1.5.0",
764 | "box_style": "",
765 | "layout": "IPY_MODEL_f5385bc04faf4740912dd953f11743d8",
766 | "_model_module": "@jupyter-widgets/controls",
767 | "children": [
768 | "IPY_MODEL_646641e9a6804f8288e011c19548d600",
769 | "IPY_MODEL_0912e7c1972d4fd8b16ac0d9c49ea266"
770 | ]
771 | }
772 | },
773 | "f5385bc04faf4740912dd953f11743d8": {
774 | "model_module": "@jupyter-widgets/base",
775 | "model_name": "LayoutModel",
776 | "state": {
777 | "_view_name": "LayoutView",
778 | "grid_template_rows": null,
779 | "right": null,
780 | "justify_content": null,
781 | "_view_module": "@jupyter-widgets/base",
782 | "overflow": null,
783 | "_model_module_version": "1.2.0",
784 | "_view_count": null,
785 | "flex_flow": null,
786 | "width": null,
787 | "min_width": null,
788 | "border": null,
789 | "align_items": null,
790 | "bottom": null,
791 | "_model_module": "@jupyter-widgets/base",
792 | "top": null,
793 | "grid_column": null,
794 | "overflow_y": null,
795 | "overflow_x": null,
796 | "grid_auto_flow": null,
797 | "grid_area": null,
798 | "grid_template_columns": null,
799 | "flex": null,
800 | "_model_name": "LayoutModel",
801 | "justify_items": null,
802 | "grid_row": null,
803 | "max_height": null,
804 | "align_content": null,
805 | "visibility": null,
806 | "align_self": null,
807 | "height": null,
808 | "min_height": null,
809 | "padding": null,
810 | "grid_auto_rows": null,
811 | "grid_gap": null,
812 | "max_width": null,
813 | "order": null,
814 | "_view_module_version": "1.2.0",
815 | "grid_template_areas": null,
816 | "object_position": null,
817 | "object_fit": null,
818 | "grid_auto_columns": null,
819 | "margin": null,
820 | "display": null,
821 | "left": null
822 | }
823 | },
824 | "646641e9a6804f8288e011c19548d600": {
825 | "model_module": "@jupyter-widgets/controls",
826 | "model_name": "FloatProgressModel",
827 | "state": {
828 | "_view_name": "ProgressView",
829 | "style": "IPY_MODEL_387e7312c2864a02ba9de00fb8d1f34b",
830 | "_dom_classes": [],
831 | "description": "Downloading: 100%",
832 | "_model_name": "FloatProgressModel",
833 | "bar_style": "success",
834 | "max": 891691430,
835 | "_view_module": "@jupyter-widgets/controls",
836 | "_model_module_version": "1.5.0",
837 | "value": 891691430,
838 | "_view_count": null,
839 | "_view_module_version": "1.5.0",
840 | "orientation": "horizontal",
841 | "min": 0,
842 | "description_tooltip": null,
843 | "_model_module": "@jupyter-widgets/controls",
844 | "layout": "IPY_MODEL_209a666fbe3446c19456f140e22044f0"
845 | }
846 | },
847 | "0912e7c1972d4fd8b16ac0d9c49ea266": {
848 | "model_module": "@jupyter-widgets/controls",
849 | "model_name": "HTMLModel",
850 | "state": {
851 | "_view_name": "HTMLView",
852 | "style": "IPY_MODEL_02783989ed3a4ac49ee833498551f5d4",
853 | "_dom_classes": [],
854 | "description": "",
855 | "_model_name": "HTMLModel",
856 | "placeholder": "",
857 | "_view_module": "@jupyter-widgets/controls",
858 | "_model_module_version": "1.5.0",
859 | "value": " 892M/892M [00:33<00:00, 26.7MB/s]",
860 | "_view_count": null,
861 | "_view_module_version": "1.5.0",
862 | "description_tooltip": null,
863 | "_model_module": "@jupyter-widgets/controls",
864 | "layout": "IPY_MODEL_a3c164c2313647aab1b8fae1157f3d15"
865 | }
866 | },
867 | "387e7312c2864a02ba9de00fb8d1f34b": {
868 | "model_module": "@jupyter-widgets/controls",
869 | "model_name": "ProgressStyleModel",
870 | "state": {
871 | "_view_name": "StyleView",
872 | "_model_name": "ProgressStyleModel",
873 | "description_width": "initial",
874 | "_view_module": "@jupyter-widgets/base",
875 | "_model_module_version": "1.5.0",
876 | "_view_count": null,
877 | "_view_module_version": "1.2.0",
878 | "bar_color": null,
879 | "_model_module": "@jupyter-widgets/controls"
880 | }
881 | },
882 | "209a666fbe3446c19456f140e22044f0": {
883 | "model_module": "@jupyter-widgets/base",
884 | "model_name": "LayoutModel",
885 | "state": {
886 | "_view_name": "LayoutView",
887 | "grid_template_rows": null,
888 | "right": null,
889 | "justify_content": null,
890 | "_view_module": "@jupyter-widgets/base",
891 | "overflow": null,
892 | "_model_module_version": "1.2.0",
893 | "_view_count": null,
894 | "flex_flow": null,
895 | "width": null,
896 | "min_width": null,
897 | "border": null,
898 | "align_items": null,
899 | "bottom": null,
900 | "_model_module": "@jupyter-widgets/base",
901 | "top": null,
902 | "grid_column": null,
903 | "overflow_y": null,
904 | "overflow_x": null,
905 | "grid_auto_flow": null,
906 | "grid_area": null,
907 | "grid_template_columns": null,
908 | "flex": null,
909 | "_model_name": "LayoutModel",
910 | "justify_items": null,
911 | "grid_row": null,
912 | "max_height": null,
913 | "align_content": null,
914 | "visibility": null,
915 | "align_self": null,
916 | "height": null,
917 | "min_height": null,
918 | "padding": null,
919 | "grid_auto_rows": null,
920 | "grid_gap": null,
921 | "max_width": null,
922 | "order": null,
923 | "_view_module_version": "1.2.0",
924 | "grid_template_areas": null,
925 | "object_position": null,
926 | "object_fit": null,
927 | "grid_auto_columns": null,
928 | "margin": null,
929 | "display": null,
930 | "left": null
931 | }
932 | },
933 | "02783989ed3a4ac49ee833498551f5d4": {
934 | "model_module": "@jupyter-widgets/controls",
935 | "model_name": "DescriptionStyleModel",
936 | "state": {
937 | "_view_name": "StyleView",
938 | "_model_name": "DescriptionStyleModel",
939 | "description_width": "",
940 | "_view_module": "@jupyter-widgets/base",
941 | "_model_module_version": "1.5.0",
942 | "_view_count": null,
943 | "_view_module_version": "1.2.0",
944 | "_model_module": "@jupyter-widgets/controls"
945 | }
946 | },
947 | "a3c164c2313647aab1b8fae1157f3d15": {
948 | "model_module": "@jupyter-widgets/base",
949 | "model_name": "LayoutModel",
950 | "state": {
951 | "_view_name": "LayoutView",
952 | "grid_template_rows": null,
953 | "right": null,
954 | "justify_content": null,
955 | "_view_module": "@jupyter-widgets/base",
956 | "overflow": null,
957 | "_model_module_version": "1.2.0",
958 | "_view_count": null,
959 | "flex_flow": null,
960 | "width": null,
961 | "min_width": null,
962 | "border": null,
963 | "align_items": null,
964 | "bottom": null,
965 | "_model_module": "@jupyter-widgets/base",
966 | "top": null,
967 | "grid_column": null,
968 | "overflow_y": null,
969 | "overflow_x": null,
970 | "grid_auto_flow": null,
971 | "grid_area": null,
972 | "grid_template_columns": null,
973 | "flex": null,
974 | "_model_name": "LayoutModel",
975 | "justify_items": null,
976 | "grid_row": null,
977 | "max_height": null,
978 | "align_content": null,
979 | "visibility": null,
980 | "align_self": null,
981 | "height": null,
982 | "min_height": null,
983 | "padding": null,
984 | "grid_auto_rows": null,
985 | "grid_gap": null,
986 | "max_width": null,
987 | "order": null,
988 | "_view_module_version": "1.2.0",
989 | "grid_template_areas": null,
990 | "object_position": null,
991 | "object_fit": null,
992 | "grid_auto_columns": null,
993 | "margin": null,
994 | "display": null,
995 | "left": null
996 | }
997 | }
998 | }
999 | }
1000 | },
1001 | "cells": [
1002 | {
1003 | "cell_type": "code",
1004 | "metadata": {
1005 | "colab": {
1006 | "base_uri": "https://localhost:8080/"
1007 | },
1008 | "id": "gH3DPpu7MO7L",
1009 | "outputId": "9839c29a-ea0f-492d-a8b0-05927ad05e0f"
1010 | },
1011 | "source": [
1012 | "!pip install transformers\n",
1013 | "import torch\n",
1014 | "from transformers import AutoTokenizer, AutoModelWithLMHead"
1015 | ],
1016 | "execution_count": null,
1017 | "outputs": [
1018 | {
1019 | "output_type": "stream",
1020 | "text": [
1021 | "Collecting transformers\n",
1022 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/d8/b2/57495b5309f09fa501866e225c84532d1fd89536ea62406b2181933fb418/transformers-4.5.1-py3-none-any.whl (2.1MB)\n",
1023 | "\u001b[K |████████████████████████████████| 2.1MB 5.1MB/s \n",
1024 | "\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers) (3.0.12)\n",
1025 | "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers) (4.41.1)\n",
1026 | "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (1.19.5)\n",
1027 | "Collecting sacremoses\n",
1028 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/75/ee/67241dc87f266093c533a2d4d3d69438e57d7a90abb216fa076e7d475d4a/sacremoses-0.0.45-py3-none-any.whl (895kB)\n",
1029 | "\u001b[K |████████████████████████████████| 901kB 22.7MB/s \n",
1030 | "\u001b[?25hRequirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (2019.12.20)\n",
1031 | "Requirement already satisfied: importlib-metadata; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from transformers) (3.10.1)\n",
1032 | "Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from transformers) (20.9)\n",
1033 | "Collecting tokenizers<0.11,>=0.10.1\n",
1034 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/ae/04/5b870f26a858552025a62f1649c20d29d2672c02ff3c3fb4c688ca46467a/tokenizers-0.10.2-cp37-cp37m-manylinux2010_x86_64.whl (3.3MB)\n",
1035 | "\u001b[K |████████████████████████████████| 3.3MB 38.2MB/s \n",
1036 | "\u001b[?25hRequirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers) (2.23.0)\n",
1037 | "Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (7.1.2)\n",
1038 | "Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (1.0.1)\n",
1039 | "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (1.15.0)\n",
1040 | "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < \"3.8\"->transformers) (3.4.1)\n",
1041 | "Requirement already satisfied: typing-extensions>=3.6.4; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < \"3.8\"->transformers) (3.7.4.3)\n",
1042 | "Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->transformers) (2.4.7)\n",
1043 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2020.12.5)\n",
1044 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (1.24.3)\n",
1045 | "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (3.0.4)\n",
1046 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2.10)\n",
1047 | "Installing collected packages: sacremoses, tokenizers, transformers\n",
1048 | "Successfully installed sacremoses-0.0.45 tokenizers-0.10.2 transformers-4.5.1\n"
1049 | ],
1050 | "name": "stdout"
1051 | }
1052 | ]
1053 | },
1054 | {
1055 | "cell_type": "code",
1056 | "metadata": {
1057 | "colab": {
1058 | "base_uri": "https://localhost:8080/",
1059 | "height": 266,
1060 | "referenced_widgets": [
1061 | "e000043c0da4449b9df33f833ccef865",
1062 | "a14ce6e10acf4d2b8d6ba6bee597c389",
1063 | "28e05434e4c940cca6bc5e7b1f296226",
1064 | "72f1e0fa775043dba4542cd93b5a5470",
1065 | "6dc7555fa342465bb6a0afcabcd35537",
1066 | "04e937bc336e4823a883a1645ff2cd76",
1067 | "18f7a33467aa4d64a32c9de0e86a1b65",
1068 | "c33aaa6e71d24bf48968bab3729c3bd5",
1069 | "6def94fb658f47e59663e5ef80ca3b8d",
1070 | "177f493826d3486aa640b5c6acba89f0",
1071 | "408845522cd9452c8da2c545f38f6b7f",
1072 | "837c2f05b7b24f5684d0ad1e4651cdd3",
1073 | "650303a960ec4610ac2463167d0f88ad",
1074 | "708987ca3048462ba07c405879bb7f5f",
1075 | "bc3fe2af97a04e829a5c35aba4fa89f0",
1076 | "e4b5af1b74b84caf8cf962f2c2d88f4a",
1077 | "4e3aedb9a7a3462194c640ed52370db4",
1078 | "06049fe5455d4ca8a25d65b3b50b64e4",
1079 | "5fba47af158a4382914e76284402939d",
1080 | "426c9236a2fe41ecae28f48a915769d6",
1081 | "d6312e9620ff4397a56e9aa1bef17c1c",
1082 | "282d0d86aa1e45c48f85951a5dcd9079",
1083 | "9e83dbf68ab94874a0a60d95ff9cd27c",
1084 | "36afbf787e3845d7a8fe0898b5c87591",
1085 | "7e5a9b5191df4eb8ba5938ec194f5c73",
1086 | "f5385bc04faf4740912dd953f11743d8",
1087 | "646641e9a6804f8288e011c19548d600",
1088 | "0912e7c1972d4fd8b16ac0d9c49ea266",
1089 | "387e7312c2864a02ba9de00fb8d1f34b",
1090 | "209a666fbe3446c19456f140e22044f0",
1091 | "02783989ed3a4ac49ee833498551f5d4",
1092 | "a3c164c2313647aab1b8fae1157f3d15"
1093 | ]
1094 | },
1095 | "id": "9U8SawuHMV6i",
1096 | "outputId": "a96017bf-cfc5-4256-b93c-cf2e9d5111e9"
1097 | },
1098 | "source": [
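1099 | "from transformers import AutoModelForSeq2SeqLM  # replaces the deprecated AutoModelWithLMHead for encoder-decoder models such as T5\ntokenizer = AutoTokenizer.from_pretrained('t5-base')\n",
1100 | "model = AutoModelForSeq2SeqLM.from_pretrained('t5-base', return_dict=True)"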
1101 | ],
1102 | "execution_count": null,
1103 | "outputs": [
1104 | {
1105 | "output_type": "display_data",
1106 | "data": {
1107 | "application/vnd.jupyter.widget-view+json": {
1108 | "model_id": "e000043c0da4449b9df33f833ccef865",
1109 | "version_minor": 0,
1110 | "version_major": 2
1111 | },
1112 | "text/plain": [
1113 | "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=1199.0, style=ProgressStyle(description…"
1114 | ]
1115 | },
1116 | "metadata": {
1117 | "tags": []
1118 | }
1119 | },
1120 | {
1121 | "output_type": "stream",
1122 | "text": [
1123 | "\n"
1124 | ],
1125 | "name": "stdout"
1126 | },
1127 | {
1128 | "output_type": "display_data",
1129 | "data": {
1130 | "application/vnd.jupyter.widget-view+json": {
1131 | "model_id": "6def94fb658f47e59663e5ef80ca3b8d",
1132 | "version_minor": 0,
1133 | "version_major": 2
1134 | },
1135 | "text/plain": [
1136 | "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=791656.0, style=ProgressStyle(descripti…"
1137 | ]
1138 | },
1139 | "metadata": {
1140 | "tags": []
1141 | }
1142 | },
1143 | {
1144 | "output_type": "stream",
1145 | "text": [
1146 | "\n"
1147 | ],
1148 | "name": "stdout"
1149 | },
1150 | {
1151 | "output_type": "display_data",
1152 | "data": {
1153 | "application/vnd.jupyter.widget-view+json": {
1154 | "model_id": "4e3aedb9a7a3462194c640ed52370db4",
1155 | "version_minor": 0,
1156 | "version_major": 2
1157 | },
1158 | "text/plain": [
1159 | "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=1389353.0, style=ProgressStyle(descript…"
1160 | ]
1161 | },
1162 | "metadata": {
1163 | "tags": []
1164 | }
1165 | },
1166 | {
1167 | "output_type": "stream",
1168 | "text": [
1169 | "\n"
1170 | ],
1171 | "name": "stdout"
1172 | },
1173 | {
1174 | "output_type": "stream",
1175 | "text": [
1176 | "/usr/local/lib/python3.7/dist-packages/transformers/models/auto/modeling_auto.py:762: FutureWarning: The class `AutoModelWithLMHead` is deprecated and will be removed in a future version. Please use `AutoModelForCausalLM` for causal language models, `AutoModelForMaskedLM` for masked language models and `AutoModelForSeq2SeqLM` for encoder-decoder models.\n",
1177 | " FutureWarning,\n"
1178 | ],
1179 | "name": "stderr"
1180 | },
1181 | {
1182 | "output_type": "display_data",
1183 | "data": {
1184 | "application/vnd.jupyter.widget-view+json": {
1185 | "model_id": "7e5a9b5191df4eb8ba5938ec194f5c73",
1186 | "version_minor": 0,
1187 | "version_major": 2
1188 | },
1189 | "text/plain": [
1190 | "HBox(children=(FloatProgress(value=0.0, description='Downloading', max=891691430.0, style=ProgressStyle(descri…"
1191 | ]
1192 | },
1193 | "metadata": {
1194 | "tags": []
1195 | }
1196 | },
1197 | {
1198 | "output_type": "stream",
1199 | "text": [
1200 | "\n"
1201 | ],
1202 | "name": "stdout"
1203 | }
1204 | ]
1205 | },
1206 | {
1207 | "cell_type": "code",
1208 | "metadata": {
1209 | "id": "KX7bj0lFMf5f"
1210 | },
1211 | "source": [
1212 | "sequence = (\"In May, Churchill was still generally unpopular with many Conservatives and probably most of the Labour Party. Chamberlain \"\n",
1213 | " \"remained Conservative Party leader until October when ill health forced his resignation. By that time, Churchill had won the \"\n",
1214 | " \"doubters over and his succession as party leader was a formality.\"\n",
1215 | " \" \"\n",
1216 | " \"He began his premiership by forming a five-man war cabinet which included Chamberlain as Lord President of the Council, \"\n",
1217 | " \"Labour leader Clement Attlee as Lord Privy Seal (later as Deputy Prime Minister), Halifax as Foreign Secretary and Labour's \"\n",
1218 | " \"Arthur Greenwood as a minister without portfolio. In practice, these five were augmented by the service chiefs and ministers \"\n",
1219 | " \"who attended the majority of meetings. The cabinet changed in size and membership as the war progressed, one of the key \"\n",
1220 | " \"appointments being the leading trades unionist Ernest Bevin as Minister of Labour and National Service. In response to \"\n",
1221 | " \"previous criticisms that there had been no clear single minister in charge of the prosecution of the war, Churchill created \"\n",
1222 | " \"and took the additional position of Minister of Defence, making him the most powerful wartime Prime Minister in British \"\n",
1223 | " \"history. He drafted outside experts into government to fulfil vital functions, especially on the Home Front. These included \"\n",
1224 | " \"personal friends like Lord Beaverbrook and Frederick Lindemann, who became the government's scientific advisor.\"\n",
1225 | " \" \"\n",
1226 | " \"At the end of May, with the British Expeditionary Force in retreat to Dunkirk and the Fall of France seemingly imminent, \"\n",
1227 | " \"Halifax proposed that the government should explore the possibility of a negotiated peace settlement using the still-neutral \"\n",
1228 | " \"Mussolini as an intermediary. There were several high-level meetings from 26 to 28 May, including two with the French \"\n",
1229 | " \"premier Paul Reynaud. Churchill's resolve was to fight on, even if France capitulated, but his position remained precarious \"\n",
1230 | " \"until Chamberlain resolved to support him. Churchill had the full support of the two Labour members but knew he could not \"\n",
1231 | " \"survive as Prime Minister if both Chamberlain and Halifax were against him. In the end, by gaining the support of his outer \"\n",
1232 | " \"cabinet, Churchill outmanoeuvred Halifax and won Chamberlain over. Churchill believed that the only option was to fight on \"\n",
1233 | " \"and his use of rhetoric hardened public opinion against a peaceful resolution and prepared the British people for a long war \"\n",
1234 | " \"– Jenkins says Churchill's speeches were 'an inspiration for the nation, and a catharsis for Churchill himself'.\"\n",
1235 | " \" \"\n",
1236 | " \"His first speech as Prime Minister, delivered to the Commons on 13 May was the 'blood, toil, tears and sweat' speech. It was \"\n",
1237 | " \"little more than a short statement but, Jenkins says, 'it included phrases which have reverberated down the decades'.\")\n"
1238 | ],
1239 | "execution_count": null,
1240 | "outputs": []
1241 | },
1242 | {
1243 | "cell_type": "code",
1244 | "metadata": {
1245 | "id": "NflG_FH5MgmT"
1246 | },
1247 | "source": [
1248 | "inputs = tokenizer.encode(\"summarize: \" + sequence,\n",
1249 | " return_tensors='pt',\n",
1250 | " max_length=512,\n",
1251 | " truncation=True)"
1252 | ],
1253 | "execution_count": null,
1254 | "outputs": []
1255 | },
1256 | {
1257 | "cell_type": "code",
1258 | "metadata": {
1259 | "id": "kfNr-KJeMkoE"
1260 | },
1261 | "source": [
1262 | "summary_ids = model.generate(inputs, max_length=150, min_length=80, length_penalty=5., num_beams=2)\n"
1263 | ],
1264 | "execution_count": null,
1265 | "outputs": []
1266 | },
1267 | {
1268 | "cell_type": "code",
1269 | "metadata": {
1270 | "id": "B4lu0y59Mm0B"
1271 | },
1272 | "source": [
1273 | "summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)"
1274 | ],
1275 | "execution_count": null,
1276 | "outputs": []
1277 | },
1278 | {
1279 | "cell_type": "code",
1280 | "metadata": {
1281 | "colab": {
1282 | "base_uri": "https://localhost:8080/",
1283 | "height": 69
1284 | },
1285 | "id": "qcIHHnmRMpIx",
1286 | "outputId": "895c0ba3-efb7-4836-e9ce-d3fffdda39af"
1287 | },
1288 | "source": [
1289 | "summary"
1290 | ],
1291 | "execution_count": null,
1292 | "outputs": [
1293 | {
1294 | "output_type": "execute_result",
1295 | "data": {
1296 | "application/vnd.google.colaboratory.intrinsic+json": {
1297 | "type": "string"
1298 | },
1299 | "text/plain": [
1300 | "\" churchill formed a five-man war cabinet which included chamberlain as Lord President of the Council, labour leader Clement Attlee as Lord Privy Seal (later as Deputy Prime Minister), Halifax as Foreign Secretary and labour's Arthur Greenwood as a minister without portfolio. he drafted outside experts into government to fulfil vital functions, especially on the home front. he was the most powerful wartime prime minister in British history.\""
1301 | ]
1302 | },
1303 | "metadata": {
1304 | "tags": []
1305 | },
1306 | "execution_count": 8
1307 | }
1308 | ]
1309 | },
1310 | {
1311 | "cell_type": "code",
1312 | "metadata": {
1313 | "id": "BxSbosLHMp9w"
1314 | },
1315 | "source": [
1316 | ""
1317 | ],
1318 | "execution_count": null,
1319 | "outputs": []
1320 | }
1321 | ]
1322 | }
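
Note on the `generate` call in the notebook above: it combines `num_beams=2` with `length_penalty=5.`. The sketch below illustrates how such a penalty can shift beam ranking toward longer summaries. It assumes the finished-beam score is the summed token log-probability divided by `length ** length_penalty`, which is my understanding of the Hugging Face beam search scorer and is not confirmed by the notebook itself. The function name `beam_score` and the two candidate beams are hypothetical, and the numbers are made up for illustration.

```python
def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalised score, assuming score = sum_logprobs / length ** penalty."""
    return sum_logprobs / (length ** length_penalty)


def preferred(length_penalty: float) -> str:
    """Return which of two hypothetical finished beams would rank higher."""
    # Illustrative values only: a short beam and a longer, less probable beam.
    short_score = beam_score(-8.0, 10, length_penalty)
    long_score = beam_score(-20.0, 40, length_penalty)
    return "long" if long_score > short_score else "short"


# Log-probabilities are negative, so dividing by a larger length ** penalty
# moves the score toward zero and favours the longer hypothesis.
for penalty in (0.0, 1.0, 5.0):
    print(f"length_penalty={penalty}: prefers the {preferred(penalty)} beam")
```

Under this assumption, a value as high as 5.0 pushes strongly toward longer outputs, which fits with the notebook also setting `min_length=80`.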
--------------------------------------------------------------------------------
/src/extractive/BERT_Extractive_Text_Summarization.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "BERT-Extractive-Text-Summarization.ipynb",
7 | "provenance": []
8 | },
9 | "kernelspec": {
10 | "name": "python3",
11 | "display_name": "Python 3"
12 | },
13 | "language_info": {
14 | "name": "python"
15 | }
16 | },
17 | "cells": [
18 | {
19 | "cell_type": "code",
20 | "metadata": {
21 | "colab": {
22 | "base_uri": "https://localhost:8080/"
23 | },
24 | "id": "7kOx0267b-ve",
25 | "outputId": "b6780494-63de-46e5-c62f-7a42616732b7"
26 | },
27 | "source": [
28 | "! pip install transformers==2.2.0\n",
29 | "! pip install spacy==2.0.12\n",
30 | "! pip install bert-extractive-summarizer"
31 | ],
32 | "execution_count": 1,
33 | "outputs": [
34 | {
35 | "output_type": "stream",
36 | "text": [
37 | "Collecting transformers==2.2.0\n",
38 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/ec/e7/0a1babead1b79afabb654fbec0a052e0d833ba4205a6dfd98b1aeda9c82e/transformers-2.2.0-py3-none-any.whl (360kB)\n",
39 | "\u001b[K |████████████████████████████████| 368kB 3.9MB/s \n",
40 | "\u001b[?25hRequirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from transformers==2.2.0) (4.41.1)\n",
41 | "Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from transformers==2.2.0) (2019.12.20)\n",
42 | "Collecting sacremoses\n",
43 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/75/ee/67241dc87f266093c533a2d4d3d69438e57d7a90abb216fa076e7d475d4a/sacremoses-0.0.45-py3-none-any.whl (895kB)\n",
44 | "\u001b[K |████████████████████████████████| 901kB 20.7MB/s \n",
45 | "\u001b[?25hRequirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers==2.2.0) (2.23.0)\n",
46 | "Collecting sentencepiece\n",
47 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/f5/99/e0808cb947ba10f575839c43e8fafc9cc44e4a7a2c8f79c60db48220a577/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl (1.2MB)\n",
48 | "\u001b[K |████████████████████████████████| 1.2MB 24.2MB/s \n",
49 | "\u001b[?25hCollecting boto3\n",
50 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/73/ad/62cdfb117f875258b5925d6dbe987bea500e91f2e7ec343a42556167174c/boto3-1.17.96-py2.py3-none-any.whl (131kB)\n",
51 | "\u001b[K |████████████████████████████████| 133kB 34.9MB/s \n",
52 | "\u001b[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from transformers==2.2.0) (1.19.5)\n",
53 | "Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers==2.2.0) (7.1.2)\n",
54 | "Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers==2.2.0) (1.0.1)\n",
55 | "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers==2.2.0) (1.15.0)\n",
56 | "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers==2.2.0) (3.0.4)\n",
57 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers==2.2.0) (2021.5.30)\n",
58 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers==2.2.0) (1.24.3)\n",
59 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers==2.2.0) (2.10)\n",
60 | "Collecting jmespath<1.0.0,>=0.7.1\n",
61 | " Downloading https://files.pythonhosted.org/packages/07/cb/5f001272b6faeb23c1c9e0acc04d48eaaf5c862c17709d20e3469c6e0139/jmespath-0.10.0-py2.py3-none-any.whl\n",
62 | "Collecting botocore<1.21.0,>=1.20.96\n",
63 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/63/cb/e9805f42151c5c8fbdd1ffac92deffa947f09dcff148fc9d8a1b8066240d/botocore-1.20.96-py2.py3-none-any.whl (7.6MB)\n",
64 | "\u001b[K |████████████████████████████████| 7.6MB 29.9MB/s \n",
65 | "\u001b[?25hCollecting s3transfer<0.5.0,>=0.4.0\n",
66 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/63/d0/693477c688348654ddc21dcdce0817653a294aa43f41771084c25e7ff9c7/s3transfer-0.4.2-py2.py3-none-any.whl (79kB)\n",
67 | "\u001b[K |████████████████████████████████| 81kB 8.4MB/s \n",
68 | "\u001b[?25hRequirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.7/dist-packages (from botocore<1.21.0,>=1.20.96->boto3->transformers==2.2.0) (2.8.1)\n",
69 | "\u001b[31mERROR: botocore 1.20.96 has requirement urllib3<1.27,>=1.25.4, but you'll have urllib3 1.24.3 which is incompatible.\u001b[0m\n",
70 | "Installing collected packages: sacremoses, sentencepiece, jmespath, botocore, s3transfer, boto3, transformers\n",
71 | "Successfully installed boto3-1.17.96 botocore-1.20.96 jmespath-0.10.0 s3transfer-0.4.2 sacremoses-0.0.45 sentencepiece-0.1.95 transformers-2.2.0\n",
72 | "Collecting spacy==2.0.12\n",
73 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/24/de/ac14cd453c98656d6738a5669f96a4ac7f668493d5e6b78227ac933c5fd4/spacy-2.0.12.tar.gz (22.0MB)\n",
74 | "\u001b[K |████████████████████████████████| 22.0MB 141kB/s \n",
75 | "\u001b[?25hRequirement already satisfied: numpy>=1.7 in /usr/local/lib/python3.7/dist-packages (from spacy==2.0.12) (1.19.5)\n",
76 | "Collecting murmurhash<0.29,>=0.28\n",
77 | " Downloading https://files.pythonhosted.org/packages/5e/31/c8c1ecafa44db30579c8c457ac7a0f819e8b1dbc3e58308394fff5ff9ba7/murmurhash-0.28.0.tar.gz\n",
78 | "Collecting cymem<1.32,>=1.30\n",
79 | " Downloading https://files.pythonhosted.org/packages/f8/9e/273fbea507de99166c11cd0cb3fde1ac01b5bc724d9a407a2f927ede91a1/cymem-1.31.2.tar.gz\n",
80 | "Collecting preshed<2.0.0,>=1.0.0\n",
81 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/f4/0c/64933c18f02fbf46acdd0c7705ec1c1194f58b564bb5a2d140fabcb37bad/preshed-1.0.1-cp37-cp37m-manylinux1_x86_64.whl (79kB)\n",
82 | "\u001b[K |████████████████████████████████| 81kB 8.4MB/s \n",
83 | "\u001b[?25hCollecting thinc<6.11.0,>=6.10.3\n",
84 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/94/b1/47a88072d0a38b3594c0a638a62f9ef7c742b8b8a87f7b105f7ed720b14b/thinc-6.10.3.tar.gz (1.2MB)\n",
85 | "\u001b[K |████████████████████████████████| 1.2MB 43.8MB/s \n",
86 | "\u001b[?25hCollecting plac<1.0.0,>=0.9.6\n",
87 | " Downloading https://files.pythonhosted.org/packages/9e/9b/62c60d2f5bc135d2aa1d8c8a86aaf84edb719a59c7f11a4316259e61a298/plac-0.9.6-py2.py3-none-any.whl\n",
88 | "Collecting ujson>=1.35\n",
89 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/17/4e/50e8e4cf5f00b537095711c2c86ac4d7191aed2b4fffd5a19f06898f6929/ujson-4.0.2-cp37-cp37m-manylinux1_x86_64.whl (179kB)\n",
90 | "\u001b[K |████████████████████████████████| 184kB 41.6MB/s \n",
91 | "\u001b[?25hCollecting dill<0.3,>=0.2\n",
92 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/fe/42/bfe2e0857bc284cbe6a011d93f2a9ad58a22cb894461b199ae72cfef0f29/dill-0.2.9.tar.gz (150kB)\n",
93 | "\u001b[K |████████████████████████████████| 153kB 42.1MB/s \n",
94 | "\u001b[?25hCollecting regex==2017.4.5\n",
95 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/36/62/c0c0d762ffd4ffaf39f372eb8561b8d491a11ace5a7884610424a8b40f95/regex-2017.04.05.tar.gz (601kB)\n",
96 | "\u001b[K |████████████████████████████████| 604kB 36.2MB/s \n",
97 | "\u001b[?25hRequirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.7/dist-packages (from spacy==2.0.12) (2.23.0)\n",
98 | "Collecting msgpack<1.0.0,>=0.5.6\n",
99 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/25/f8/6009e73f5b08743718d0660a18ecbc44b013a68980347a3835b63e875cdb/msgpack-0.6.2-cp37-cp37m-manylinux1_x86_64.whl (243kB)\n",
100 | "\u001b[K |████████████████████████████████| 245kB 40.0MB/s \n",
101 | "\u001b[?25hCollecting msgpack-numpy<1.0.0,>=0.4.1\n",
102 | " Downloading https://files.pythonhosted.org/packages/19/05/05b8d7c69c6abb36a34325cc3150089bdafc359f0a81fb998d93c5d5c737/msgpack_numpy-0.4.7.1-py2.py3-none-any.whl\n",
103 | "Collecting cytoolz<0.10,>=0.9.0\n",
104 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/36/f4/9728ba01ccb2f55df9a5af029b48ba0aaca1081bbd7823ea2ee223ba7a42/cytoolz-0.9.0.1.tar.gz (443kB)\n",
105 | "\u001b[K |████████████████████████████████| 450kB 27.4MB/s \n",
106 | "\u001b[?25hCollecting wrapt<1.11.0,>=1.10.0\n",
107 | " Downloading https://files.pythonhosted.org/packages/a0/47/66897906448185fcb77fc3c2b1bc20ed0ecca81a0f2f88eda3fc5a34fc3d/wrapt-1.10.11.tar.gz\n",
108 | "Requirement already satisfied: tqdm<5.0.0,>=4.10.0 in /usr/local/lib/python3.7/dist-packages (from thinc<6.11.0,>=6.10.3->spacy==2.0.12) (4.41.1)\n",
109 | "Requirement already satisfied: six<2.0.0,>=1.10.0 in /usr/local/lib/python3.7/dist-packages (from thinc<6.11.0,>=6.10.3->spacy==2.0.12) (1.15.0)\n",
110 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy==2.0.12) (2.10)\n",
111 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy==2.0.12) (1.24.3)\n",
112 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy==2.0.12) (2021.5.30)\n",
113 | "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy==2.0.12) (3.0.4)\n",
114 | "Requirement already satisfied: toolz>=0.8.0 in /usr/local/lib/python3.7/dist-packages (from cytoolz<0.10,>=0.9.0->thinc<6.11.0,>=6.10.3->spacy==2.0.12) (0.11.1)\n",
115 | "Building wheels for collected packages: spacy, murmurhash, cymem, thinc, dill, regex, cytoolz, wrapt\n",
116 | " Building wheel for spacy (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
117 | " Created wheel for spacy: filename=spacy-2.0.12-cp37-cp37m-linux_x86_64.whl size=28971942 sha256=e2da81a32a705fee25a3effd0ecfcd1726efbb8d129947185f460e4c0cd9035c\n",
118 | " Stored in directory: /root/.cache/pip/wheels/60/0b/bb/7c2e28db574dbb2358176934eddd32a1c5f838ba0bc23eaaab\n",
119 | " Building wheel for murmurhash (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
120 | " Created wheel for murmurhash: filename=murmurhash-0.28.0-cp37-cp37m-linux_x86_64.whl size=42762 sha256=1c9e04ef7ac30a7071e7333e2935280522b06803179cc769bae2d0332f1f367e\n",
121 | " Stored in directory: /root/.cache/pip/wheels/b8/94/a4/f69f8664cdc1098603df44771b7fec5fd1b3d8364cdd83f512\n",
122 | " Building wheel for cymem (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
123 | " Created wheel for cymem: filename=cymem-1.31.2-cp37-cp37m-linux_x86_64.whl size=77006 sha256=8d8aff8810df2e4ca37d487643233e08422975c4a171135da257044a59830433\n",
124 | " Stored in directory: /root/.cache/pip/wheels/55/8d/4a/f6328252aa2aaec0b1cb906fd96a1566d77f0f67701071ad13\n",
125 | " Building wheel for thinc (setup.py) ... \u001b[?25lerror\n",
126 | "\u001b[31m ERROR: Failed building wheel for thinc\u001b[0m\n",
127 | "\u001b[?25h Running setup.py clean for thinc\n",
128 | " Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
129 | " Created wheel for dill: filename=dill-0.2.9-cp37-none-any.whl size=77421 sha256=4087ab6e3f86fae97c2c354fccff7ec6a72267336c176d6cd9675d3c36bb6e69\n",
130 | " Stored in directory: /root/.cache/pip/wheels/5b/d7/0f/e58eae695403de585269f4e4a94e0cd6ca60ec0c202936fa4a\n",
131 | " Building wheel for regex (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
132 | " Created wheel for regex: filename=regex-2017.4.5-cp37-cp37m-linux_x86_64.whl size=534417 sha256=6c408e4f534f9e2c59bfc2a112b69e0add2ee7403b7b346c6a6268dc43fdf003\n",
133 | " Stored in directory: /root/.cache/pip/wheels/75/07/38/3c16b529d50cb4e0cd3dbc7b75cece8a09c132692c74450b01\n",
134 | " Building wheel for cytoolz (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
135 | " Created wheel for cytoolz: filename=cytoolz-0.9.0.1-cp37-cp37m-linux_x86_64.whl size=1239554 sha256=9f30971dbbc56066fc1c1e71411fd6c2a943661902773ff0951d4d460a36e168\n",
136 | " Stored in directory: /root/.cache/pip/wheels/88/f3/11/9817b001e59ab04889e8cffcbd9087e2e2155b9ebecfc8dd38\n",
137 | " Building wheel for wrapt (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
138 | " Created wheel for wrapt: filename=wrapt-1.10.11-cp37-cp37m-linux_x86_64.whl size=66038 sha256=20b8ab828cf90846b00bc403ce7f459a4a8e30a0a43608ba8d197d57e8091752\n",
139 | " Stored in directory: /root/.cache/pip/wheels/48/5d/04/22361a593e70d23b1f7746d932802efe1f0e523376a74f321e\n",
140 | "Successfully built spacy murmurhash cymem dill regex cytoolz wrapt\n",
141 | "Failed to build thinc\n",
142 | "\u001b[31mERROR: tensorflow 2.5.0 has requirement wrapt~=1.12.1, but you'll have wrapt 1.10.11 which is incompatible.\u001b[0m\n",
143 | "\u001b[31mERROR: multiprocess 0.70.11.1 has requirement dill>=0.3.3, but you'll have dill 0.2.9 which is incompatible.\u001b[0m\n",
144 | "\u001b[31mERROR: fastai 1.0.61 has requirement spacy>=2.0.18; python_version < \"3.8\", but you'll have spacy 2.0.12 which is incompatible.\u001b[0m\n",
145 | "\u001b[31mERROR: en-core-web-sm 2.2.5 has requirement spacy>=2.2.2, but you'll have spacy 2.0.12 which is incompatible.\u001b[0m\n",
146 | "Installing collected packages: murmurhash, cymem, preshed, msgpack, msgpack-numpy, cytoolz, wrapt, plac, dill, thinc, ujson, regex, spacy\n",
147 | " Found existing installation: murmurhash 1.0.5\n",
148 | " Uninstalling murmurhash-1.0.5:\n",
149 | " Successfully uninstalled murmurhash-1.0.5\n",
150 | " Found existing installation: cymem 2.0.5\n",
151 | " Uninstalling cymem-2.0.5:\n",
152 | " Successfully uninstalled cymem-2.0.5\n",
153 | " Found existing installation: preshed 3.0.5\n",
154 | " Uninstalling preshed-3.0.5:\n",
155 | " Successfully uninstalled preshed-3.0.5\n",
156 | " Found existing installation: msgpack 1.0.2\n",
157 | " Uninstalling msgpack-1.0.2:\n",
158 | " Successfully uninstalled msgpack-1.0.2\n",
159 | " Found existing installation: wrapt 1.12.1\n",
160 | " Uninstalling wrapt-1.12.1:\n",
161 | " Successfully uninstalled wrapt-1.12.1\n",
162 | " Found existing installation: plac 1.1.3\n",
163 | " Uninstalling plac-1.1.3:\n",
164 | " Successfully uninstalled plac-1.1.3\n",
165 | " Found existing installation: dill 0.3.3\n",
166 | " Uninstalling dill-0.3.3:\n",
167 | " Successfully uninstalled dill-0.3.3\n",
168 | " Found existing installation: thinc 7.4.0\n",
169 | " Uninstalling thinc-7.4.0:\n",
170 | " Successfully uninstalled thinc-7.4.0\n",
171 | " Running setup.py install for thinc ... \u001b[?25l\u001b[?25herror\n",
172 | " Rolling back uninstall of thinc\n",
173 | " Moving to /usr/local/lib/python3.7/dist-packages/thinc-7.4.0.dist-info/\n",
174 | " from /usr/local/lib/python3.7/dist-packages/~hinc-7.4.0.dist-info\n",
175 | " Moving to /usr/local/lib/python3.7/dist-packages/thinc/\n",
176 | " from /usr/local/lib/python3.7/dist-packages/~hinc\n",
177 | "\u001b[31mERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '\"'\"'/tmp/pip-install-ihuvde1u/thinc/setup.py'\"'\"'; __file__='\"'\"'/tmp/pip-install-ihuvde1u/thinc/setup.py'\"'\"';f=getattr(tokenize, '\"'\"'open'\"'\"', open)(__file__);code=f.read().replace('\"'\"'\\r\\n'\"'\"', '\"'\"'\\n'\"'\"');f.close();exec(compile(code, __file__, '\"'\"'exec'\"'\"'))' install --record /tmp/pip-record-mitzeiqu/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.\u001b[0m\n",
178 | "Collecting bert-extractive-summarizer\n",
179 | " Downloading https://files.pythonhosted.org/packages/1a/07/fdb05f9e18b6f641499ef56737126fbd2fafe1cdc1a04ba069d5aa205901/bert_extractive_summarizer-0.7.1-py3-none-any.whl\n",
180 | "Requirement already satisfied: spacy in /usr/local/lib/python3.7/dist-packages (from bert-extractive-summarizer) (2.2.4)\n",
181 | "Requirement already satisfied: transformers in /usr/local/lib/python3.7/dist-packages (from bert-extractive-summarizer) (2.2.0)\n",
182 | "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.7/dist-packages (from bert-extractive-summarizer) (0.22.2.post1)\n",
183 | "Collecting cymem<2.1.0,>=2.0.2\n",
184 | " Downloading https://files.pythonhosted.org/packages/b6/34/40547e057c1b31080c1d78f6accf9f1ed6ee46e3fc7ebd8599197915ef89/cymem-2.0.5-cp37-cp37m-manylinux2014_x86_64.whl\n",
185 | "Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (1.19.5)\n",
186 | "Requirement already satisfied: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (0.9.6)\n",
187 | "Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (0.8.2)\n",
188 | "Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (2.23.0)\n",
189 | "Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (4.41.1)\n",
190 | "Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (1.0.5)\n",
191 | "Requirement already satisfied: blis<0.5.0,>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (0.4.1)\n",
192 | "Requirement already satisfied: thinc==7.4.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (7.4.0)\n",
193 | "Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (1.0.0)\n",
194 | "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (0.28.0)\n",
195 | "Collecting preshed<3.1.0,>=3.0.2\n",
196 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/6c/91/1cf0f7f0a6720f93632fc8ec42d54233e8e142640ac3fcf0fecaa8dc4648/preshed-3.0.5-cp37-cp37m-manylinux2014_x86_64.whl (126kB)\n",
197 | "\u001b[K |████████████████████████████████| 133kB 4.8MB/s \n",
198 | "\u001b[?25hRequirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (57.0.0)\n",
199 | "Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from transformers->bert-extractive-summarizer) (2019.12.20)\n",
200 | "Requirement already satisfied: sentencepiece in /usr/local/lib/python3.7/dist-packages (from transformers->bert-extractive-summarizer) (0.1.95)\n",
201 | "Requirement already satisfied: sacremoses in /usr/local/lib/python3.7/dist-packages (from transformers->bert-extractive-summarizer) (0.0.45)\n",
202 | "Requirement already satisfied: boto3 in /usr/local/lib/python3.7/dist-packages (from transformers->bert-extractive-summarizer) (1.17.96)\n",
203 | "Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->bert-extractive-summarizer) (1.0.1)\n",
204 | "Requirement already satisfied: scipy>=0.17.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->bert-extractive-summarizer) (1.4.1)\n",
205 | "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy->bert-extractive-summarizer) (3.0.4)\n",
206 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy->bert-extractive-summarizer) (2021.5.30)\n",
207 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy->bert-extractive-summarizer) (1.24.3)\n",
208 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy->bert-extractive-summarizer) (2.10)\n",
209 | "Requirement already satisfied: importlib-metadata>=0.20; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from catalogue<1.1.0,>=0.0.7->spacy->bert-extractive-summarizer) (4.5.0)\n",
210 | "Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers->bert-extractive-summarizer) (7.1.2)\n",
211 | "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers->bert-extractive-summarizer) (1.15.0)\n",
212 | "Requirement already satisfied: botocore<1.21.0,>=1.20.96 in /usr/local/lib/python3.7/dist-packages (from boto3->transformers->bert-extractive-summarizer) (1.20.96)\n",
213 | "Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.7/dist-packages (from boto3->transformers->bert-extractive-summarizer) (0.10.0)\n",
214 | "Requirement already satisfied: s3transfer<0.5.0,>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from boto3->transformers->bert-extractive-summarizer) (0.4.2)\n",
215 | "Requirement already satisfied: typing-extensions>=3.6.4; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=0.20; python_version < \"3.8\"->catalogue<1.1.0,>=0.0.7->spacy->bert-extractive-summarizer) (3.7.4.3)\n",
216 | "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=0.20; python_version < \"3.8\"->catalogue<1.1.0,>=0.0.7->spacy->bert-extractive-summarizer) (3.4.1)\n",
217 | "Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.7/dist-packages (from botocore<1.21.0,>=1.20.96->boto3->transformers->bert-extractive-summarizer) (2.8.1)\n",
218 | "Installing collected packages: bert-extractive-summarizer, cymem, preshed\n",
219 | " Found existing installation: cymem 1.31.2\n",
220 | " Uninstalling cymem-1.31.2:\n",
221 | " Successfully uninstalled cymem-1.31.2\n",
222 | " Found existing installation: preshed 1.0.1\n",
223 | " Uninstalling preshed-1.0.1:\n",
224 | " Successfully uninstalled preshed-1.0.1\n",
225 | "Successfully installed bert-extractive-summarizer-0.7.1 cymem-2.0.5 preshed-3.0.5\n"
226 | ],
227 | "name": "stdout"
228 | }
229 | ]
230 | },
231 | {
232 | "cell_type": "code",
233 | "metadata": {
234 | "id": "gyKfZQp4sSsV"
235 | },
236 | "source": [
237 | "from summarizer import Summarizer, TransformerSummarizer"
238 | ],
239 | "execution_count": 2,
240 | "outputs": []
241 | },
242 | {
243 | "cell_type": "code",
244 | "metadata": {
245 | "id": "0XoGpEAgsaC9"
246 | },
247 | "source": [
248 | "sequence = (\"In May, Churchill was still generally unpopular with many Conservatives and probably most of the Labour Party. Chamberlain \"\n",
249 | " \"remained Conservative Party leader until October when ill health forced his resignation. By that time, Churchill had won the \"\n",
250 | " \"doubters over and his succession as party leader was a formality.\"\n",
251 | " \" \"\n",
252 | " \"He began his premiership by forming a five-man war cabinet which included Chamberlain as Lord President of the Council, \"\n",
253 | " \"Labour leader Clement Attlee as Lord Privy Seal (later as Deputy Prime Minister), Halifax as Foreign Secretary and Labour's \"\n",
254 | " \"Arthur Greenwood as a minister without portfolio. In practice, these five were augmented by the service chiefs and ministers \"\n",
255 | " \"who attended the majority of meetings. The cabinet changed in size and membership as the war progressed, one of the key \"\n",
256 | " \"appointments being the leading trades unionist Ernest Bevin as Minister of Labour and National Service. In response to \"\n",
257 | " \"previous criticisms that there had been no clear single minister in charge of the prosecution of the war, Churchill created \"\n",
258 | " \"and took the additional position of Minister of Defence, making him the most powerful wartime Prime Minister in British \"\n",
259 | " \"history. He drafted outside experts into government to fulfil vital functions, especially on the Home Front. These included \"\n",
260 | " \"personal friends like Lord Beaverbrook and Frederick Lindemann, who became the government's scientific advisor.\"\n",
261 | " \" \"\n",
262 | " \"At the end of May, with the British Expeditionary Force in retreat to Dunkirk and the Fall of France seemingly imminent, \"\n",
263 | " \"Halifax proposed that the government should explore the possibility of a negotiated peace settlement using the still-neutral \"\n",
264 | " \"Mussolini as an intermediary. There were several high-level meetings from 26 to 28 May, including two with the French \"\n",
265 | " \"premier Paul Reynaud. Churchill's resolve was to fight on, even if France capitulated, but his position remained precarious \"\n",
266 | " \"until Chamberlain resolved to support him. Churchill had the full support of the two Labour members but knew he could not \"\n",
267 | " \"survive as Prime Minister if both Chamberlain and Halifax were against him. In the end, by gaining the support of his outer \"\n",
268 | " \"cabinet, Churchill outmanoeuvred Halifax and won Chamberlain over. Churchill believed that the only option was to fight on \"\n",
269 | " \"and his use of rhetoric hardened public opinion against a peaceful resolution and prepared the British people for a long war \"\n",
270 | " \"– Jenkins says Churchill's speeches were 'an inspiration for the nation, and a catharsis for Churchill himself'.\"\n",
271 | " \" \"\n",
272 | " \"His first speech as Prime Minister, delivered to the Commons on 13 May was the 'blood, toil, tears and sweat' speech. It was \"\n",
273 | " \"little more than a short statement but, Jenkins says, 'it included phrases which have reverberated down the decades'.\")\n"
274 | ],
275 | "execution_count": 3,
276 | "outputs": []
277 | },
278 | {
279 | "cell_type": "code",
280 | "metadata": {
281 | "id": "kGF4q99Qsoxw"
282 | },
283 | "source": [
284 | "model = Summarizer()"
285 | ],
286 | "execution_count": 8,
287 | "outputs": []
288 | },
289 | {
290 | "cell_type": "code",
291 | "metadata": {
292 | "id": "YKTjMytusvZ1"
293 | },
294 | "source": [
295 | "summary = ''.join(model(sequence, min_length=60))"
296 | ],
297 | "execution_count": 9,
298 | "outputs": []
299 | },
300 | {
301 | "cell_type": "code",
302 | "metadata": {
303 | "colab": {
304 | "base_uri": "https://localhost:8080/",
305 | "height": 153
306 | },
307 | "id": "NL3NEce4s-hp",
308 | "outputId": "5652f896-f519-4363-dc55-87e068609859"
309 | },
310 | "source": [
311 | "summary"
312 | ],
313 | "execution_count": 10,
314 | "outputs": [
315 | {
316 | "output_type": "execute_result",
317 | "data": {
318 | "application/vnd.google.colaboratory.intrinsic+json": {
319 | "type": "string"
320 | },
321 | "text/plain": [
322 | "'In May, Churchill was still generally unpopular with many Conservatives and probably most of the Labour Party. Chamberlain remained Conservative Party leader until October when ill health forced his resignation. In response to previous criticisms that there had been no clear single minister in charge of the prosecution of the war, Churchill created and took the additional position of Minister of Defence, making him the most powerful wartime Prime Minister in British history. In the end, by gaining the support of his outer cabinet, Churchill outmanoeuvred Halifax and won Chamberlain over.'"
323 | ]
324 | },
325 | "metadata": {
326 | "tags": []
327 | },
328 | "execution_count": 10
329 | }
330 | ]
331 | },
332 | {
333 | "cell_type": "code",
334 | "metadata": {
335 | "id": "Sxlglqyvug11"
336 | },
337 | "source": [
338 | ""
339 | ],
340 | "execution_count": null,
341 | "outputs": []
342 | }
343 | ]
344 | }
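
Note on the notebook above: `Summarizer()` returns sentences copied verbatim from the input, which is what makes the approach extractive. The sketch below shows the general idea in a dependency-free form. It is not the library's method: as far as I know, `bert-extractive-summarizer` embeds sentences with BERT and clusters the embeddings, whereas this stand-in uses word-count vectors and a single centroid. All function names here (`split_sentences`, `vectorize`, `cosine`, `extractive_summary`) are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def vectorize(sentence):
    # Bag-of-words vector (a stand-in for a BERT sentence embedding).
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extractive_summary(text, num_sentences=2):
    sentences = split_sentences(text)
    vectors = [vectorize(s) for s in sentences]
    # The summed vector acts as the centroid; cosine similarity ignores scale.
    centroid = Counter()
    for vec in vectors:
        centroid.update(vec)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vectors[i], centroid),
                    reverse=True)
    # Keep the top-ranked sentences, restored to their original order.
    keep = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in keep)


sample = ("Churchill formed a war cabinet. The cabinet included Chamberlain "
          "and Attlee. It rained on Tuesday. Churchill led the war cabinet "
          "as Prime Minister.")
print(extractive_summary(sample, num_sentences=2))
```

Because every selected sentence is copied unchanged, the result cannot introduce wording that is absent from the source, unlike the abstractive notebooks under `src/abstractive`.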
--------------------------------------------------------------------------------
/src/extractive/GPT2_Extractive_Text_Summarization.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "GPT2-Extractive-Text-Summarization.ipynb",
7 | "provenance": []
8 | },
9 | "kernelspec": {
10 | "name": "python3",
11 | "display_name": "Python 3"
12 | },
13 | "language_info": {
14 | "name": "python"
15 | }
16 | },
17 | "cells": [
18 | {
19 | "cell_type": "code",
20 | "metadata": {
21 | "colab": {
22 | "base_uri": "https://localhost:8080/"
23 | },
24 | "id": "7kOx0267b-ve",
25 | "outputId": "b6780494-63de-46e5-c62f-7a42616732b7"
26 | },
27 | "source": [
28 | "! pip install transformers==2.2.0\n",
29 | "! pip install spacy==2.0.12"
30 | ],
31 | "execution_count": 1,
32 | "outputs": [
33 | {
34 | "output_type": "stream",
35 | "text": [
36 | "Collecting transformers==2.2.0\n",
37 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/ec/e7/0a1babead1b79afabb654fbec0a052e0d833ba4205a6dfd98b1aeda9c82e/transformers-2.2.0-py3-none-any.whl (360kB)\n",
38 | "\u001b[K |████████████████████████████████| 368kB 3.9MB/s \n",
39 | "\u001b[?25hRequirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from transformers==2.2.0) (4.41.1)\n",
40 | "Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from transformers==2.2.0) (2019.12.20)\n",
41 | "Collecting sacremoses\n",
42 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/75/ee/67241dc87f266093c533a2d4d3d69438e57d7a90abb216fa076e7d475d4a/sacremoses-0.0.45-py3-none-any.whl (895kB)\n",
43 | "\u001b[K |████████████████████████████████| 901kB 20.7MB/s \n",
44 | "\u001b[?25hRequirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers==2.2.0) (2.23.0)\n",
45 | "Collecting sentencepiece\n",
46 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/f5/99/e0808cb947ba10f575839c43e8fafc9cc44e4a7a2c8f79c60db48220a577/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl (1.2MB)\n",
47 | "\u001b[K |████████████████████████████████| 1.2MB 24.2MB/s \n",
48 | "\u001b[?25hCollecting boto3\n",
49 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/73/ad/62cdfb117f875258b5925d6dbe987bea500e91f2e7ec343a42556167174c/boto3-1.17.96-py2.py3-none-any.whl (131kB)\n",
50 | "\u001b[K |████████████████████████████████| 133kB 34.9MB/s \n",
51 | "\u001b[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from transformers==2.2.0) (1.19.5)\n",
52 | "Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers==2.2.0) (7.1.2)\n",
53 | "Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers==2.2.0) (1.0.1)\n",
54 | "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers==2.2.0) (1.15.0)\n",
55 | "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers==2.2.0) (3.0.4)\n",
56 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers==2.2.0) (2021.5.30)\n",
57 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers==2.2.0) (1.24.3)\n",
58 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers==2.2.0) (2.10)\n",
59 | "Collecting jmespath<1.0.0,>=0.7.1\n",
60 | " Downloading https://files.pythonhosted.org/packages/07/cb/5f001272b6faeb23c1c9e0acc04d48eaaf5c862c17709d20e3469c6e0139/jmespath-0.10.0-py2.py3-none-any.whl\n",
61 | "Collecting botocore<1.21.0,>=1.20.96\n",
62 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/63/cb/e9805f42151c5c8fbdd1ffac92deffa947f09dcff148fc9d8a1b8066240d/botocore-1.20.96-py2.py3-none-any.whl (7.6MB)\n",
63 | "\u001b[K |████████████████████████████████| 7.6MB 29.9MB/s \n",
64 | "\u001b[?25hCollecting s3transfer<0.5.0,>=0.4.0\n",
65 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/63/d0/693477c688348654ddc21dcdce0817653a294aa43f41771084c25e7ff9c7/s3transfer-0.4.2-py2.py3-none-any.whl (79kB)\n",
66 | "\u001b[K |████████████████████████████████| 81kB 8.4MB/s \n",
67 | "\u001b[?25hRequirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.7/dist-packages (from botocore<1.21.0,>=1.20.96->boto3->transformers==2.2.0) (2.8.1)\n",
68 | "\u001b[31mERROR: botocore 1.20.96 has requirement urllib3<1.27,>=1.25.4, but you'll have urllib3 1.24.3 which is incompatible.\u001b[0m\n",
69 | "Installing collected packages: sacremoses, sentencepiece, jmespath, botocore, s3transfer, boto3, transformers\n",
70 | "Successfully installed boto3-1.17.96 botocore-1.20.96 jmespath-0.10.0 s3transfer-0.4.2 sacremoses-0.0.45 sentencepiece-0.1.95 transformers-2.2.0\n",
71 | "Collecting spacy==2.0.12\n",
72 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/24/de/ac14cd453c98656d6738a5669f96a4ac7f668493d5e6b78227ac933c5fd4/spacy-2.0.12.tar.gz (22.0MB)\n",
73 | "\u001b[K |████████████████████████████████| 22.0MB 141kB/s \n",
74 | "\u001b[?25hRequirement already satisfied: numpy>=1.7 in /usr/local/lib/python3.7/dist-packages (from spacy==2.0.12) (1.19.5)\n",
75 | "Collecting murmurhash<0.29,>=0.28\n",
76 | " Downloading https://files.pythonhosted.org/packages/5e/31/c8c1ecafa44db30579c8c457ac7a0f819e8b1dbc3e58308394fff5ff9ba7/murmurhash-0.28.0.tar.gz\n",
77 | "Collecting cymem<1.32,>=1.30\n",
78 | " Downloading https://files.pythonhosted.org/packages/f8/9e/273fbea507de99166c11cd0cb3fde1ac01b5bc724d9a407a2f927ede91a1/cymem-1.31.2.tar.gz\n",
79 | "Collecting preshed<2.0.0,>=1.0.0\n",
80 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/f4/0c/64933c18f02fbf46acdd0c7705ec1c1194f58b564bb5a2d140fabcb37bad/preshed-1.0.1-cp37-cp37m-manylinux1_x86_64.whl (79kB)\n",
81 | "\u001b[K |████████████████████████████████| 81kB 8.4MB/s \n",
82 | "\u001b[?25hCollecting thinc<6.11.0,>=6.10.3\n",
83 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/94/b1/47a88072d0a38b3594c0a638a62f9ef7c742b8b8a87f7b105f7ed720b14b/thinc-6.10.3.tar.gz (1.2MB)\n",
84 | "\u001b[K |████████████████████████████████| 1.2MB 43.8MB/s \n",
85 | "\u001b[?25hCollecting plac<1.0.0,>=0.9.6\n",
86 | " Downloading https://files.pythonhosted.org/packages/9e/9b/62c60d2f5bc135d2aa1d8c8a86aaf84edb719a59c7f11a4316259e61a298/plac-0.9.6-py2.py3-none-any.whl\n",
87 | "Collecting ujson>=1.35\n",
88 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/17/4e/50e8e4cf5f00b537095711c2c86ac4d7191aed2b4fffd5a19f06898f6929/ujson-4.0.2-cp37-cp37m-manylinux1_x86_64.whl (179kB)\n",
89 | "\u001b[K |████████████████████████████████| 184kB 41.6MB/s \n",
90 | "\u001b[?25hCollecting dill<0.3,>=0.2\n",
91 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/fe/42/bfe2e0857bc284cbe6a011d93f2a9ad58a22cb894461b199ae72cfef0f29/dill-0.2.9.tar.gz (150kB)\n",
92 | "\u001b[K |████████████████████████████████| 153kB 42.1MB/s \n",
93 | "\u001b[?25hCollecting regex==2017.4.5\n",
94 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/36/62/c0c0d762ffd4ffaf39f372eb8561b8d491a11ace5a7884610424a8b40f95/regex-2017.04.05.tar.gz (601kB)\n",
95 | "\u001b[K |████████████████████████████████| 604kB 36.2MB/s \n",
96 | "\u001b[?25hRequirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.7/dist-packages (from spacy==2.0.12) (2.23.0)\n",
97 | "Collecting msgpack<1.0.0,>=0.5.6\n",
98 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/25/f8/6009e73f5b08743718d0660a18ecbc44b013a68980347a3835b63e875cdb/msgpack-0.6.2-cp37-cp37m-manylinux1_x86_64.whl (243kB)\n",
99 | "\u001b[K |████████████████████████████████| 245kB 40.0MB/s \n",
100 | "\u001b[?25hCollecting msgpack-numpy<1.0.0,>=0.4.1\n",
101 | " Downloading https://files.pythonhosted.org/packages/19/05/05b8d7c69c6abb36a34325cc3150089bdafc359f0a81fb998d93c5d5c737/msgpack_numpy-0.4.7.1-py2.py3-none-any.whl\n",
102 | "Collecting cytoolz<0.10,>=0.9.0\n",
103 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/36/f4/9728ba01ccb2f55df9a5af029b48ba0aaca1081bbd7823ea2ee223ba7a42/cytoolz-0.9.0.1.tar.gz (443kB)\n",
104 | "\u001b[K |████████████████████████████████| 450kB 27.4MB/s \n",
105 | "\u001b[?25hCollecting wrapt<1.11.0,>=1.10.0\n",
106 | " Downloading https://files.pythonhosted.org/packages/a0/47/66897906448185fcb77fc3c2b1bc20ed0ecca81a0f2f88eda3fc5a34fc3d/wrapt-1.10.11.tar.gz\n",
107 | "Requirement already satisfied: tqdm<5.0.0,>=4.10.0 in /usr/local/lib/python3.7/dist-packages (from thinc<6.11.0,>=6.10.3->spacy==2.0.12) (4.41.1)\n",
108 | "Requirement already satisfied: six<2.0.0,>=1.10.0 in /usr/local/lib/python3.7/dist-packages (from thinc<6.11.0,>=6.10.3->spacy==2.0.12) (1.15.0)\n",
109 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy==2.0.12) (2.10)\n",
110 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy==2.0.12) (1.24.3)\n",
111 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy==2.0.12) (2021.5.30)\n",
112 | "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy==2.0.12) (3.0.4)\n",
113 | "Requirement already satisfied: toolz>=0.8.0 in /usr/local/lib/python3.7/dist-packages (from cytoolz<0.10,>=0.9.0->thinc<6.11.0,>=6.10.3->spacy==2.0.12) (0.11.1)\n",
114 | "Building wheels for collected packages: spacy, murmurhash, cymem, thinc, dill, regex, cytoolz, wrapt\n",
115 | " Building wheel for spacy (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
116 | " Created wheel for spacy: filename=spacy-2.0.12-cp37-cp37m-linux_x86_64.whl size=28971942 sha256=e2da81a32a705fee25a3effd0ecfcd1726efbb8d129947185f460e4c0cd9035c\n",
117 | " Stored in directory: /root/.cache/pip/wheels/60/0b/bb/7c2e28db574dbb2358176934eddd32a1c5f838ba0bc23eaaab\n",
118 | " Building wheel for murmurhash (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
119 | " Created wheel for murmurhash: filename=murmurhash-0.28.0-cp37-cp37m-linux_x86_64.whl size=42762 sha256=1c9e04ef7ac30a7071e7333e2935280522b06803179cc769bae2d0332f1f367e\n",
120 | " Stored in directory: /root/.cache/pip/wheels/b8/94/a4/f69f8664cdc1098603df44771b7fec5fd1b3d8364cdd83f512\n",
121 | " Building wheel for cymem (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
122 | " Created wheel for cymem: filename=cymem-1.31.2-cp37-cp37m-linux_x86_64.whl size=77006 sha256=8d8aff8810df2e4ca37d487643233e08422975c4a171135da257044a59830433\n",
123 | " Stored in directory: /root/.cache/pip/wheels/55/8d/4a/f6328252aa2aaec0b1cb906fd96a1566d77f0f67701071ad13\n",
124 | " Building wheel for thinc (setup.py) ... \u001b[?25lerror\n",
125 | "\u001b[31m ERROR: Failed building wheel for thinc\u001b[0m\n",
126 | "\u001b[?25h Running setup.py clean for thinc\n",
127 | " Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
128 | " Created wheel for dill: filename=dill-0.2.9-cp37-none-any.whl size=77421 sha256=4087ab6e3f86fae97c2c354fccff7ec6a72267336c176d6cd9675d3c36bb6e69\n",
129 | " Stored in directory: /root/.cache/pip/wheels/5b/d7/0f/e58eae695403de585269f4e4a94e0cd6ca60ec0c202936fa4a\n",
130 | " Building wheel for regex (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
131 | " Created wheel for regex: filename=regex-2017.4.5-cp37-cp37m-linux_x86_64.whl size=534417 sha256=6c408e4f534f9e2c59bfc2a112b69e0add2ee7403b7b346c6a6268dc43fdf003\n",
132 | " Stored in directory: /root/.cache/pip/wheels/75/07/38/3c16b529d50cb4e0cd3dbc7b75cece8a09c132692c74450b01\n",
133 | " Building wheel for cytoolz (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
134 | " Created wheel for cytoolz: filename=cytoolz-0.9.0.1-cp37-cp37m-linux_x86_64.whl size=1239554 sha256=9f30971dbbc56066fc1c1e71411fd6c2a943661902773ff0951d4d460a36e168\n",
135 | " Stored in directory: /root/.cache/pip/wheels/88/f3/11/9817b001e59ab04889e8cffcbd9087e2e2155b9ebecfc8dd38\n",
136 | " Building wheel for wrapt (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
137 | " Created wheel for wrapt: filename=wrapt-1.10.11-cp37-cp37m-linux_x86_64.whl size=66038 sha256=20b8ab828cf90846b00bc403ce7f459a4a8e30a0a43608ba8d197d57e8091752\n",
138 | " Stored in directory: /root/.cache/pip/wheels/48/5d/04/22361a593e70d23b1f7746d932802efe1f0e523376a74f321e\n",
139 | "Successfully built spacy murmurhash cymem dill regex cytoolz wrapt\n",
140 | "Failed to build thinc\n",
141 | "\u001b[31mERROR: tensorflow 2.5.0 has requirement wrapt~=1.12.1, but you'll have wrapt 1.10.11 which is incompatible.\u001b[0m\n",
142 | "\u001b[31mERROR: multiprocess 0.70.11.1 has requirement dill>=0.3.3, but you'll have dill 0.2.9 which is incompatible.\u001b[0m\n",
143 | "\u001b[31mERROR: fastai 1.0.61 has requirement spacy>=2.0.18; python_version < \"3.8\", but you'll have spacy 2.0.12 which is incompatible.\u001b[0m\n",
144 | "\u001b[31mERROR: en-core-web-sm 2.2.5 has requirement spacy>=2.2.2, but you'll have spacy 2.0.12 which is incompatible.\u001b[0m\n",
145 | "Installing collected packages: murmurhash, cymem, preshed, msgpack, msgpack-numpy, cytoolz, wrapt, plac, dill, thinc, ujson, regex, spacy\n",
146 | " Found existing installation: murmurhash 1.0.5\n",
147 | " Uninstalling murmurhash-1.0.5:\n",
148 | " Successfully uninstalled murmurhash-1.0.5\n",
149 | " Found existing installation: cymem 2.0.5\n",
150 | " Uninstalling cymem-2.0.5:\n",
151 | " Successfully uninstalled cymem-2.0.5\n",
152 | " Found existing installation: preshed 3.0.5\n",
153 | " Uninstalling preshed-3.0.5:\n",
154 | " Successfully uninstalled preshed-3.0.5\n",
155 | " Found existing installation: msgpack 1.0.2\n",
156 | " Uninstalling msgpack-1.0.2:\n",
157 | " Successfully uninstalled msgpack-1.0.2\n",
158 | " Found existing installation: wrapt 1.12.1\n",
159 | " Uninstalling wrapt-1.12.1:\n",
160 | " Successfully uninstalled wrapt-1.12.1\n",
161 | " Found existing installation: plac 1.1.3\n",
162 | " Uninstalling plac-1.1.3:\n",
163 | " Successfully uninstalled plac-1.1.3\n",
164 | " Found existing installation: dill 0.3.3\n",
165 | " Uninstalling dill-0.3.3:\n",
166 | " Successfully uninstalled dill-0.3.3\n",
167 | " Found existing installation: thinc 7.4.0\n",
168 | " Uninstalling thinc-7.4.0:\n",
169 | " Successfully uninstalled thinc-7.4.0\n",
170 | " Running setup.py install for thinc ... \u001b[?25l\u001b[?25herror\n",
171 | " Rolling back uninstall of thinc\n",
172 | " Moving to /usr/local/lib/python3.7/dist-packages/thinc-7.4.0.dist-info/\n",
173 | " from /usr/local/lib/python3.7/dist-packages/~hinc-7.4.0.dist-info\n",
174 | " Moving to /usr/local/lib/python3.7/dist-packages/thinc/\n",
175 | " from /usr/local/lib/python3.7/dist-packages/~hinc\n",
176 | "\u001b[31mERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '\"'\"'/tmp/pip-install-ihuvde1u/thinc/setup.py'\"'\"'; __file__='\"'\"'/tmp/pip-install-ihuvde1u/thinc/setup.py'\"'\"';f=getattr(tokenize, '\"'\"'open'\"'\"', open)(__file__);code=f.read().replace('\"'\"'\\r\\n'\"'\"', '\"'\"'\\n'\"'\"');f.close();exec(compile(code, __file__, '\"'\"'exec'\"'\"'))' install --record /tmp/pip-record-mitzeiqu/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.\u001b[0m\n",
177 | "Collecting bert-extractive-summarizer\n",
178 | " Downloading https://files.pythonhosted.org/packages/1a/07/fdb05f9e18b6f641499ef56737126fbd2fafe1cdc1a04ba069d5aa205901/bert_extractive_summarizer-0.7.1-py3-none-any.whl\n",
179 | "Requirement already satisfied: spacy in /usr/local/lib/python3.7/dist-packages (from bert-extractive-summarizer) (2.2.4)\n",
180 | "Requirement already satisfied: transformers in /usr/local/lib/python3.7/dist-packages (from bert-extractive-summarizer) (2.2.0)\n",
181 | "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.7/dist-packages (from bert-extractive-summarizer) (0.22.2.post1)\n",
182 | "Collecting cymem<2.1.0,>=2.0.2\n",
183 | " Downloading https://files.pythonhosted.org/packages/b6/34/40547e057c1b31080c1d78f6accf9f1ed6ee46e3fc7ebd8599197915ef89/cymem-2.0.5-cp37-cp37m-manylinux2014_x86_64.whl\n",
184 | "Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (1.19.5)\n",
185 | "Requirement already satisfied: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (0.9.6)\n",
186 | "Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (0.8.2)\n",
187 | "Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (2.23.0)\n",
188 | "Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (4.41.1)\n",
189 | "Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (1.0.5)\n",
190 | "Requirement already satisfied: blis<0.5.0,>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (0.4.1)\n",
191 | "Requirement already satisfied: thinc==7.4.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (7.4.0)\n",
192 | "Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (1.0.0)\n",
193 | "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (0.28.0)\n",
194 | "Collecting preshed<3.1.0,>=3.0.2\n",
195 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/6c/91/1cf0f7f0a6720f93632fc8ec42d54233e8e142640ac3fcf0fecaa8dc4648/preshed-3.0.5-cp37-cp37m-manylinux2014_x86_64.whl (126kB)\n",
196 | "\u001b[K |████████████████████████████████| 133kB 4.8MB/s \n",
197 | "\u001b[?25hRequirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (57.0.0)\n",
198 | "Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from transformers->bert-extractive-summarizer) (2019.12.20)\n",
199 | "Requirement already satisfied: sentencepiece in /usr/local/lib/python3.7/dist-packages (from transformers->bert-extractive-summarizer) (0.1.95)\n",
200 | "Requirement already satisfied: sacremoses in /usr/local/lib/python3.7/dist-packages (from transformers->bert-extractive-summarizer) (0.0.45)\n",
201 | "Requirement already satisfied: boto3 in /usr/local/lib/python3.7/dist-packages (from transformers->bert-extractive-summarizer) (1.17.96)\n",
202 | "Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->bert-extractive-summarizer) (1.0.1)\n",
203 | "Requirement already satisfied: scipy>=0.17.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->bert-extractive-summarizer) (1.4.1)\n",
204 | "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy->bert-extractive-summarizer) (3.0.4)\n",
205 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy->bert-extractive-summarizer) (2021.5.30)\n",
206 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy->bert-extractive-summarizer) (1.24.3)\n",
207 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy->bert-extractive-summarizer) (2.10)\n",
208 | "Requirement already satisfied: importlib-metadata>=0.20; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from catalogue<1.1.0,>=0.0.7->spacy->bert-extractive-summarizer) (4.5.0)\n",
209 | "Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers->bert-extractive-summarizer) (7.1.2)\n",
210 | "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers->bert-extractive-summarizer) (1.15.0)\n",
211 | "Requirement already satisfied: botocore<1.21.0,>=1.20.96 in /usr/local/lib/python3.7/dist-packages (from boto3->transformers->bert-extractive-summarizer) (1.20.96)\n",
212 | "Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.7/dist-packages (from boto3->transformers->bert-extractive-summarizer) (0.10.0)\n",
213 | "Requirement already satisfied: s3transfer<0.5.0,>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from boto3->transformers->bert-extractive-summarizer) (0.4.2)\n",
214 | "Requirement already satisfied: typing-extensions>=3.6.4; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=0.20; python_version < \"3.8\"->catalogue<1.1.0,>=0.0.7->spacy->bert-extractive-summarizer) (3.7.4.3)\n",
215 | "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=0.20; python_version < \"3.8\"->catalogue<1.1.0,>=0.0.7->spacy->bert-extractive-summarizer) (3.4.1)\n",
216 | "Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.7/dist-packages (from botocore<1.21.0,>=1.20.96->boto3->transformers->bert-extractive-summarizer) (2.8.1)\n",
217 | "Installing collected packages: bert-extractive-summarizer, cymem, preshed\n",
218 | " Found existing installation: cymem 1.31.2\n",
219 | " Uninstalling cymem-1.31.2:\n",
220 | " Successfully uninstalled cymem-1.31.2\n",
221 | " Found existing installation: preshed 1.0.1\n",
222 | " Uninstalling preshed-1.0.1:\n",
223 | " Successfully uninstalled preshed-1.0.1\n",
224 | "Successfully installed bert-extractive-summarizer-0.7.1 cymem-2.0.5 preshed-3.0.5\n"
225 | ],
226 | "name": "stdout"
227 | }
228 | ]
229 | },
230 | {
231 | "cell_type": "code",
232 | "metadata": {
233 | "id": "gyKfZQp4sSsV"
234 | },
235 | "source": [
236 | "from summarizer import TransformerSummarizer"
237 | ],
238 | "execution_count": 11,
239 | "outputs": []
240 | },
241 | {
242 | "cell_type": "code",
243 | "metadata": {
244 | "id": "0XoGpEAgsaC9"
245 | },
246 | "source": [
247 | "sequence = (\"In May, Churchill was still generally unpopular with many Conservatives and probably most of the Labour Party. Chamberlain \"\n",
248 | " \"remained Conservative Party leader until October when ill health forced his resignation. By that time, Churchill had won the \"\n",
249 | " \"doubters over and his succession as party leader was a formality.\"\n",
250 | " \" \"\n",
251 | " \"He began his premiership by forming a five-man war cabinet which included Chamberlain as Lord President of the Council, \"\n",
252 | " \"Labour leader Clement Attlee as Lord Privy Seal (later as Deputy Prime Minister), Halifax as Foreign Secretary and Labour's \"\n",
253 | " \"Arthur Greenwood as a minister without portfolio. In practice, these five were augmented by the service chiefs and ministers \"\n",
254 | " \"who attended the majority of meetings. The cabinet changed in size and membership as the war progressed, one of the key \"\n",
255 | " \"appointments being the leading trades unionist Ernest Bevin as Minister of Labour and National Service. In response to \"\n",
256 | " \"previous criticisms that there had been no clear single minister in charge of the prosecution of the war, Churchill created \"\n",
257 | " \"and took the additional position of Minister of Defence, making him the most powerful wartime Prime Minister in British \"\n",
258 | " \"history. He drafted outside experts into government to fulfil vital functions, especially on the Home Front. These included \"\n",
259 | " \"personal friends like Lord Beaverbrook and Frederick Lindemann, who became the government's scientific advisor.\"\n",
260 | " \" \"\n",
261 | " \"At the end of May, with the British Expeditionary Force in retreat to Dunkirk and the Fall of France seemingly imminent, \"\n",
262 | " \"Halifax proposed that the government should explore the possibility of a negotiated peace settlement using the still-neutral \"\n",
263 | " \"Mussolini as an intermediary. There were several high-level meetings from 26 to 28 May, including two with the French \"\n",
264 | " \"premier Paul Reynaud. Churchill's resolve was to fight on, even if France capitulated, but his position remained precarious \"\n",
265 | " \"until Chamberlain resolved to support him. Churchill had the full support of the two Labour members but knew he could not \"\n",
266 | " \"survive as Prime Minister if both Chamberlain and Halifax were against him. In the end, by gaining the support of his outer \"\n",
267 | " \"cabinet, Churchill outmanoeuvred Halifax and won Chamberlain over. Churchill believed that the only option was to fight on \"\n",
268 | " \"and his use of rhetoric hardened public opinion against a peaceful resolution and prepared the British people for a long war \"\n",
269 | " \"– Jenkins says Churchill's speeches were 'an inspiration for the nation, and a catharsis for Churchill himself'.\"\n",
270 | " \" \"\n",
271 | " \"His first speech as Prime Minister, delivered to the Commons on 13 May was the 'blood, toil, tears and sweat' speech. It was \"\n",
272 | " \"little more than a short statement but, Jenkins says, 'it included phrases which have reverberated down the decades'.\")\n"
273 | ],
274 | "execution_count": 12,
275 | "outputs": []
276 | },
277 | {
278 | "cell_type": "code",
279 | "metadata": {
280 | "colab": {
281 | "base_uri": "https://localhost:8080/"
282 | },
283 | "id": "kGF4q99Qsoxw",
284 | "outputId": "77bf9d50-8f03-4e8a-be1d-9e0f54e24095"
285 | },
286 | "source": [
287 | "model = TransformerSummarizer(transformer_type=\"GPT2\",transformer_model_key=\"gpt2-medium\")"
288 | ],
289 | "execution_count": 14,
290 | "outputs": [
291 | {
292 | "output_type": "stream",
293 | "text": [
294 | "100%|██████████| 718/718 [00:00<00:00, 135470.55B/s]\n",
295 | "100%|██████████| 1520013706/1520013706 [00:37<00:00, 40543952.72B/s]\n",
296 | "100%|██████████| 1042301/1042301 [00:00<00:00, 8508136.47B/s]\n",
297 | "100%|██████████| 456318/456318 [00:00<00:00, 4319169.75B/s]\n"
298 | ],
299 | "name": "stderr"
300 | }
301 | ]
302 | },
303 | {
304 | "cell_type": "code",
305 | "metadata": {
306 | "id": "YKTjMytusvZ1"
307 | },
308 | "source": [
309 | "summary = ''.join(model(sequence, min_length=60))"
310 | ],
311 | "execution_count": 15,
312 | "outputs": []
313 | },
314 | {
315 | "cell_type": "code",
316 | "metadata": {
317 | "colab": {
318 | "base_uri": "https://localhost:8080/",
319 | "height": 136
320 | },
321 | "id": "NL3NEce4s-hp",
322 | "outputId": "06373a84-5135-4ec4-adf9-fd8790fbc963"
323 | },
324 | "source": [
325 | "summary"
326 | ],
327 | "execution_count": 16,
328 | "outputs": [
329 | {
330 | "output_type": "execute_result",
331 | "data": {
332 | "application/vnd.google.colaboratory.intrinsic+json": {
333 | "type": "string"
334 | },
335 | "text/plain": [
336 | "'In May, Churchill was still generally unpopular with many Conservatives and probably most of the Labour Party. By that time, Churchill had won the doubters over and his succession as party leader was a formality. The cabinet changed in size and membership as the war progressed, one of the key appointments being the leading trades unionist Ernest Bevin as Minister of Labour and National Service. He drafted outside experts into government to fulfil vital functions, especially on the Home Front.'"
337 | ]
338 | },
339 | "metadata": {
340 | "tags": []
341 | },
342 | "execution_count": 16
343 | }
344 | ]
345 | },
346 | {
347 | "cell_type": "code",
348 | "metadata": {
349 | "id": "Sxlglqyvug11"
350 | },
351 | "source": [
352 | ""
353 | ],
354 | "execution_count": null,
355 | "outputs": []
356 | }
357 | ]
358 | }
--------------------------------------------------------------------------------
/src/extractive/README.md:
--------------------------------------------------------------------------------
1 | # Text-Summarization (BERT, GPT2, XLNet)
2 | Jupyter notebooks demonstrating extractive text summarization with pretrained BERT, GPT-2, and XLNet Transformer models.
3 |
4 | XLNet produces better output than the other two models by a reasonable margin.
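
Extractive summarization selects sentences verbatim from the source text rather than generating new ones. The notebooks do this with the `bert-extractive-summarizer` package and pretrained Transformer models. The sketch below is not that method: it is a dependency-free illustration of the same idea that scores sentences by average word frequency, and the function name `extractive_summary` and its `num_sentences` parameter are made up for this example.

```python
import re
from collections import Counter


def extractive_summary(text: str, num_sentences: int = 2) -> str:
    """Return the highest-scoring sentences of `text`, kept in original order.

    Hypothetical helper for illustration only; the notebooks use
    Transformer embeddings, not word frequencies.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= num_sentences:
        return " ".join(sentences)

    # Word frequencies over the whole text (lowercased alphabetic tokens).
    freqs = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z']+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the best, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    sample = "The cat sat. The cat ran fast. A dog barked loudly today."
    print(extractive_summary(sample, num_sentences=2))
```

Every sentence in the result appears unchanged in the input, which is the defining property of an extractive summary. The notebooks apply the same selection idea but pick sentences using Transformer sentence embeddings instead of word counts.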
--------------------------------------------------------------------------------
/src/extractive/XLNet_Extractive_Text_Summarization.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "name": "XLNet-Extractive-Text-Summarization.ipynb",
7 | "provenance": []
8 | },
9 | "kernelspec": {
10 | "name": "python3",
11 | "display_name": "Python 3"
12 | },
13 | "language_info": {
14 | "name": "python"
15 | }
16 | },
17 | "cells": [
18 | {
19 | "cell_type": "code",
20 | "metadata": {
21 | "colab": {
22 | "base_uri": "https://localhost:8080/"
23 | },
24 | "id": "7kOx0267b-ve",
25 | "outputId": "b6780494-63de-46e5-c62f-7a42616732b7"
26 | },
27 | "source": [
28 | "! pip install transformers==2.2.0\n",
29 | "! pip install spacy==2.0.12"
30 | ],
31 | "execution_count": 1,
32 | "outputs": [
33 | {
34 | "output_type": "stream",
35 | "text": [
36 | "Collecting transformers==2.2.0\n",
37 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/ec/e7/0a1babead1b79afabb654fbec0a052e0d833ba4205a6dfd98b1aeda9c82e/transformers-2.2.0-py3-none-any.whl (360kB)\n",
38 | "\u001b[K |████████████████████████████████| 368kB 3.9MB/s \n",
39 | "\u001b[?25hRequirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from transformers==2.2.0) (4.41.1)\n",
40 | "Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from transformers==2.2.0) (2019.12.20)\n",
41 | "Collecting sacremoses\n",
42 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/75/ee/67241dc87f266093c533a2d4d3d69438e57d7a90abb216fa076e7d475d4a/sacremoses-0.0.45-py3-none-any.whl (895kB)\n",
43 | "\u001b[K |████████████████████████████████| 901kB 20.7MB/s \n",
44 | "\u001b[?25hRequirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers==2.2.0) (2.23.0)\n",
45 | "Collecting sentencepiece\n",
46 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/f5/99/e0808cb947ba10f575839c43e8fafc9cc44e4a7a2c8f79c60db48220a577/sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl (1.2MB)\n",
47 | "\u001b[K |████████████████████████████████| 1.2MB 24.2MB/s \n",
48 | "\u001b[?25hCollecting boto3\n",
49 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/73/ad/62cdfb117f875258b5925d6dbe987bea500e91f2e7ec343a42556167174c/boto3-1.17.96-py2.py3-none-any.whl (131kB)\n",
50 | "\u001b[K |████████████████████████████████| 133kB 34.9MB/s \n",
51 | "\u001b[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from transformers==2.2.0) (1.19.5)\n",
52 | "Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers==2.2.0) (7.1.2)\n",
53 | "Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers==2.2.0) (1.0.1)\n",
54 | "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers==2.2.0) (1.15.0)\n",
55 | "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers==2.2.0) (3.0.4)\n",
56 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers==2.2.0) (2021.5.30)\n",
57 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers==2.2.0) (1.24.3)\n",
58 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers==2.2.0) (2.10)\n",
59 | "Collecting jmespath<1.0.0,>=0.7.1\n",
60 | " Downloading https://files.pythonhosted.org/packages/07/cb/5f001272b6faeb23c1c9e0acc04d48eaaf5c862c17709d20e3469c6e0139/jmespath-0.10.0-py2.py3-none-any.whl\n",
61 | "Collecting botocore<1.21.0,>=1.20.96\n",
62 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/63/cb/e9805f42151c5c8fbdd1ffac92deffa947f09dcff148fc9d8a1b8066240d/botocore-1.20.96-py2.py3-none-any.whl (7.6MB)\n",
63 | "\u001b[K |████████████████████████████████| 7.6MB 29.9MB/s \n",
64 | "\u001b[?25hCollecting s3transfer<0.5.0,>=0.4.0\n",
65 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/63/d0/693477c688348654ddc21dcdce0817653a294aa43f41771084c25e7ff9c7/s3transfer-0.4.2-py2.py3-none-any.whl (79kB)\n",
66 | "\u001b[K |████████████████████████████████| 81kB 8.4MB/s \n",
67 | "\u001b[?25hRequirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.7/dist-packages (from botocore<1.21.0,>=1.20.96->boto3->transformers==2.2.0) (2.8.1)\n",
68 | "\u001b[31mERROR: botocore 1.20.96 has requirement urllib3<1.27,>=1.25.4, but you'll have urllib3 1.24.3 which is incompatible.\u001b[0m\n",
69 | "Installing collected packages: sacremoses, sentencepiece, jmespath, botocore, s3transfer, boto3, transformers\n",
70 | "Successfully installed boto3-1.17.96 botocore-1.20.96 jmespath-0.10.0 s3transfer-0.4.2 sacremoses-0.0.45 sentencepiece-0.1.95 transformers-2.2.0\n",
71 | "Collecting spacy==2.0.12\n",
72 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/24/de/ac14cd453c98656d6738a5669f96a4ac7f668493d5e6b78227ac933c5fd4/spacy-2.0.12.tar.gz (22.0MB)\n",
73 | "\u001b[K |████████████████████████████████| 22.0MB 141kB/s \n",
74 | "\u001b[?25hRequirement already satisfied: numpy>=1.7 in /usr/local/lib/python3.7/dist-packages (from spacy==2.0.12) (1.19.5)\n",
75 | "Collecting murmurhash<0.29,>=0.28\n",
76 | " Downloading https://files.pythonhosted.org/packages/5e/31/c8c1ecafa44db30579c8c457ac7a0f819e8b1dbc3e58308394fff5ff9ba7/murmurhash-0.28.0.tar.gz\n",
77 | "Collecting cymem<1.32,>=1.30\n",
78 | " Downloading https://files.pythonhosted.org/packages/f8/9e/273fbea507de99166c11cd0cb3fde1ac01b5bc724d9a407a2f927ede91a1/cymem-1.31.2.tar.gz\n",
79 | "Collecting preshed<2.0.0,>=1.0.0\n",
80 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/f4/0c/64933c18f02fbf46acdd0c7705ec1c1194f58b564bb5a2d140fabcb37bad/preshed-1.0.1-cp37-cp37m-manylinux1_x86_64.whl (79kB)\n",
81 | "\u001b[K |████████████████████████████████| 81kB 8.4MB/s \n",
82 | "\u001b[?25hCollecting thinc<6.11.0,>=6.10.3\n",
83 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/94/b1/47a88072d0a38b3594c0a638a62f9ef7c742b8b8a87f7b105f7ed720b14b/thinc-6.10.3.tar.gz (1.2MB)\n",
84 | "\u001b[K |████████████████████████████████| 1.2MB 43.8MB/s \n",
85 | "\u001b[?25hCollecting plac<1.0.0,>=0.9.6\n",
86 | " Downloading https://files.pythonhosted.org/packages/9e/9b/62c60d2f5bc135d2aa1d8c8a86aaf84edb719a59c7f11a4316259e61a298/plac-0.9.6-py2.py3-none-any.whl\n",
87 | "Collecting ujson>=1.35\n",
88 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/17/4e/50e8e4cf5f00b537095711c2c86ac4d7191aed2b4fffd5a19f06898f6929/ujson-4.0.2-cp37-cp37m-manylinux1_x86_64.whl (179kB)\n",
89 | "\u001b[K |████████████████████████████████| 184kB 41.6MB/s \n",
90 | "\u001b[?25hCollecting dill<0.3,>=0.2\n",
91 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/fe/42/bfe2e0857bc284cbe6a011d93f2a9ad58a22cb894461b199ae72cfef0f29/dill-0.2.9.tar.gz (150kB)\n",
92 | "\u001b[K |████████████████████████████████| 153kB 42.1MB/s \n",
93 | "\u001b[?25hCollecting regex==2017.4.5\n",
94 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/36/62/c0c0d762ffd4ffaf39f372eb8561b8d491a11ace5a7884610424a8b40f95/regex-2017.04.05.tar.gz (601kB)\n",
95 | "\u001b[K |████████████████████████████████| 604kB 36.2MB/s \n",
96 | "\u001b[?25hRequirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.7/dist-packages (from spacy==2.0.12) (2.23.0)\n",
97 | "Collecting msgpack<1.0.0,>=0.5.6\n",
98 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/25/f8/6009e73f5b08743718d0660a18ecbc44b013a68980347a3835b63e875cdb/msgpack-0.6.2-cp37-cp37m-manylinux1_x86_64.whl (243kB)\n",
99 | "\u001b[K |████████████████████████████████| 245kB 40.0MB/s \n",
100 | "\u001b[?25hCollecting msgpack-numpy<1.0.0,>=0.4.1\n",
101 | " Downloading https://files.pythonhosted.org/packages/19/05/05b8d7c69c6abb36a34325cc3150089bdafc359f0a81fb998d93c5d5c737/msgpack_numpy-0.4.7.1-py2.py3-none-any.whl\n",
102 | "Collecting cytoolz<0.10,>=0.9.0\n",
103 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/36/f4/9728ba01ccb2f55df9a5af029b48ba0aaca1081bbd7823ea2ee223ba7a42/cytoolz-0.9.0.1.tar.gz (443kB)\n",
104 | "\u001b[K |████████████████████████████████| 450kB 27.4MB/s \n",
105 | "\u001b[?25hCollecting wrapt<1.11.0,>=1.10.0\n",
106 | " Downloading https://files.pythonhosted.org/packages/a0/47/66897906448185fcb77fc3c2b1bc20ed0ecca81a0f2f88eda3fc5a34fc3d/wrapt-1.10.11.tar.gz\n",
107 | "Requirement already satisfied: tqdm<5.0.0,>=4.10.0 in /usr/local/lib/python3.7/dist-packages (from thinc<6.11.0,>=6.10.3->spacy==2.0.12) (4.41.1)\n",
108 | "Requirement already satisfied: six<2.0.0,>=1.10.0 in /usr/local/lib/python3.7/dist-packages (from thinc<6.11.0,>=6.10.3->spacy==2.0.12) (1.15.0)\n",
109 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy==2.0.12) (2.10)\n",
110 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy==2.0.12) (1.24.3)\n",
111 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy==2.0.12) (2021.5.30)\n",
112 | "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy==2.0.12) (3.0.4)\n",
113 | "Requirement already satisfied: toolz>=0.8.0 in /usr/local/lib/python3.7/dist-packages (from cytoolz<0.10,>=0.9.0->thinc<6.11.0,>=6.10.3->spacy==2.0.12) (0.11.1)\n",
114 | "Building wheels for collected packages: spacy, murmurhash, cymem, thinc, dill, regex, cytoolz, wrapt\n",
115 | " Building wheel for spacy (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
116 | " Created wheel for spacy: filename=spacy-2.0.12-cp37-cp37m-linux_x86_64.whl size=28971942 sha256=e2da81a32a705fee25a3effd0ecfcd1726efbb8d129947185f460e4c0cd9035c\n",
117 | " Stored in directory: /root/.cache/pip/wheels/60/0b/bb/7c2e28db574dbb2358176934eddd32a1c5f838ba0bc23eaaab\n",
118 | " Building wheel for murmurhash (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
119 | " Created wheel for murmurhash: filename=murmurhash-0.28.0-cp37-cp37m-linux_x86_64.whl size=42762 sha256=1c9e04ef7ac30a7071e7333e2935280522b06803179cc769bae2d0332f1f367e\n",
120 | " Stored in directory: /root/.cache/pip/wheels/b8/94/a4/f69f8664cdc1098603df44771b7fec5fd1b3d8364cdd83f512\n",
121 | " Building wheel for cymem (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
122 | " Created wheel for cymem: filename=cymem-1.31.2-cp37-cp37m-linux_x86_64.whl size=77006 sha256=8d8aff8810df2e4ca37d487643233e08422975c4a171135da257044a59830433\n",
123 | " Stored in directory: /root/.cache/pip/wheels/55/8d/4a/f6328252aa2aaec0b1cb906fd96a1566d77f0f67701071ad13\n",
124 | " Building wheel for thinc (setup.py) ... \u001b[?25lerror\n",
125 | "\u001b[31m ERROR: Failed building wheel for thinc\u001b[0m\n",
126 | "\u001b[?25h Running setup.py clean for thinc\n",
127 | " Building wheel for dill (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
128 | " Created wheel for dill: filename=dill-0.2.9-cp37-none-any.whl size=77421 sha256=4087ab6e3f86fae97c2c354fccff7ec6a72267336c176d6cd9675d3c36bb6e69\n",
129 | " Stored in directory: /root/.cache/pip/wheels/5b/d7/0f/e58eae695403de585269f4e4a94e0cd6ca60ec0c202936fa4a\n",
130 | " Building wheel for regex (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
131 | " Created wheel for regex: filename=regex-2017.4.5-cp37-cp37m-linux_x86_64.whl size=534417 sha256=6c408e4f534f9e2c59bfc2a112b69e0add2ee7403b7b346c6a6268dc43fdf003\n",
132 | " Stored in directory: /root/.cache/pip/wheels/75/07/38/3c16b529d50cb4e0cd3dbc7b75cece8a09c132692c74450b01\n",
133 | " Building wheel for cytoolz (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
134 | " Created wheel for cytoolz: filename=cytoolz-0.9.0.1-cp37-cp37m-linux_x86_64.whl size=1239554 sha256=9f30971dbbc56066fc1c1e71411fd6c2a943661902773ff0951d4d460a36e168\n",
135 | " Stored in directory: /root/.cache/pip/wheels/88/f3/11/9817b001e59ab04889e8cffcbd9087e2e2155b9ebecfc8dd38\n",
136 | " Building wheel for wrapt (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
137 | " Created wheel for wrapt: filename=wrapt-1.10.11-cp37-cp37m-linux_x86_64.whl size=66038 sha256=20b8ab828cf90846b00bc403ce7f459a4a8e30a0a43608ba8d197d57e8091752\n",
138 | " Stored in directory: /root/.cache/pip/wheels/48/5d/04/22361a593e70d23b1f7746d932802efe1f0e523376a74f321e\n",
139 | "Successfully built spacy murmurhash cymem dill regex cytoolz wrapt\n",
140 | "Failed to build thinc\n",
141 | "\u001b[31mERROR: tensorflow 2.5.0 has requirement wrapt~=1.12.1, but you'll have wrapt 1.10.11 which is incompatible.\u001b[0m\n",
142 | "\u001b[31mERROR: multiprocess 0.70.11.1 has requirement dill>=0.3.3, but you'll have dill 0.2.9 which is incompatible.\u001b[0m\n",
143 | "\u001b[31mERROR: fastai 1.0.61 has requirement spacy>=2.0.18; python_version < \"3.8\", but you'll have spacy 2.0.12 which is incompatible.\u001b[0m\n",
144 | "\u001b[31mERROR: en-core-web-sm 2.2.5 has requirement spacy>=2.2.2, but you'll have spacy 2.0.12 which is incompatible.\u001b[0m\n",
145 | "Installing collected packages: murmurhash, cymem, preshed, msgpack, msgpack-numpy, cytoolz, wrapt, plac, dill, thinc, ujson, regex, spacy\n",
146 | " Found existing installation: murmurhash 1.0.5\n",
147 | " Uninstalling murmurhash-1.0.5:\n",
148 | " Successfully uninstalled murmurhash-1.0.5\n",
149 | " Found existing installation: cymem 2.0.5\n",
150 | " Uninstalling cymem-2.0.5:\n",
151 | " Successfully uninstalled cymem-2.0.5\n",
152 | " Found existing installation: preshed 3.0.5\n",
153 | " Uninstalling preshed-3.0.5:\n",
154 | " Successfully uninstalled preshed-3.0.5\n",
155 | " Found existing installation: msgpack 1.0.2\n",
156 | " Uninstalling msgpack-1.0.2:\n",
157 | " Successfully uninstalled msgpack-1.0.2\n",
158 | " Found existing installation: wrapt 1.12.1\n",
159 | " Uninstalling wrapt-1.12.1:\n",
160 | " Successfully uninstalled wrapt-1.12.1\n",
161 | " Found existing installation: plac 1.1.3\n",
162 | " Uninstalling plac-1.1.3:\n",
163 | " Successfully uninstalled plac-1.1.3\n",
164 | " Found existing installation: dill 0.3.3\n",
165 | " Uninstalling dill-0.3.3:\n",
166 | " Successfully uninstalled dill-0.3.3\n",
167 | " Found existing installation: thinc 7.4.0\n",
168 | " Uninstalling thinc-7.4.0:\n",
169 | " Successfully uninstalled thinc-7.4.0\n",
170 | " Running setup.py install for thinc ... \u001b[?25l\u001b[?25herror\n",
171 | " Rolling back uninstall of thinc\n",
172 | " Moving to /usr/local/lib/python3.7/dist-packages/thinc-7.4.0.dist-info/\n",
173 | " from /usr/local/lib/python3.7/dist-packages/~hinc-7.4.0.dist-info\n",
174 | " Moving to /usr/local/lib/python3.7/dist-packages/thinc/\n",
175 | " from /usr/local/lib/python3.7/dist-packages/~hinc\n",
176 | "\u001b[31mERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '\"'\"'/tmp/pip-install-ihuvde1u/thinc/setup.py'\"'\"'; __file__='\"'\"'/tmp/pip-install-ihuvde1u/thinc/setup.py'\"'\"';f=getattr(tokenize, '\"'\"'open'\"'\"', open)(__file__);code=f.read().replace('\"'\"'\\r\\n'\"'\"', '\"'\"'\\n'\"'\"');f.close();exec(compile(code, __file__, '\"'\"'exec'\"'\"'))' install --record /tmp/pip-record-mitzeiqu/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.\u001b[0m\n",
177 | "Collecting bert-extractive-summarizer\n",
178 | " Downloading https://files.pythonhosted.org/packages/1a/07/fdb05f9e18b6f641499ef56737126fbd2fafe1cdc1a04ba069d5aa205901/bert_extractive_summarizer-0.7.1-py3-none-any.whl\n",
179 | "Requirement already satisfied: spacy in /usr/local/lib/python3.7/dist-packages (from bert-extractive-summarizer) (2.2.4)\n",
180 | "Requirement already satisfied: transformers in /usr/local/lib/python3.7/dist-packages (from bert-extractive-summarizer) (2.2.0)\n",
181 | "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.7/dist-packages (from bert-extractive-summarizer) (0.22.2.post1)\n",
182 | "Collecting cymem<2.1.0,>=2.0.2\n",
183 | " Downloading https://files.pythonhosted.org/packages/b6/34/40547e057c1b31080c1d78f6accf9f1ed6ee46e3fc7ebd8599197915ef89/cymem-2.0.5-cp37-cp37m-manylinux2014_x86_64.whl\n",
184 | "Requirement already satisfied: numpy>=1.15.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (1.19.5)\n",
185 | "Requirement already satisfied: plac<1.2.0,>=0.9.6 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (0.9.6)\n",
186 | "Requirement already satisfied: wasabi<1.1.0,>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (0.8.2)\n",
187 | "Requirement already satisfied: requests<3.0.0,>=2.13.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (2.23.0)\n",
188 | "Requirement already satisfied: tqdm<5.0.0,>=4.38.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (4.41.1)\n",
189 | "Requirement already satisfied: srsly<1.1.0,>=1.0.2 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (1.0.5)\n",
190 | "Requirement already satisfied: blis<0.5.0,>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (0.4.1)\n",
191 | "Requirement already satisfied: thinc==7.4.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (7.4.0)\n",
192 | "Requirement already satisfied: catalogue<1.1.0,>=0.0.7 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (1.0.0)\n",
193 | "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (0.28.0)\n",
194 | "Collecting preshed<3.1.0,>=3.0.2\n",
195 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/6c/91/1cf0f7f0a6720f93632fc8ec42d54233e8e142640ac3fcf0fecaa8dc4648/preshed-3.0.5-cp37-cp37m-manylinux2014_x86_64.whl (126kB)\n",
196 | "\u001b[K |████████████████████████████████| 133kB 4.8MB/s \n",
197 | "\u001b[?25hRequirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from spacy->bert-extractive-summarizer) (57.0.0)\n",
198 | "Requirement already satisfied: regex in /usr/local/lib/python3.7/dist-packages (from transformers->bert-extractive-summarizer) (2019.12.20)\n",
199 | "Requirement already satisfied: sentencepiece in /usr/local/lib/python3.7/dist-packages (from transformers->bert-extractive-summarizer) (0.1.95)\n",
200 | "Requirement already satisfied: sacremoses in /usr/local/lib/python3.7/dist-packages (from transformers->bert-extractive-summarizer) (0.0.45)\n",
201 | "Requirement already satisfied: boto3 in /usr/local/lib/python3.7/dist-packages (from transformers->bert-extractive-summarizer) (1.17.96)\n",
202 | "Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->bert-extractive-summarizer) (1.0.1)\n",
203 | "Requirement already satisfied: scipy>=0.17.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn->bert-extractive-summarizer) (1.4.1)\n",
204 | "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy->bert-extractive-summarizer) (3.0.4)\n",
205 | "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy->bert-extractive-summarizer) (2021.5.30)\n",
206 | "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy->bert-extractive-summarizer) (1.24.3)\n",
207 | "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0,>=2.13.0->spacy->bert-extractive-summarizer) (2.10)\n",
208 | "Requirement already satisfied: importlib-metadata>=0.20; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from catalogue<1.1.0,>=0.0.7->spacy->bert-extractive-summarizer) (4.5.0)\n",
209 | "Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers->bert-extractive-summarizer) (7.1.2)\n",
210 | "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers->bert-extractive-summarizer) (1.15.0)\n",
211 | "Requirement already satisfied: botocore<1.21.0,>=1.20.96 in /usr/local/lib/python3.7/dist-packages (from boto3->transformers->bert-extractive-summarizer) (1.20.96)\n",
212 | "Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.7/dist-packages (from boto3->transformers->bert-extractive-summarizer) (0.10.0)\n",
213 | "Requirement already satisfied: s3transfer<0.5.0,>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from boto3->transformers->bert-extractive-summarizer) (0.4.2)\n",
214 | "Requirement already satisfied: typing-extensions>=3.6.4; python_version < \"3.8\" in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=0.20; python_version < \"3.8\"->catalogue<1.1.0,>=0.0.7->spacy->bert-extractive-summarizer) (3.7.4.3)\n",
215 | "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata>=0.20; python_version < \"3.8\"->catalogue<1.1.0,>=0.0.7->spacy->bert-extractive-summarizer) (3.4.1)\n",
216 | "Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.7/dist-packages (from botocore<1.21.0,>=1.20.96->boto3->transformers->bert-extractive-summarizer) (2.8.1)\n",
217 | "Installing collected packages: bert-extractive-summarizer, cymem, preshed\n",
218 | " Found existing installation: cymem 1.31.2\n",
219 | " Uninstalling cymem-1.31.2:\n",
220 | " Successfully uninstalled cymem-1.31.2\n",
221 | " Found existing installation: preshed 1.0.1\n",
222 | " Uninstalling preshed-1.0.1:\n",
223 | " Successfully uninstalled preshed-1.0.1\n",
224 | "Successfully installed bert-extractive-summarizer-0.7.1 cymem-2.0.5 preshed-3.0.5\n"
225 | ],
226 | "name": "stdout"
227 | }
228 | ]
229 | },
230 | {
231 | "cell_type": "code",
232 | "metadata": {
233 | "id": "gyKfZQp4sSsV"
234 | },
235 | "source": [
236 | "from summarizer import Summarizer, TransformerSummarizer"
237 | ],
238 | "execution_count": 11,
239 | "outputs": []
240 | },
241 | {
242 | "cell_type": "code",
243 | "metadata": {
244 | "id": "0XoGpEAgsaC9"
245 | },
246 | "source": [
247 | "sequence = (\"In May, Churchill was still generally unpopular with many Conservatives and probably most of the Labour Party. Chamberlain \"\n",
248 | " \"remained Conservative Party leader until October when ill health forced his resignation. By that time, Churchill had won the \"\n",
249 | " \"doubters over and his succession as party leader was a formality.\"\n",
250 | " \" \"\n",
251 | " \"He began his premiership by forming a five-man war cabinet which included Chamberlain as Lord President of the Council, \"\n",
252 | " \"Labour leader Clement Attlee as Lord Privy Seal (later as Deputy Prime Minister), Halifax as Foreign Secretary and Labour's \"\n",
253 | " \"Arthur Greenwood as a minister without portfolio. In practice, these five were augmented by the service chiefs and ministers \"\n",
254 | " \"who attended the majority of meetings. The cabinet changed in size and membership as the war progressed, one of the key \"\n",
255 | " \"appointments being the leading trades unionist Ernest Bevin as Minister of Labour and National Service. In response to \"\n",
256 | " \"previous criticisms that there had been no clear single minister in charge of the prosecution of the war, Churchill created \"\n",
257 | " \"and took the additional position of Minister of Defence, making him the most powerful wartime Prime Minister in British \"\n",
258 | " \"history. He drafted outside experts into government to fulfil vital functions, especially on the Home Front. These included \"\n",
259 | " \"personal friends like Lord Beaverbrook and Frederick Lindemann, who became the government's scientific advisor.\"\n",
260 | " \" \"\n",
261 | " \"At the end of May, with the British Expeditionary Force in retreat to Dunkirk and the Fall of France seemingly imminent, \"\n",
262 | " \"Halifax proposed that the government should explore the possibility of a negotiated peace settlement using the still-neutral \"\n",
263 | " \"Mussolini as an intermediary. There were several high-level meetings from 26 to 28 May, including two with the French \"\n",
264 | " \"premier Paul Reynaud. Churchill's resolve was to fight on, even if France capitulated, but his position remained precarious \"\n",
265 | " \"until Chamberlain resolved to support him. Churchill had the full support of the two Labour members but knew he could not \"\n",
266 | " \"survive as Prime Minister if both Chamberlain and Halifax were against him. In the end, by gaining the support of his outer \"\n",
267 | " \"cabinet, Churchill outmanoeuvred Halifax and won Chamberlain over. Churchill believed that the only option was to fight on \"\n",
268 | " \"and his use of rhetoric hardened public opinion against a peaceful resolution and prepared the British people for a long war \"\n",
269 | " \"– Jenkins says Churchill's speeches were 'an inspiration for the nation, and a catharsis for Churchill himself'.\"\n",
270 | " \" \"\n",
271 | " \"His first speech as Prime Minister, delivered to the Commons on 13 May was the 'blood, toil, tears and sweat' speech. It was \"\n",
272 | " \"little more than a short statement but, Jenkins says, 'it included phrases which have reverberated down the decades'.\")\n"
273 | ],
274 | "execution_count": 12,
275 | "outputs": []
276 | },
277 | {
278 | "cell_type": "code",
279 | "metadata": {
280 | "colab": {
281 | "base_uri": "https://localhost:8080/"
282 | },
283 | "id": "kGF4q99Qsoxw",
284 | "outputId": "2ce75710-d638-476d-c3ba-51e9c59fa542"
285 | },
286 | "source": [
287 | "model = TransformerSummarizer(transformer_type=\"XLNet\", transformer_model_key=\"xlnet-base-cased\")"
288 | ],
289 | "execution_count": 19,
290 | "outputs": [
291 | {
292 | "output_type": "stream",
293 | "text": [
294 | "100%|██████████| 760/760 [00:00<00:00, 252348.88B/s]\n",
295 | "100%|██████████| 467042463/467042463 [00:10<00:00, 43845344.01B/s]\n",
296 | "100%|██████████| 798011/798011 [00:00<00:00, 7158333.49B/s]\n"
297 | ],
298 | "name": "stderr"
299 | }
300 | ]
301 | },
302 | {
303 | "cell_type": "code",
304 | "metadata": {
305 | "id": "YKTjMytusvZ1"
306 | },
307 | "source": [
308 | "summary = model(sequence, min_length=60)"
309 | ],
310 | "execution_count": 20,
311 | "outputs": []
312 | },
313 | {
314 | "cell_type": "code",
315 | "metadata": {
316 | "colab": {
317 | "base_uri": "https://localhost:8080/",
318 | "height": 170
319 | },
320 | "id": "NL3NEce4s-hp",
321 | "outputId": "9eb21b71-8178-4dcd-8ecc-d70d0531ff79"
322 | },
323 | "source": [
324 | "summary"
325 | ],
326 | "execution_count": 21,
327 | "outputs": [
328 | {
329 | "output_type": "execute_result",
330 | "data": {
331 | "application/vnd.google.colaboratory.intrinsic+json": {
332 | "type": "string"
333 | },
334 | "text/plain": [
335 | "'In May, Churchill was still generally unpopular with many Conservatives and probably most of the Labour Party. The cabinet changed in size and membership as the war progressed, one of the key appointments being the leading trades unionist Ernest Bevin as Minister of Labour and National Service. In response to previous criticisms that there had been no clear single minister in charge of the prosecution of the war, Churchill created and took the additional position of Minister of Defence, making him the most powerful wartime Prime Minister in British history. There were several high-level meetings from 26 to 28 May, including two with the French premier Paul Reynaud.'"
336 | ]
337 | },
338 | "metadata": {
339 | "tags": []
340 | },
341 | "execution_count": 21
342 | }
343 | ]
344 | }
356 | ]
357 | }
--------------------------------------------------------------------------------