├── McDonaldsCSR2019.pdf
├── README.md
└── CSR_Report_NLP_Walkthrough.ipynb
/McDonaldsCSR2019.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/hannahawalsh/HTTF4-ESG-and-NLP/HEAD/McDonaldsCSR2019.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Sustainability reports & NLP
2 | This repository accompanies Finastra's Hack to the Future 4 learning session "Sustainability Reports and NLP" presented on March 17, 2022.
3 |
4 | ---
5 |
6 | ### What is ESG?
7 | - **E**nvironmental issues, such as climate change and pollution
8 | - **S**ocial issues around workplace practices and human capital
9 | - **G**overnance issues such as executive pay, accounting, and ethics
10 |
11 | ---
12 |
13 | ### Why care about ESG?
14 | Beyond the ethical case for caring about ESG, investors value it because studies have found[^1][^2][^3] that:
15 | - Higher ESG is associated with higher profitability and lower volatility
16 | - Companies with high ESG performance are good allocators of capital
17 | - Companies with good ESG generally have higher valuations, EVA growth, size, and returns
18 | - Firms that combine higher ESG performance with profitability have higher returns with lower risk
19 |
20 | ---
21 |
22 | ### ESG Reporting
23 | A corporate social responsibility (CSR) report is an internal- and external-facing document that companies use to communicate their CSR efforts around environmental, ethical, philanthropic, and economic impacts on the environment and community.
24 | - 92% of S&P 500 index companies published annual sustainability reports in 2020[^4]
25 | - There is no single standard reporting format in the US, but there are general reporting guidelines, such as those provided by Nasdaq[^5]
26 | - Reports range from 30 pages to 200+ pages
27 | - Each CSR report highlights the company’s strong suits, goals, and plans around ESG
28 | - Analysts and investors read multiple reports to understand and compare company trends and themes
29 |
30 |
31 | ---
32 |
33 | ### Natural Language Processing
34 | Natural language processing (NLP) is a field of linguistics and machine learning that deals with natural (i.e., human) languages. The goal is to "understand" unstructured text data and produce something new from it.
35 | Here are a handful of the many NLP tasks:
36 | - Language translation
37 | - Sentiment analysis
38 | - Text classification
39 | - Document summarization
40 | - Chat bots
41 | - Autocomplete
42 |
43 | ---
44 |
45 | ### Zero-Shot Learning
46 | Human languages are really complex, so it is impossible to train classifiers on every single phrase. Zero-shot learning (ZSL) models allow classification of text into categories unseen by the model during training. These methods work by combining the observed/seen and the non-observed/unseen categories through auxiliary information, which encodes properties of objects.
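
For text, one common way to supply that auxiliary information is the wording of the label itself. The sketch below assumes the approach taken in this repository's notebook, where each candidate label is slotted into the template "This text is about {}." and a natural language inference model then judges whether a sentence entails the resulting hypothesis. `build_hypotheses` is a hypothetical helper for illustration only; it runs no model.

```python
def build_hypotheses(labels, template="This text is about {}."):
    """Turn each candidate label into a natural-language hypothesis.

    The template matches the one used in the notebook; the helper itself
    is an illustrative assumption, not part of any library.
    """
    return {label: template.format(label) for label in labels}


labels = ["emissions", "philanthropy", "board accountability"]
hypotheses = build_hypotheses(labels)

for label, hypothesis in hypotheses.items():
    print(f"{label!r} -> {hypothesis!r}")
```

Because the labels only enter through the hypothesis text, new categories can be tried by editing the list, with no retraining.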
47 |
48 | Zero-shot learning is also commonly applied to images and video, and its uses keep growing, for example to activity recognition from sensor data.
49 |
50 | Zero-shot learning models are extremely helpful when you want to classify text on very specific labels and don't have labeled data. Labeled data can be difficult, expensive, and tedious to acquire, so zero-shot learning provides a quick way to get a classification without specialized data and additional model training.
51 |
52 | The downside to zero-shot learning is that it is much slower than a model trained on specific labels. For every candidate label, the model has to work out "what it means to be that label" and then check whether your sentence "is that label."
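
To get a feel for that cost, here is a rough back-of-the-envelope sketch. It assumes one model evaluation per sentence-label pair and uses the 275 sentences and 10 labels from the accompanying notebook. The per-call time is a made-up placeholder, and the helper names are hypothetical; real timings depend on the model, batching, and hardware.

```python
def count_model_calls(n_sentences, n_labels):
    """One evaluation per (sentence, label) pair."""
    return n_sentences * n_labels


def estimate_minutes(n_sentences, n_labels, seconds_per_call):
    """Rough wall-clock estimate that ignores batching and hardware effects."""
    return count_model_calls(n_sentences, n_labels) * seconds_per_call / 60


# 275 sentences and 10 ESG labels, as in the notebook
calls = count_model_calls(275, 10)
print(f"{calls:,d} sentence-label pairs")

# seconds_per_call=0.5 is an assumed placeholder, not a measurement
minutes = estimate_minutes(275, 10, seconds_per_call=0.5)
print(f"about {minutes:.0f} minutes at 0.5 s per pair")
```

Doubling the number of labels doubles the number of pairs, so it is worth trimming the label list before classifying a long report.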
53 |
54 | ---
55 | ### Sources
56 | [^1]: https://corpgov.law.harvard.edu/2020/01/14/esg-matters/
57 | [^2]: https://corpgov.law.harvard.edu/2021/06/02/esg-matters-ii/
58 | [^3]: https://www.blackrock.com/corporate/literature/publication/blk-esg-investment-statement-web.pdf
59 | [^4]: https://www.ga-institute.com/index.php?id=9128
60 | [^5]: https://www.nasdaq.com/ESG-Guide
61 |
--------------------------------------------------------------------------------
/CSR_Report_NLP_Walkthrough.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Sustainability reports & NLP \n",
8 | "Thursday, March 17, 2022\n",
9 | "\n",
10 | "In this hackathon learning session, we will walk through an example of parsing a PDF sustainability report and classifying it using zero-shot learning in Python. \n",
11 | "\n",
12 | "---"
13 | ]
14 | },
15 | {
16 | "cell_type": "markdown",
17 | "metadata": {},
18 | "source": [
19 | "## Concepts\n",
20 | "#### Corporate Social Responsibility Reports (CSR)\n",
21 | "A corporate social responsibility (CSR) report is an internal- and external-facing document that companies use to communicate their CSR efforts around environmental, ethical, philanthropic, and economic impacts on the environment and community. \n",
22 | "\n",
23 | "While they are not required in any sense, over 90% of S&P 500 index companies publish them annually [[source]](https://www.ga-institute.com/index.php?id=9128). As there's no standard reporting process, the quantity and quality of information disclosed is up to the PR department at each company. The reports can be anywhere from 30 to 200+ pages. \n",
24 | "\n",
25 | "CSR reports are available to the public on a company's website or on [www.responsibilityreports.com](https://www.responsibilityreports.com).\n",
26 | "\n",
27 | "\n",
28 | "#### Natural Language Processing (NLP)\n",
29 | "Natural language processing (NLP) is a field of linguistics and machine learning that deals with natural (i.e., human) languages. The goal is to \"understand\" the unstructured text data and produce something new. Examples of NLP tasks are language translation, text summarization, and sentiment analysis. \n",
30 | "\n",
31 | "\n",
32 | "#### Zero-Shot Learning (ZSL)\n",
33 | "Human languages are really complex, so it is impossible to train classifiers on every single phrase. Zero-shot learning (ZSL) models allow classification of text into categories unseen by the model during training. These methods work by combining the observed/seen and the non-observed/unseen categories through auxiliary information, which encodes properties of objects. \n",
34 | "\n",
35 | "Zero-shot learning is also commonly applied to images and video, and its uses keep growing, for example to activity recognition from sensor data.\n",
36 | "\n",
37 | "We will use NLP and ZSL to analyze a CSR report in order to classify each sentence as one of several categories relating to ESG. \n",
38 | "\n",
39 | "---"
40 | ]
41 | },
42 | {
43 | "cell_type": "code",
44 | "execution_count": 1,
45 | "metadata": {},
46 | "outputs": [],
47 | "source": [
48 | "# Imports\n",
49 | "import re\n",
50 | "import string\n",
51 | "from collections import defaultdict\n",
52 | "import pandas as pd\n",
53 | "from tika import parser\n",
54 | "import nltk\n",
55 | "import torch\n",
56 | "from transformers import pipeline # Hugging Face\n",
57 | "\n",
58 | "pd.set_option(\"display.max_colwidth\", None)"
59 | ]
60 | },
61 | {
62 | "cell_type": "markdown",
63 | "metadata": {},
64 | "source": [
65 | "## Parsing CSR PDFs\n",
66 | "A non-trivial portion of classifying CSR reports is converting them to a computer-readable format. Companies publish their CSR reports as PDFs, which are notoriously hard to read. Our goal is to extract text as a list of sentences. \n",
67 | "\n",
68 | "We will be doing very simple parsing of a PDF report using the package tika to extract the text, regular expressions to filter and join the text, and NLTK to split the text into sentences. \n",
69 | "\n",
70 | "This is by no means the best way to do it, but it's relatively simple and gets the job done well enough for our purposes. Text cleaning is task-specific, so you need to consider what is sufficient for your problem. "
71 | ]
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": 2,
76 | "metadata": {},
77 | "outputs": [],
78 | "source": [
79 | "class parsePDF:\n",
80 | " def __init__(self, url):\n",
81 | " self.url = url\n",
82 | " \n",
83 | " def extract_contents(self):\n",
84 | " \"\"\" Extract a pdf's contents using tika. \"\"\"\n",
85 | " pdf = parser.from_file(self.url)\n",
86 | " self.text = pdf[\"content\"]\n",
87 | " return self.text\n",
88 | " \n",
89 | " \n",
90 | " def clean_text(self):\n",
91 | " \"\"\" Extract & clean sentences from raw text of pdf. \"\"\"\n",
92 | " # Remove non ASCII characters\n",
93 | " printables = set(string.printable)\n",
94 | " self.text = \"\".join(filter(lambda x: x in printables, self.text))\n",
95 | "\n",
96 | " # Replace tabs with spaces\n",
97 | " self.text = re.sub(r\"\\t+\", r\" \", self.text)\n",
98 | "\n",
99 | " # Aggregate lines where the sentence wraps\n",
100 | " # Also, lines in CAPITALS are counted as headers\n",
101 | " fragments = []\n",
102 | " prev = \"\"\n",
103 | " for line in re.split(r\"\\n+\", self.text):\n",
104 | " if line.isupper():\n",
105 | " prev = \".\" # skip it\n",
106 | " elif line and (line.startswith(\" \") or line[0].islower()\n",
107 | " or not prev.endswith(\".\")):\n",
108 | " prev = f\"{prev} {line}\" # make into one line\n",
109 | " else:\n",
110 | " fragments.append(prev)\n",
111 | " prev = line\n",
112 | " fragments.append(prev)\n",
113 | "\n",
114 | " # Clean the lines into sentences\n",
115 | " sentences = []\n",
116 | " for line in fragments:\n",
117 | " # Use regular expressions to clean text\n",
118 | " url_str = (r\"((http|https)\\:\\/\\/)?[a-zA-Z0-9\\.\\/\\?\\:@\\-_=#]+\\.\"\n",
119 | " r\"([a-zA-Z]){2,6}([a-zA-Z0-9\\.\\&\\/\\?\\:@\\-_=#])*\")\n",
120 | " line = re.sub(url_str, r\" \", line) # URLs\n",
121 | " line = re.sub(r\"^\\s?\\d+(.*)$\", r\"\\1\", line) # headers\n",
122 | " line = re.sub(r\"\\d{5,}\", r\" \", line) # figures\n",
123 | " line = re.sub(r\"\\.+\", \".\", line) # multiple periods\n",
124 | " \n",
125 | " line = line.strip() # leading & trailing spaces\n",
126 | " line = re.sub(r\"\\s+\", \" \", line) # multiple spaces\n",
127 | " line = re.sub(r\"\\s?([,:;\\.])\", r\"\\1\", line) # punctuation spaces\n",
128 | " line = re.sub(r\"\\s?-\\s?\", \"-\", line) # split-line words\n",
129 | "\n",
130 | " # Use nltk to split the line into sentences\n",
131 | " for sentence in nltk.sent_tokenize(line):\n",
132 | " s = str(sentence).strip().lower() # lower case\n",
133 | " # Exclude tables of contents and short sentences\n",
134 | " if \"table of contents\" not in s and len(s) > 5:\n",
135 | " sentences.append(s)\n",
136 | " return sentences"
137 | ]
138 | },
139 | {
140 | "cell_type": "markdown",
141 | "metadata": {},
142 | "source": [
143 | "##### Example: McDonald's\n",
144 | "Here, we're pulling McDonald's most recent CSR report from [responsibilityreports.com](https://www.responsibilityreports.com/Company/mcdonalds-corporation). We will extract and parse the text in order to move on to classifying it using zero-shot learning."
145 | ]
146 | },
147 | {
148 | "cell_type": "code",
149 | "execution_count": 3,
150 | "metadata": {},
151 | "outputs": [
152 | {
153 | "name": "stderr",
154 | "output_type": "stream",
155 | "text": [
156 | "2022-03-16 15:46:16,450 [MainThread ] [INFO ] Retrieving https://www.responsibilityreports.com/Click/2534 to /var/folders/8k/zkkj1v6n7gbd6tf9pdj8cw1w0000gq/T/click-2534.\n"
157 | ]
158 | },
159 | {
160 | "name": "stdout",
161 | "output_type": "stream",
162 | "text": [
163 | "The McDonalds CSR report has 275 sentences\n"
164 | ]
165 | }
166 | ],
167 | "source": [
168 | "mcdonalds_url = \"https://www.responsibilityreports.com/Click/2534\"\n",
169 | "pp = parsePDF(mcdonalds_url)\n",
170 | "pp.extract_contents()\n",
171 | "sentences = pp.clean_text()\n",
172 | "\n",
173 | "print(f\"The McDonalds CSR report has {len(sentences):,d} sentences\")"
174 | ]
175 | },
176 | {
177 | "cell_type": "markdown",
178 | "metadata": {},
179 | "source": [
180 | "## Zero-Shot Learning\n",
181 | "Zero-shot learning models are extremely helpful when you want to classify text on very specific labels and don't have labeled data. Labeled data can be difficult, expensive, and tedious to acquire, so zero-shot learning provides a quick way to get a classification without specialized data and additional model training. \n",
182 | "\n",
183 | "We are going to define industry-specific ESG categories and ask our model to classify each sentence in our CSR report. For every sentence and label, we get a \"score\" between 0.0 and 1.0 that shows how confident the model is that the label applies. A score near 1.0 means the model is confident the sentence is about that topic; a score near 0.0 means it is confident the sentence does not relate to it. Because we classify with `multi_label=True`, each label is scored independently, so the scores for one sentence need not sum to 1. \n",
184 | "\n",
185 | "The downside to zero-shot learning is that it is much slower than a model trained on specific labels. For every candidate label, the model has to work out \"what it means to be that label\" and then check whether your sentence \"is that label.\""
186 | ]
187 | },
188 | {
189 | "cell_type": "code",
190 | "execution_count": 4,
191 | "metadata": {},
192 | "outputs": [],
193 | "source": [
194 | "class ZeroShotClassifier:\n",
195 | "\n",
196 | " def create_zsl_model(self, model_name):\n",
197 | " \"\"\" Create the zero-shot learning model. \"\"\"\n",
198 | " self.model = pipeline(\"zero-shot-classification\", model=model_name)\n",
199 | " \n",
200 | " \n",
201 | " def classify_text(self, text, categories):\n",
202 | " \"\"\"\n",
203 | " Classify text(s) to the pre-defined categories using a\n",
204 | " zero-shot classification model and return the raw results.\n",
205 | " \"\"\"\n",
206 | " # Classify text using the zero-shot transformers model\n",
207 | " hypothesis_template = \"This text is about {}.\"\n",
208 | " result = self.model(text, categories, multi_label=True,\n",
209 | " hypothesis_template=hypothesis_template)\n",
210 | " return result\n",
211 | "\n",
212 | " \n",
213 | " def text_labels(self, text, category_dict, cutoff=None):\n",
214 | " \"\"\"\n",
215 | " Classify a text into the pre-defined categories. If cutoff\n",
216 | " is defined, return only those entries where the score > cutoff\n",
217 | " \"\"\"\n",
218 | " # Run the model on our categories\n",
219 | " categories = list(category_dict.keys())\n",
220 | " result = (self.classify_text(text, categories))\n",
221 | " \n",
222 | " # Format as a pandas dataframe and add ESG label\n",
223 | " df = pd.DataFrame(result).explode([\"labels\", \"scores\"])\n",
224 | " df[\"ESG\"] = df.labels.map(category_dict)\n",
225 | " \n",
226 | " # If a cutoff is provided, filter the dataframe\n",
227 | " if cutoff is not None:\n",
228 | " df = df[df.scores.gt(cutoff)].copy()\n",
229 | " return df.reset_index(drop=True)"
230 | ]
231 | },
232 | {
233 | "cell_type": "markdown",
234 | "metadata": {},
235 | "source": [
236 | "##### Pre-Define Labels\n",
237 | "The labels chosen below are based on categories and topics used by ESG scoring companies. \n",
238 | "We define the plain-English version of each label, which is the text the zero-shot learning model classifies against, as well as the general \"ESG\" label. \n",
239 | "\n",
240 | "Because of how zero-shot learning models work, inference time will increase linearly with the number of labels you define. Therefore, it is necessary to consider which labels you really want and how much time is acceptable for text classification."
241 | ]
242 | },
243 | {
244 | "cell_type": "code",
245 | "execution_count": 5,
246 | "metadata": {},
247 | "outputs": [],
248 | "source": [
249 | "# Define categories we want to classify\n",
250 | "esg_categories = {\n",
251 | " \"emissions\": \"E\",\n",
252 | " \"natural resources\": \"E\",\n",
253 | " \"pollution\": \"E\",\n",
254 | " \"diversity and inclusion\": \"S\",\n",
255 | " \"philanthropy\": \"S\",\n",
256 | " \"health and safety\": \"S\",\n",
257 | " \"training and education\": \"S\",\n",
258 | " \"transparancy\": \"G\",\n",
259 | " \"corporate compliance\": \"G\",\n",
260 | " \"board accountability\": \"G\"}"
261 | ]
262 | },
263 | {
264 | "cell_type": "markdown",
265 | "metadata": {},
266 | "source": [
267 | "##### Getting Text Classification\n",
268 | "Now, all we have to do is define the model and make predictions. The zero-shot classification pipeline expects a model fine-tuned on natural language inference (NLI), such as one of the MNLI checkpoints on [Hugging Face](https://huggingface.co/models). \n",
269 | "\n",
270 | "Here, we choose the extra-large version of the DeBERTa model fine-tuned on MNLI, as maintained by Microsoft. A larger model (generally) gives better performance but is much slower."
271 | ]
272 | },
273 | {
274 | "cell_type": "code",
275 | "execution_count": 6,
276 | "metadata": {},
277 | "outputs": [
278 | {
279 | "name": "stderr",
280 | "output_type": "stream",
281 | "text": [
282 | "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
283 | ]
284 | }
285 | ],
286 | "source": [
287 | "# Define and Create the zero-shot learning model\n",
288 | "model_name = \"microsoft/deberta-v2-xlarge-mnli\" \n",
289 | " # a smaller version: \"microsoft/deberta-base-mnli\"\n",
290 | "ZSC = ZeroShotClassifier()\n",
291 | "ZSC.create_zsl_model(model_name)\n",
292 | " # Note: the warning is expected, so ignore it"
293 | ]
294 | },
295 | {
296 | "cell_type": "code",
297 | "execution_count": 7,
298 | "metadata": {},
299 | "outputs": [
300 | {
301 | "data": {
472 | "text/plain": [
473 | " sequence \\\n",
474 | "175 our customers want to see that the mcdonalds they visit locally matches how we act globally. \n",
475 | "1714 across europe, mcdonalds and its franchisees partnered with organizations and local food banks to donate surplus ingredients to families. \n",
476 | "2670 as one of the worlds largest restaurant companies, we have a responsibility to ensure long-term, sustainable value creation for shareholders while taking action on some of the worlds most pressing social and environmental challenges. \n",
477 | "2646 mcdonalds continues to proactively make changes to restaurant operations and office settings based on the expert guidance of health authorities. \n",
478 | "1902 5.5m in france, 1.5 million was raised in-restaurant through customer donations in 2019, while restaurants mobilized to donate more than 4million to rmhc. \n",
479 | "1837 ronald mcdonald care mobile program provides medical, dental and healthcare resources to children and families in underserved communities around the world. \n",
480 | "2601 informed by rainn, the nations largest anti-sexual violence organization, the policy contains clear language on workplace conduct, manager responsibilities, employee resources and the investigation process. \n",
481 | "933 we have more work to do but im confident we can continue to work with experts and learn from families to find areas where our system has the best opportunity to create positive and meaningful change. \n",
482 | "1347 were also collaborating with target, cargill and the nature conservancy to support a five-year $8.5 million project in nebraska, a key state for both beef and cattle feed production. \n",
483 | "1494 in europe, our renewable energy purchases in 2019 covered over 6,500 restaurants worth of electricity across 11 markets. \n",
484 | "1130 the cup that keeps ongiving through an industry-first global partnership with terracycles circular packaging service, loop, we will be testing a new reusable cup model for hot beverages across select mcdonalds restaurants in the u.k. \n",
485 | "581 we have also established research projects globally to validate pioneering sustainability practices for beef farming. \n",
486 | "920 by showing lower-calorie soft drinks first, wehave shifted 1.9 million purchases from full-sugar coca-cola to options with no added sugar. \n",
487 | "2543 it is an ongoing process that requires continuous effort and improvement. \n",
488 | "2723 while we are committed to providing timely updates, the company holds no obligation to update information or statements. \n",
489 | "1964 she joined the company just as social justice began dominating the headlines. \n",
490 | "223 our planet: we are partnering with our franchisees, suppliers and farmers to protect our planet by finding innovative ways to keep waste out of nature and drive climate action. \n",
491 | "1432 our planet 10 mcdonalds purpose & impact summary report golden arches illuminated by a static bike that customers can use energy-generating photovoltaic glass windows solar-paneled roof solar-powered parking lot lights in 2020, mcdonalds unveiled a first-of-its-kind restaurant designed to create enough renewable energy on-site to cover 100% of its energy needs on a net annual basis. \n",
492 | "162 delivering on our purpose today and in the future as we look to the future, its important that we continue to embrace the changes taking place around us so that we can grow, meet evolving customer expectations and make our brand even stronger. \n",
493 | "2132 this includes a focus on tackling any hiring bias and reducing barriers to employment for underrepresented groups. \n",
494 | "\n",
495 | " labels scores ESG \n",
496 | "175 natural resources 0.018979 E \n",
497 | "1714 board accountability 0.003851 G \n",
498 | "2670 natural resources 0.295828 E \n",
499 | "2646 board accountability 0.003085 G \n",
500 | "1902 health and safety 0.119588 S \n",
501 | "1837 natural resources 0.000999 E \n",
502 | "2601 transparancy 0.19148 G \n",
503 | "933 health and safety 0.05889 S \n",
504 | "1347 health and safety 0.001917 S \n",
505 | "1494 pollution 0.017052 E \n",
506 | "1130 philanthropy 0.915213 S \n",
507 | "581 emissions 0.612276 E \n",
508 | "920 health and safety 0.194776 S \n",
509 | "2543 corporate compliance 0.144693 G \n",
510 | "2723 board accountability 0.045778 G \n",
511 | "1964 board accountability 0.003398 G \n",
512 | "223 philanthropy 0.038022 S \n",
513 | "1432 transparancy 0.003674 G \n",
514 | "162 training and education 0.013431 S \n",
515 | "2132 training and education 0.008031 S "
516 | ]
517 | },
518 | "execution_count": 7,
519 | "metadata": {},
520 | "output_type": "execute_result"
521 | }
522 | ],
523 | "source": [
524 | "# Classify all the sentences in the report\n",
525 | " # Note: this takes a while\n",
526 | "classified = ZSC.text_labels(sentences, esg_categories)\n",
527 | "classified.sample(n=20) # display 20 random records"
528 | ]
529 | },
530 | {
531 | "cell_type": "code",
532 | "execution_count": 8,
533 | "metadata": {},
534 | "outputs": [
535 | {
536 | "data": {
637 | "text/plain": [
638 | " sequence \\\n",
639 | "0 feeding and fostering communities feeding and fostering communities mcdonalds purpose & impact summary report there when people need us most in a difficult year, mcdonalds showed up for its communities accelerating circular solutions how we are reimagining packaging our food journey sourcing quality ingredients while helping people, animals andthe planet thrive whats inside 04 foodquality&sourcing 04 our food journey 05 helping coffee communities build resilience 06 offering choices that kidsand parents love: byalistair macrow 07 our planet 07 reimagining packaging 09 taking action onclimate change: q&a with francescadebiase 10 what if a restaurant could generate all its own power from renewable energy? \n",
640 | "80 we have continued our investment into sustainable packaging innovation, renewable energy and regenerative farming solutions to help drive action on climate change. \n",
641 | "81 we have continued our investment into sustainable packaging innovation, renewable energy and regenerative farming solutions to help drive action on climate change. \n",
642 | "100 our planet: we are partnering with our franchisees, suppliers and farmers to protect our planet by finding innovative ways to keep waste out of nature and drive climate action. \n",
643 | "101 our planet: we are partnering with our franchisees, suppliers and farmers to protect our planet by finding innovative ways to keep waste out of nature and drive climate action. \n",
644 | "102 our planet: we are partnering with our franchisees, suppliers and farmers to protect our planet by finding innovative ways to keep waste out of nature and drive climate action. \n",
645 | "220 our planet: we are partnering with our franchisees, suppliers and farmers to protect our planet by finding innovative ways to keep waste out of nature and drive climate action. \n",
646 | "221 our planet: we are partnering with our franchisees, suppliers and farmers to protect our planet by finding innovative ways to keep waste out of nature and drive climate action. \n",
647 | "222 our planet: we are partnering with our franchisees, suppliers and farmers to protect our planet by finding innovative ways to keep waste out of nature and drive climate action. \n",
648 | "292 our values serve we put our customers and people first inclusion we open our doors toeveryone integrity we do the rightthing community we are goodneighbors family we get better together our values guide us to always put our customers and people first, and ensure we open our doors to everyone. \n",
649 | "\n",
650 | " labels scores ESG \n",
651 | "0 natural resources 0.98839 E \n",
652 | "80 natural resources 0.973544 E \n",
653 | "81 emissions 0.942017 E \n",
654 | "100 natural resources 0.986246 E \n",
655 | "101 emissions 0.943047 E \n",
656 | "102 pollution 0.858435 E \n",
657 | "220 natural resources 0.986246 E \n",
658 | "221 emissions 0.943047 E \n",
659 | "222 pollution 0.858435 E \n",
660 | "292 natural resources 0.884307 E "
661 | ]
662 | },
663 | "execution_count": 8,
664 | "metadata": {},
665 | "output_type": "execute_result"
666 | }
667 | ],
668 | "source": [
669 | "# Show the first 10 sentences classified as \"E\" (Environmental) with a score above 0.8:\n",
670 | "E_sentences = classified[classified.scores.gt(0.8) & classified.ESG.eq(\"E\")].copy()\n",
671 | "E_sentences.head(10)"
672 | ]
673 | }
695 | ],
696 | "metadata": {
697 | "kernelspec": {
698 | "display_name": "Python 3",
699 | "language": "python",
700 | "name": "python3"
701 | },
702 | "language_info": {
703 | "codemirror_mode": {
704 | "name": "ipython",
705 | "version": 3
706 | },
707 | "file_extension": ".py",
708 | "mimetype": "text/x-python",
709 | "name": "python",
710 | "nbconvert_exporter": "python",
711 | "pygments_lexer": "ipython3",
712 | "version": "3.8.5"
713 | }
714 | },
715 | "nbformat": 4,
716 | "nbformat_minor": 4
717 | }
718 |
--------------------------------------------------------------------------------