├── README.md
├── LICENSE
└── python
    └── get_sentiment.py
/README.md:
--------------------------------------------------------------------------------
1 | 
2 | # Lexicon-based Sentiment Analysis for Economic and Financial Applications
3 | 
4 | This package allows Python users to leverage cutting-edge NLP
5 | techniques to easily run sentiment analysis on economic text.
6 | Given a list of texts as input and a list of tokens of interest (ToI),
7 | the algorithm analyses the texts and computes the economic sentiment
8 | associated with each ToI. Two key features characterize this approach.
9 | First, it is *fine-grained*, since words are assigned a polarity score
10 | that ranges in \[-1,1\] based on a dictionary. Second, it is
11 | *aspect-based*, since the algorithm selects the chunk of text that
12 | relates to the ToI based on a set of semantic rules and calculates the
13 | sentiment only on that text, rather than on the full article.
14 | 
15 | The package includes some additional features, such as automatic
16 | negation handling, tense detection, location filtering and the option
17 | to exclude some words from the sentiment computation. The algorithm
18 | only supports the English language, as it relies on the
19 | *en\_core\_web\_lg* language model from the `spaCy` Python module.
20 | 
21 | ## Installation
22 | 
23 | Make sure to install the following libraries:
24 | 
25 | ``` bash
26 | pip install utm lxml numpy fastavro h5py spacy nltk matplotlib joblib toolz textblob gensim pandas
27 | ```
28 | 
29 | ``` bash
30 | pip install git+https://github.com/hanzhichao2000/pysentiment
31 | pip install sentence_splitter wordcloud ruamel-yaml pycosat
32 | ```
33 | 
34 | Set up NLTK:
35 | 
36 | ``` bash
37 | python -m nltk.downloader all
38 | python -m nltk.downloader wordnet
39 | python -m nltk.downloader omw
40 | python -m nltk.downloader sentiwordnet
41 | ```
42 | 
43 | Finally, get the spaCy models:
44 | 
45 | ``` bash
46 | python -m spacy download en_core_web_lg
47 | python -m spacy download en_core_web_md
48 | python -m spacy download en_core_web_sm
49 | python -m spacy download it_core_news_sm
50 | python -m spacy download de_core_news_sm
51 | python -m spacy download fr_core_news_sm
52 | python -m spacy download fr_core_news_md
53 | python -m spacy download es_core_news_sm
54 | python -m spacy download es_core_news_md
55 | python -m spacy download nl_core_news_sm
56 | ```
57 | 
58 | ## Example
59 | 
60 | Let’s assume that you want to compute the sentiment associated with two
61 | tokens of interest, namely *unemployment* and *economy*, given the two
62 | following sentences.
63 | 
64 | ``` py
65 | text = ['Unemployment is rising at high speed', 'The economy is slowing down and unemployment is booming']
66 | include = ['unemployment', 'economy']
67 | 
68 | get_sentiment(text = text, include = include)
69 | > Doc_id Text                       Chunk              Sentiment Tense Include
70 | > 1      Unemployment is rising at… Unemployment is r… -0.899    pres… unemploy…
71 | > 2      The economy is slowing do… economy is slowing -0.3      pres… economy
72 | > 2      The economy is slowing do… unemployment is b… -0.8      pres… unemploy…
73 | ```
74 | 
75 | The output of the function `get_sentiment` is a list containing two
76 | objects (a minimal usage sketch follows this list):
77 | 
78 | - “sentiment”, containing the average sentiment computed for
79 | each text;
80 | 
81 | - “sentiment\_by\_chunk”, containing the sentiment computed
82 | for each chunk detected in the texts.
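The snippet below is a minimal, hypothetical usage sketch of the call above. It assumes that `get_sentiment` can be imported from `python/get_sentiment.py` (for instance with that folder on the Python path) and that, as described in this section, the function returns a list whose first element is the per-text sentiment and whose second element is the per-chunk sentiment; the variable names are illustrative only.

``` py
# Minimal usage sketch (assumptions: get_sentiment is importable from
# python/get_sentiment.py and returns [sentiment, sentiment_by_chunk]).
from get_sentiment import get_sentiment

text = ['Unemployment is rising at high speed',
        'The economy is slowing down and unemployment is booming']
include = ['unemployment', 'economy']

result = get_sentiment(text=text, include=include)

sentiment = result[0]           # average sentiment per input text
sentiment_by_chunk = result[1]  # sentiment of each chunk related to a ToI

print(sentiment)
print(sentiment_by_chunk)
```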
83 | 84 | The first element of the output list provides the overall average 85 | sentiment score of each text, while the second provides the detailed 86 | score of each chunk of text that relates to one of the ToI. 87 | 88 | 89 | 90 | 91 | ## Citations: 92 | 93 | If you use this package, we encourage you to add the following references to the related papers: 94 | 95 | 96 | 97 | - Barbaglia, L.; Consoli, S.; and Manzan, S. 2022. Forecasting with Economic News. Journal of Business and Economic Statistics, to appear. Available at SSRN: 98 | 99 | - Consoli, S.; Barbaglia, L.; and Manzan, S. 2020. Fine-grained, aspect-based semantic sentiment analysis within the economic and financial domains. In Proceedings - 2020 IEEE 2nd International Conference on Cognitive Machine Intelligence, CogMI 2020, 52 – 61. 100 | 101 | - Consoli, S.; Barbaglia, L.; and Manzan, S. 2021. Explaining sentiment from Lexicon. In CEUR Workshop Proceedings, volume 2918, 87 – 95. 102 | 103 | - Consoli, S.; Barbaglia, L.; and Manzan, S. 2022. Fine-grained, aspect-based sentiment analysis on economic and financial lexicon. Knowledge-Based Systems, 247: 108781 104 | 105 | 106 | 107 | 108 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. 
For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
202 | 
--------------------------------------------------------------------------------
/python/get_sentiment.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import time
3 | import multiprocessing
4 | import itertools
5 | import csv
6 | import os
7 | import os.path
8 | import spacy
9 | from collections import Counter
10 | from datetime import datetime
11 | import pandas as pd
12 | import operator
13 | import getpass
14 | from nltk.corpus import sentiwordnet as swn
15 | from pathlib import Path
16 | import re
17 | import senticnet5
18 | import senti_bignomics
19 | 
20 | 
21 | def safe_string_cast_to_numerictype(val, to_type, default = None):
22 |     # Cast val to to_type, returning default if the cast fails.
23 |     try:
24 |         return to_type(val)
25 |     except (ValueError, TypeError):
26 |         return default
27 | 
28 | def FeelIt(tlemma, tpos=None, tokentlemma=None, PrintScr=False, UseSenticNet=False,UseSentiWordNet=False,UseFigas=True):
29 | 
30 |     computed_sentiment = 0.0
31 | 
32 |     senti_bignomics_sentiment = 0.0  # senti_bignomics lexicon score (used when UseFigas is True)
33 |     if UseFigas==True:
34 |         tosearcsenticnet = tlemma.lower().replace(" ", "_")
35 |         sentibignomicsitem = senti_bignomics.senti_bignomics.get(tosearcsenticnet)
36 |         if sentibignomicsitem and sentibignomicsitem[0]:
37 |             valstr = sentibignomicsitem[0]
38 |             senti_bignomics_sentiment = safe_string_cast_to_numerictype(valstr, float, 0)
39 |     #computed_sentiment = senti_bignomics_sentiment
40 |     #return computed_sentiment
41 | 
42 |     if tpos == "NOUN":  # map spaCy POS tags to WordNet POS tags
43 |         posval = "n"
44 |     elif tpos == "VERB":
45 |         posval = "v"
46 |     elif tpos == "ADJ":
47 |         posval = "a"
48 |     elif tpos == "ADV":
49 |         posval = "r"
50 |     else:
51 |         posval = "n"
52 | 
53 |     avgscSWN = 0  # SentiWordNet: first synset with a non-zero (pos - neg) score
54 |     if UseSentiWordNet==True:
55 |         try:
56 |             llsenses_pos = list(swn.senti_synsets(tlemma.lower(), posval))
57 |         except:
58 |             llsenses_pos = []
59 |         if llsenses_pos and len(llsenses_pos) > 0:
60 |             for thistimescore in llsenses_pos:
61 |                 avgscSWN = thistimescore.pos_score() - thistimescore.neg_score()
62 |                 if avgscSWN != 0:
63 |                     break
64 |         if avgscSWN == 0 and posval == "a":  # retry adjectives as satellite adjectives
65 |             posval = "s"
66 |             try:
67 |                 llsenses_pos = list(swn.senti_synsets(tlemma.lower(), posval))
68 |             except:
69 |                 llsenses_pos = []
70 |             if llsenses_pos and len(llsenses_pos) > 0:
71 |                 for thistimescore in llsenses_pos:
72 |                     avgscSWN = thistimescore.pos_score() - thistimescore.neg_score()
73 |                     if avgscSWN != 0:
74 |                         break
75 | 
76 |     sentic_sentiment = 0  # SenticNet 5 score
77 |     if UseSenticNet == True:
78 |         tosearcsenticnet = tlemma.lower().replace(" ", "_")
79 |         senticitem = senticnet5.senticnet.get(tosearcsenticnet)
80 |         if senticitem and senticitem[7]:
81 |             valstr = senticitem[7]
82 |             sentic_sentiment = safe_string_cast_to_numerictype(valstr, float, 0)
83 | 
84 |     #if sentic_sentiment != 0:
85 |     #    computed_sentiment = (sentic_sentiment + avgscSWN) / 2
86 |     #else:
87 |     #    computed_sentiment = avgscSWN
88 | 
89 |     denom_for_average = 0  # average the non-zero lexicon scores
90 |     num_for_average = 0.0
91 |     if sentic_sentiment != 0:
92 |         denom_for_average = denom_for_average + 1
93 |         num_for_average = num_for_average + sentic_sentiment
94 |     if avgscSWN != 0:
95 |         denom_for_average = denom_for_average + 1
96 |         num_for_average = num_for_average + avgscSWN
97 |     if senti_bignomics_sentiment != 0:
98 |         denom_for_average = denom_for_average + 1
99 |         num_for_average = num_for_average + senti_bignomics_sentiment
100 | 
101 |     computed_sentiment = 0
102 |     if denom_for_average != 0:
103 |         computed_sentiment = num_for_average / denom_for_average
104 | 
105 |     return computed_sentiment
106 | 
107 | 
108 | def FeelIt_OverallSentiment(toi, PrintScr=False, UseSenticNet=True, UseSentiWordNet=True, UseFigas=True):
109 |     sentim_all =
0.0 110 | countsss = 0 111 | for xin in toi.sent: 112 | if (xin.pos_ == "ADJ") | (xin.pos_ == "ADV") | (xin.pos_ == "NOUN") | (xin.pos_ == "PROPN") | ( 113 | xin.pos_ == "VERB"): 114 | sentim_app = FeelIt(xin.lemma_.lower(), xin.pos_, xin, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 115 | if sentim_app != 0.0: 116 | countsss = countsss + 1 117 | sentim_all = sentim_all + sentim_app 118 | if countsss > 0: 119 | sentim_all = sentim_all / countsss 120 | return sentim_all 121 | 122 | 123 | def PREP_token_IE_parsing(xin,singleNOUNs, singleCompoundedHITs, singleCompoundedHITs_toEXCLUDE,LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP, COMPUTE_OVERALL_SENTIMENT_SCORE, minw, maxw, FoundVerb, t, nountoskip=None, previousprep=None,UseSenticNet=True, UseSentiWordNet=True, UseFigas=True): 124 | listOfPreps = [] 125 | listOfPreps_sentim = [] 126 | lxin_n = [x for x in xin.lefts] 127 | rxin_n = [x for x in xin.rights] 128 | if lxin_n: 129 | for xinxin in lxin_n: 130 | if xinxin.dep_ == "pobj" and xinxin.pos_ == "NOUN" and IsInterestingToken( 131 | xinxin) and xinxin.lemma_.lower() != t.lemma_.lower(): 132 | if (nountoskip): 133 | if xinxin.lemma_.lower() == nountoskip.lemma_.lower(): 134 | continue 135 | minw = min(minw, xinxin.i) 136 | maxw = max(maxw, xinxin.i) 137 | other_list_NOUN, sentilist, minw, maxw = NOUN_token_IE_parsing(xinxin,singleNOUNs,singleCompoundedHITs, singleCompoundedHITs_toEXCLUDE,LOCATION_SYNONYMS_FOR_HEURISTIC, 138 | VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, minw=minw,maxw=maxw, 139 | verbtoskip=FoundVerb, nountoskip=t) 140 | sentim_noun = FeelIt(xinxin.lemma_.lower(), xinxin.pos_, xinxin, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 141 | if other_list_NOUN and len(other_list_NOUN) > 0: 142 | listNoun_app = [] 143 | FoundNounInlist = "___" + xinxin.lemma_.lower() + " ["+xinxin.pos_+", "+xinxin.tag_+" ("+str(sentim_noun)+ ")]" 144 | for modin in other_list_NOUN: 145 | FoundNounInlist = FoundNounInlist + "___" + modin 146 | listNoun_app.append(FoundNounInlist) 147 | other_list_NOUN = listNoun_app 148 | 149 | listOfPreps.extend(other_list_NOUN) 150 | else: 151 | FoundNounInlist = "___" + xinxin.lemma_.lower() + " ["+xinxin.pos_+", "+xinxin.tag_+" ("+str(sentim_noun)+ ")]" + "___" 152 | listOfPreps.append(FoundNounInlist) 153 | if sentilist and len(sentilist) > 0: 154 | for sentin in sentilist: 155 | if sentin != 0: 156 | if sentim_noun == 0: 157 | sentim_noun = sentin 158 | else: 159 | if (sentim_noun > 0 and sentin < 0) or ( 160 | sentim_noun < 0 and sentin > 0): 161 | sentim_noun = np.sign(sentin) * np.sign(sentim_noun) * ( 162 | abs(sentim_noun) + (1 - abs(sentim_noun)) * abs( 163 | sentin)) 164 | else: 165 | sentim_noun = np.sign(sentim_noun) * ( 166 | abs(sentim_noun) + (1 - abs(sentim_noun)) * abs( 167 | sentin)) 168 | listOfPreps_sentim.append(sentim_noun) 169 | elif xinxin.dep_ == "pcomp" and xinxin.pos_ == "VERB" and xinxin.lemma_.lower() != t.lemma_.lower(): 170 | minw = min(minw, xinxin.i) 171 | maxw = max(maxw, xinxin.i) 172 | iterated_list_VERB, list_verbs_sentim_app, minw, maxw = VERB_token_IE_parsing(xinxin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 173 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 174 | t, minw, maxw,nountoskip=nountoskip,previousverb=FoundVerb,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 175 | if iterated_list_VERB and len(iterated_list_VERB) > 0: 176 | 
listOfPreps.extend(iterated_list_VERB) 177 | if list_verbs_sentim_app and len(list_verbs_sentim_app) > 0: 178 | listOfPreps_sentim.extend(list_verbs_sentim_app) 179 | elif xinxin.dep_ == "prep" and xinxin.pos_ == "ADP": 180 | if (previousprep): 181 | if xinxin.lemma_.lower() == previousprep.lemma_.lower(): 182 | continue 183 | minw = min(minw, xinxin.i) 184 | maxw = max(maxw, xinxin.i) 185 | iterated_list_prep, iterated_list_prep_sentim, minw, maxw = PREP_token_IE_parsing(xinxin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 186 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 187 | minw=minw,maxw=maxw,FoundVerb=FoundVerb,t=t,nountoskip=nountoskip, 188 | previousprep=xin,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 189 | if iterated_list_prep and len(iterated_list_prep) > 0: 190 | listOfPreps.extend(iterated_list_prep) 191 | if iterated_list_prep_sentim and len(iterated_list_prep_sentim) > 0: 192 | listOfPreps_sentim.extend(iterated_list_prep_sentim) 193 | if rxin_n: 194 | for xinxin in rxin_n: 195 | if xinxin.dep_ == "pobj" and xinxin.pos_ == "NOUN" and IsInterestingToken( 196 | xinxin) and xinxin.lemma_.lower() != t.lemma_.lower(): 197 | if (nountoskip): 198 | if xinxin.lemma_.lower() == nountoskip.lemma_.lower(): 199 | continue 200 | minw = min(minw, xinxin.i) 201 | maxw = max(maxw, xinxin.i) 202 | other_list_NOUN, sentilist, minw, maxw = NOUN_token_IE_parsing(xinxin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 203 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 204 | minw=minw, maxw=maxw,verbtoskip=FoundVerb, nountoskip=t) 205 | sentim_noun = FeelIt(xinxin.lemma_.lower(), xinxin.pos_, xinxin, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 206 | if other_list_NOUN and len(other_list_NOUN) > 0: 207 | listNoun_app = [] 208 | FoundNounInlist = "___" + xinxin.lemma_.lower() + " ["+xinxin.pos_+", "+xinxin.tag_+" ("+str(sentim_noun)+ ")]" 209 | for modin in other_list_NOUN: 210 | FoundNounInlist = FoundNounInlist + "___" + modin 211 | listNoun_app.append(FoundNounInlist) 212 | other_list_NOUN = listNoun_app 213 | # 214 | listOfPreps.extend(other_list_NOUN) 215 | else: 216 | FoundNounInlist = "___" + xinxin.lemma_.lower() + " ["+xinxin.pos_+", "+xinxin.tag_+" ("+str(sentim_noun)+ ")]" + "___" 217 | listOfPreps.append(FoundNounInlist) 218 | if sentilist and len(sentilist) > 0: 219 | for sentin in sentilist: 220 | if sentin != 0: 221 | if sentim_noun == 0: 222 | sentim_noun = sentin 223 | else: 224 | if (sentim_noun > 0 and sentin < 0) or ( 225 | sentim_noun < 0 and sentin > 0): 226 | sentim_noun = np.sign(sentin) * np.sign(sentim_noun) * ( 227 | abs(sentim_noun) + (1 - abs(sentim_noun)) * abs( 228 | sentin)) 229 | else: 230 | sentim_noun = np.sign(sentim_noun) * ( 231 | abs(sentim_noun) + (1 - abs(sentim_noun)) * abs( 232 | sentin)) 233 | listOfPreps_sentim.append(sentim_noun) 234 | elif xinxin.dep_ == "pcomp" and xinxin.pos_ == "VERB" and xinxin.lemma_.lower() != t.lemma_.lower(): 235 | minw = min(minw, xinxin.i) 236 | maxw = max(maxw, xinxin.i) 237 | iterated_list_VERB, list_verbs_sentim_app, minw, maxw = VERB_token_IE_parsing(xinxin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 238 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 239 | t, minw, maxw,nountoskip=nountoskip,previousverb=FoundVerb,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 240 
| if iterated_list_VERB and len(iterated_list_VERB) > 0: 241 | listOfPreps.extend(iterated_list_VERB) 242 | if list_verbs_sentim_app and len(list_verbs_sentim_app) > 0: 243 | listOfPreps_sentim.extend(list_verbs_sentim_app) 244 | elif xinxin.dep_ == "prep" and xinxin.pos_ == "ADP": 245 | if (previousprep): 246 | if xinxin.lemma_.lower() == previousprep.lemma_.lower(): 247 | continue 248 | minw = min(minw, xinxin.i) 249 | maxw = max(maxw, xinxin.i) 250 | iterated_list_prep, iterated_list_prep_sentim, minw, maxw = PREP_token_IE_parsing(xinxin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 251 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 252 | minw=minw,maxw=maxw,FoundVerb=FoundVerb,t=t,nountoskip=nountoskip, 253 | previousprep=xin,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 254 | if iterated_list_prep and len(iterated_list_prep) > 0: 255 | listOfPreps.extend(iterated_list_prep) 256 | if iterated_list_prep_sentim and len(iterated_list_prep_sentim) > 0: 257 | listOfPreps_sentim.extend(iterated_list_prep_sentim) 258 | return listOfPreps, listOfPreps_sentim, minw, maxw 259 | 260 | 261 | def VERB_token_IE_parsing(FoundVerb, singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE,LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, t, minw, maxw, nountoskip=None,previousverb=None,UseSenticNet=True, UseSentiWordNet=True, UseFigas=True): 262 | listVerbs = [] 263 | listVerbs_sentim = [] 264 | CompoundsOfSingleHit = findCompoundedHITsOfTerm(singleCompoundedHITs, FoundVerb) 265 | FoundNeg = None 266 | FoundVerbAdverb = "" 267 | FoundVerbAdverb_sentim = 0 268 | listFoundModofVB = [] 269 | listFoundModofVB_sentim = [] 270 | l_n = [x for x in FoundVerb.lefts] 271 | if l_n: 272 | for xin in l_n: 273 | lxin_n = [x for x in xin.lefts] 274 | rxin_n = [x for x in xin.rights] 275 | if xin.dep_ == "neg": 276 | FoundNeg = "__not" 277 | minw = min(minw, xin.i) 278 | maxw = max(maxw, xin.i) 279 | elif xin.dep_ == "advmod" and (xin.pos_ == "ADV" and (xin.tag_ == "RBS" or xin.tag_ == "RBR")): 280 | if (xin.lemma_.lower() in CompoundsOfSingleHit): 281 | continue 282 | minw = min(minw, xin.i) 283 | maxw = max(maxw, xin.i) 284 | sentim_app = FeelIt(xin.lemma_.lower(), xin.pos_, xin, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 285 | FoundVerbAdverb = FoundVerbAdverb + "__" + xin.lemma_.lower() + " [" + xin.pos_ + ", " + xin.tag_ + " (" + str(sentim_app) + ")]" 286 | if FoundVerbAdverb_sentim == 0: 287 | FoundVerbAdverb_sentim = sentim_app 288 | else: 289 | if (FoundVerbAdverb_sentim > 0 and sentim_app < 0) or (FoundVerbAdverb_sentim < 0 and sentim_app > 0): 290 | FoundVerbAdverb_sentim = FoundVerbAdverb_sentim + sentim_app 291 | else: 292 | FoundVerbAdverb_sentim = np.sign(FoundVerbAdverb_sentim) * ( 293 | abs(FoundVerbAdverb_sentim) + (1 - abs(FoundVerbAdverb_sentim)) * abs(sentim_app)) 294 | elif (xin.dep_ == "acomp" or xin.dep_ == "oprd") and (xin.pos_ == "ADJ" and ( 295 | xin.tag_ == "JJR" or xin.tag_ == "JJS" or xin.tag_ == "JJ")) and xin.lemma_.lower() != t.lemma_.lower(): 296 | if xin.lemma_.lower() in CompoundsOfSingleHit: 297 | continue 298 | foundadv = "" 299 | foundadv_sentim = 0 300 | if lxin_n: 301 | for xinxin in lxin_n: 302 | if ((xinxin.dep_ == "advmod" and (xinxin.pos_ == "ADV" and ( 303 | xinxin.tag_ == "RBS" or xinxin.tag_ == "RBR"))) or ( 304 | xinxin.dep_ == "conj" and (xinxin.pos_ == "ADJ" and ( 305 | xinxin.tag_ == "JJR" or xinxin.tag_ == 
"JJS" or xinxin.tag_ == "JJ")))) and xinxin.lemma_.lower() != t.lemma_.lower(): 306 | minw = min(minw, xinxin.i) 307 | maxw = max(maxw, xinxin.i) 308 | sentim_app = FeelIt(xinxin.lemma_.lower(), xinxin.pos_, xinxin, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 309 | foundadv = foundadv + "__" + xinxin.lemma_.lower() + " [" + xinxin.pos_ + ", " + xinxin.tag_ + " (" + str(sentim_app) + ")]" 310 | if foundadv_sentim == 0: 311 | foundadv_sentim = sentim_app 312 | else: 313 | if (foundadv_sentim > 0 and sentim_app < 0) or (foundadv_sentim < 0 and sentim_app > 0): 314 | foundadv_sentim = foundadv_sentim + sentim_app 315 | else: 316 | foundadv_sentim = np.sign(foundadv_sentim) * ( 317 | abs(foundadv_sentim) + (1 - abs(foundadv_sentim)) * abs(sentim_app)) 318 | if rxin_n: 319 | for xinxin in rxin_n: 320 | if ((xinxin.dep_ == "advmod" and (xinxin.pos_ == "ADV" and ( 321 | xinxin.tag_ == "RBS" or xinxin.tag_ == "RBR"))) or ( 322 | xinxin.dep_ == "conj" and (xinxin.pos_ == "ADJ" and ( 323 | xinxin.tag_ == "JJR" or xinxin.tag_ == "JJS" or xinxin.tag_ == "JJ")))) and xinxin.lemma_.lower() != t.lemma_.lower(): 324 | minw = min(minw, xinxin.i) 325 | maxw = max(maxw, xinxin.i) 326 | sentim_app = FeelIt(xinxin.lemma_.lower(), xinxin.pos_, xinxin, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 327 | foundadv = foundadv + "__" + xinxin.lemma_.lower() + " [" + xinxin.pos_ + ", " + xinxin.tag_ + " (" + str(sentim_app) + ")]" 328 | if foundadv_sentim == 0: 329 | foundadv_sentim = sentim_app 330 | else: 331 | if (foundadv_sentim > 0 and sentim_app < 0) or ( 332 | foundadv_sentim < 0 and sentim_app > 0): 333 | foundadv_sentim = foundadv_sentim + sentim_app 334 | else: 335 | foundadv_sentim = np.sign(foundadv_sentim) * ( 336 | abs(foundadv_sentim) + (1 - abs(foundadv_sentim)) * abs(sentim_app)) 337 | sentim_compound = FeelIt(xin.lemma_.lower(), xin.pos_, xin, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 338 | listFoundModofVB.append((xin.lemma_.lower() + " [" + xin.pos_ + ", " + xin.tag_ + " (" + str(sentim_compound) + ")]" + foundadv)) 339 | if sentim_compound == 0: 340 | sentim_compound = foundadv_sentim 341 | else: 342 | if (foundadv_sentim > 0 and sentim_app < 0) or ( 343 | foundadv_sentim < 0 and sentim_app > 0): 344 | sentim_compound = np.sign(foundadv_sentim) * np.sign(sentim_compound) * ( 345 | abs(sentim_compound) + (1 - abs(sentim_compound)) * abs( 346 | foundadv_sentim)) 347 | else: 348 | sentim_compound = np.sign(sentim_compound) * ( 349 | abs(sentim_compound) + (1 - abs(sentim_compound)) * abs(foundadv_sentim)) 350 | listFoundModofVB_sentim.append(sentim_compound) 351 | minw = min(minw, xin.i) 352 | maxw = max(maxw, xin.i) 353 | elif (xin.dep_ == "dobj" or xin.dep_ == "attr") and xin.pos_ == "NOUN" and IsInterestingToken( 354 | xin) and xin.lemma_.lower() != t.lemma_.lower(): 355 | if (nountoskip): 356 | if xin.lemma_.lower() == nountoskip.lemma_.lower(): 357 | continue 358 | minw = min(minw, xin.i) 359 | maxw = max(maxw, xin.i) 360 | iterated_list_NOUN, sentilist, minw, maxw = NOUN_token_IE_parsing(xin, singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE,LOCATION_SYNONYMS_FOR_HEURISTIC, 361 | VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, minw=minw, maxw=maxw, 362 | verbtoskip=FoundVerb, nountoskip=t) 363 | sentim_noun = FeelIt(xin.lemma_.lower(), xin.pos_, xin, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 364 | if iterated_list_NOUN and 
len(iterated_list_NOUN) > 0: 365 | # 366 | listNoun_app = [] 367 | for modin in iterated_list_NOUN: 368 | FoundNounInlist = "___" + xin.lemma_.lower() + " [" + xin.pos_ + ", " + xin.tag_ + " (" + str( 369 | sentim_noun) + ")]" + "___" + modin 370 | listNoun_app.append(FoundNounInlist) 371 | iterated_list_NOUN = listNoun_app 372 | # 373 | listFoundModofVB.extend(iterated_list_NOUN) 374 | else: 375 | FoundNounInlist = "___" + xin.lemma_.lower() + " [" + xin.pos_ + ", " + xin.tag_ + " (" + str( 376 | sentim_noun) + ")]" + "___" 377 | listFoundModofVB.append(FoundNounInlist) 378 | if sentilist and len(sentilist) > 0: 379 | for sentin in sentilist: 380 | if sentin != 0: 381 | if sentim_noun == 0: 382 | sentim_noun = sentin 383 | else: 384 | if (sentim_noun > 0 and sentin < 0) or ( 385 | sentim_noun < 0 and sentin > 0): 386 | sentim_noun = np.sign(sentin) * np.sign(sentim_noun) * ( 387 | abs(sentim_noun) + (1 - abs(sentim_noun)) * abs( 388 | sentin)) 389 | else: 390 | sentim_noun = np.sign(sentim_noun) * ( 391 | abs(sentim_noun) + (1 - abs(sentim_noun)) * abs( 392 | sentin)) 393 | listFoundModofVB_sentim.append(sentim_noun) 394 | elif xin.dep_ == "prep" and xin.pos_ == "ADP" and xin.lemma_.lower() != t.lemma_.lower(): 395 | minw = min(minw, xin.i) 396 | maxw = max(maxw, xin.i) 397 | iterated_list_prep, iterated_list_prep_sentim, minw, maxw = PREP_token_IE_parsing(xin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 398 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 399 | minw=minw,maxw=maxw,FoundVerb=FoundVerb,t=t,nountoskip=nountoskip,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 400 | if iterated_list_prep and len(iterated_list_prep) > 0: 401 | listFoundModofVB.extend(iterated_list_prep) 402 | if iterated_list_prep_sentim and len(iterated_list_prep_sentim) > 0: 403 | listFoundModofVB_sentim.extend(iterated_list_prep_sentim) 404 | if FoundNeg is None: 405 | for xinxin in lxin_n: 406 | if (xinxin.dep_ == "neg"): 407 | FoundNeg = "__not" 408 | minw = min(minw, xinxin.i) 409 | maxw = max(maxw, xinxin.i) 410 | for xinxin in rxin_n: 411 | if (xinxin.dep_ == "neg"): 412 | FoundNeg = "__not" 413 | minw = min(minw, xinxin.i) 414 | maxw = max(maxw, xinxin.i) 415 | l_r = [x for x in FoundVerb.rights] 416 | if l_r: 417 | for xin in l_r: 418 | lxin_n = [x for x in xin.lefts] 419 | rxin_n = [x for x in xin.rights] 420 | if xin.dep_ == "neg": 421 | FoundNeg = "__not" 422 | minw = min(minw, xin.i) 423 | maxw = max(maxw, xin.i) 424 | elif xin.dep_ == "advmod" and (xin.pos_ == "ADV" and (xin.tag_ == "RBS" or xin.tag_ == "RBR")): 425 | if (xin.lemma_.lower() in CompoundsOfSingleHit): 426 | continue 427 | minw = min(minw, xin.i) 428 | maxw = max(maxw, xin.i) 429 | sentim_app = FeelIt(xin.lemma_.lower(), xin.pos_, xin, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 430 | FoundVerbAdverb = FoundVerbAdverb + "__" + xin.lemma_.lower() + " ["+xin.pos_+", "+xin.tag_+" ("+str(sentim_app)+ ")]" 431 | if FoundVerbAdverb_sentim == 0: 432 | FoundVerbAdverb_sentim = sentim_app 433 | else: 434 | if (FoundVerbAdverb_sentim > 0 and sentim_app < 0) or (FoundVerbAdverb_sentim < 0 and sentim_app > 0): 435 | FoundVerbAdverb_sentim = FoundVerbAdverb_sentim + sentim_app 436 | else: 437 | FoundVerbAdverb_sentim = np.sign(FoundVerbAdverb_sentim) * ( 438 | abs(FoundVerbAdverb_sentim) + (1 - abs(FoundVerbAdverb_sentim)) * abs(sentim_app)) 439 | elif (xin.dep_ == "acomp" or xin.dep_ == "oprd") and (xin.pos_ == "ADJ" and ( 440 
| xin.tag_ == "JJR" or xin.tag_ == "JJS" or xin.tag_ == "JJ")) and xin.lemma_.lower() != t.lemma_.lower(): 441 | if xin.lemma_.lower() in CompoundsOfSingleHit: 442 | continue 443 | minw = min(minw, xin.i) 444 | maxw = max(maxw, xin.i) 445 | foundadv = "" 446 | foundadv_sentim = 0 447 | if lxin_n: 448 | for xinxin in lxin_n: 449 | if ((xinxin.dep_ == "advmod" and (xinxin.pos_ == "ADV" and ( 450 | xinxin.tag_ == "RBS" or xinxin.tag_ == "RBR"))) or ( 451 | xinxin.dep_ == "conj" and (xinxin.pos_ == "ADJ" and ( 452 | xinxin.tag_ == "JJR" or xinxin.tag_ == "JJS" or xinxin.tag_ == "JJ")))) and xinxin.lemma_.lower() != t.lemma_.lower(): 453 | minw = min(minw, xinxin.i) 454 | maxw = max(maxw, xinxin.i) 455 | sentim_app = FeelIt(xinxin.lemma_.lower(), xinxin.pos_, xinxin, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 456 | foundadv = foundadv + "__" + xinxin.lemma_.lower() + " [" + xinxin.pos_ + ", " + xinxin.tag_ + " (" + str(sentim_app) + ")]" 457 | if foundadv_sentim == 0: 458 | foundadv_sentim = sentim_app 459 | else: 460 | if (foundadv_sentim > 0 and sentim_app < 0) or ( 461 | foundadv_sentim < 0 and sentim_app > 0): 462 | foundadv_sentim = foundadv_sentim + sentim_app 463 | else: 464 | foundadv_sentim = np.sign(foundadv_sentim) * ( 465 | abs(foundadv_sentim) + (1 - abs(foundadv_sentim)) * abs(sentim_app)) 466 | if rxin_n: 467 | for xinxin in rxin_n: 468 | if ((xinxin.dep_ == "advmod" and (xinxin.pos_ == "ADV" and ( 469 | xinxin.tag_ == "RBS" or xinxin.tag_ == "RBR"))) or ( 470 | xinxin.dep_ == "conj" and (xinxin.pos_ == "ADJ" and ( 471 | xinxin.tag_ == "JJR" or xinxin.tag_ == "JJS" or xinxin.tag_ == "JJ")))) and xinxin.lemma_.lower() != t.lemma_.lower(): 472 | minw = min(minw, xinxin.i) 473 | maxw = max(maxw, xinxin.i) 474 | sentim_app = FeelIt(xinxin.lemma_.lower(), xinxin.pos_, xinxin, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 475 | foundadv = foundadv + "__" + xinxin.lemma_.lower() + " ["+xinxin.pos_+", "+xinxin.tag_+" ("+str(sentim_app)+ ")]" 476 | if foundadv_sentim == 0: 477 | foundadv_sentim = sentim_app 478 | else: 479 | if (foundadv_sentim > 0 and sentim_app < 0) or ( 480 | foundadv_sentim < 0 and sentim_app > 0): 481 | foundadv_sentim = foundadv_sentim + sentim_app 482 | else: 483 | foundadv_sentim = np.sign(foundadv_sentim) * ( 484 | abs(foundadv_sentim) + (1 - abs(foundadv_sentim)) * abs(sentim_app)) 485 | sentim_compound = FeelIt(xin.lemma_.lower(), xin.pos_, xin, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 486 | listFoundModofVB.append((xin.lemma_.lower() + " [" + xin.pos_ + ", " + xin.tag_ + " (" + str(sentim_compound) + ")]" + foundadv)) 487 | if sentim_compound == 0: 488 | sentim_compound = foundadv_sentim 489 | else: 490 | if (foundadv_sentim > 0 and sentim_compound < 0) or (foundadv_sentim < 0 and sentim_compound > 0): 491 | sentim_compound = np.sign(foundadv_sentim) * np.sign(sentim_compound) * ( 492 | abs(sentim_compound) + (1 - abs(sentim_compound)) * abs( 493 | foundadv_sentim)) 494 | else: 495 | sentim_compound = np.sign(sentim_compound) * ( 496 | abs(sentim_compound) + (1 - abs(sentim_compound)) * abs(foundadv_sentim)) 497 | listFoundModofVB_sentim.append(sentim_compound) 498 | elif (xin.dep_ == "dobj" or xin.dep_ == "attr") and xin.pos_ == "NOUN" and IsInterestingToken( 499 | xin) and xin.lemma_.lower() != t.lemma_.lower(): 500 | if nountoskip: 501 | if xin.lemma_.lower() == nountoskip.lemma_.lower(): 502 | continue 503 | minw = min(minw, xin.i) 504 | maxw = 
max(maxw, xin.i) 505 | iterated_list_NOUN, sentilist, minw, maxw = NOUN_token_IE_parsing(xin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE,LOCATION_SYNONYMS_FOR_HEURISTIC, 506 | VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, minw=minw, maxw=maxw, 507 | verbtoskip=FoundVerb, nountoskip=t) 508 | sentim_noun = FeelIt(xin.lemma_.lower(), xin.pos_, xin, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 509 | if iterated_list_NOUN and len(iterated_list_NOUN) > 0: 510 | # 511 | listNoun_app = [] 512 | for modin in iterated_list_NOUN: 513 | FoundNounInlist = "___" + xin.lemma_.lower() + " ["+xin.pos_+", "+xin.tag_+" ("+str(sentim_noun)+ ")]" + "___" + modin 514 | listNoun_app.append(FoundNounInlist) 515 | iterated_list_NOUN = listNoun_app 516 | # 517 | listFoundModofVB.extend(iterated_list_NOUN) 518 | else: 519 | FoundNounInlist = "___" + xin.lemma_.lower() + " ["+xin.pos_+", "+xin.tag_+" ("+str(sentim_noun)+ ")]" + "___" 520 | listFoundModofVB.append(FoundNounInlist) 521 | if sentilist and len(sentilist) > 0: 522 | for sentin in sentilist: 523 | if sentin != 0: 524 | if sentim_noun == 0: 525 | sentim_noun = sentin 526 | else: 527 | if (sentim_noun > 0 and sentin < 0) or ( 528 | sentim_noun < 0 and sentin > 0): 529 | sentim_noun = np.sign(sentin) * np.sign(sentim_noun) * ( 530 | abs(sentim_noun) + (1 - abs(sentim_noun)) * abs( 531 | sentin)) 532 | else: 533 | sentim_noun = np.sign(sentim_noun) * ( 534 | abs(sentim_noun) + (1 - abs(sentim_noun)) * abs( 535 | sentin)) 536 | listFoundModofVB_sentim.append(sentim_noun) 537 | elif xin.dep_ == "prep" and xin.pos_ == "ADP" and xin.lemma_.lower() != t.lemma_.lower(): 538 | minw = min(minw, xin.i) 539 | maxw = max(maxw, xin.i) 540 | iterated_list_prep, iterated_list_prep_sentim, minw, maxw = PREP_token_IE_parsing(xin,singleNOUNs,singleCompoundedHITs, 541 | singleCompoundedHITs_toEXCLUDE,LOCATION_SYNONYMS_FOR_HEURISTIC, 542 | VERBS_TO_KEEP, COMPUTE_OVERALL_SENTIMENT_SCORE, minw=minw, 543 | maxw=maxw,FoundVerb=FoundVerb,t=t,nountoskip=nountoskip,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 544 | if iterated_list_prep and len(iterated_list_prep) > 0: 545 | listFoundModofVB.extend(iterated_list_prep) 546 | if iterated_list_prep_sentim and len(iterated_list_prep_sentim) > 0: 547 | listFoundModofVB_sentim.extend(iterated_list_prep_sentim) 548 | if FoundNeg is None: 549 | for xinxin in lxin_n: 550 | if (xinxin.dep_ == "neg"): 551 | FoundNeg = "__not" 552 | minw = min(minw, xinxin.i) 553 | maxw = max(maxw, xinxin.i) 554 | for xinxin in rxin_n: 555 | if (xinxin.dep_ == "neg"): 556 | FoundNeg = "__not" 557 | minw = min(minw, xinxin.i) 558 | maxw = max(maxw, xinxin.i) 559 | sentim_vb = FeelIt(FoundVerb.lemma_.lower(), FoundVerb.pos_, FoundVerb, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 560 | if not listFoundModofVB or len(listFoundModofVB) <= 0: 561 | if (FoundVerb.lemma_.lower() != "be" and FoundVerb.lemma_.lower() != "have"): 562 | FoundVerb_name = FoundVerb.lemma_.lower() + " ["+FoundVerb.pos_+", "+FoundVerb.tag_+" ("+str(sentim_vb)+ ")]" + FoundVerbAdverb 563 | listVerbs.append(FoundVerb_name) 564 | if FoundVerbAdverb_sentim != 0: 565 | if sentim_vb == 0: 566 | sentim_vb = FoundVerbAdverb_sentim 567 | else: 568 | if (sentim_vb > 0 and FoundVerbAdverb_sentim < 0) or ( 569 | sentim_vb < 0 and FoundVerbAdverb_sentim > 0): 570 | sentim_vb = np.sign(FoundVerbAdverb_sentim) * np.sign(sentim_vb) * ( 571 | abs(sentim_vb) + (1 - abs(sentim_vb)) * abs( 572 
| FoundVerbAdverb_sentim)) 573 | else: 574 | sentim_vb = np.sign(sentim_vb) * ( 575 | abs(sentim_vb) + (1 - abs(sentim_vb)) * abs( 576 | FoundVerbAdverb_sentim)) 577 | listVerbs_sentim = [sentim_vb] 578 | else: 579 | minw = min(minw, FoundVerb.i) 580 | maxw = max(maxw, FoundVerb.i) 581 | FoundVBInlist = "___" + FoundVerb.lemma_.lower() + " ["+FoundVerb.pos_+", "+FoundVerb.tag_+" ("+str(sentim_vb)+ ")]" + "___" 582 | for j in range(0, len(listFoundModofVB)): 583 | modin = listFoundModofVB[j] 584 | FoundVBInlist = FoundVBInlist + modin 585 | if j < len(listFoundModofVB) - 1 and (modin.endswith('__') == False): 586 | FoundVBInlist = FoundVBInlist + "__" 587 | for j in range(0, len(listFoundModofVB_sentim)): 588 | sentin = listFoundModofVB_sentim[j] 589 | if sentin != 0: 590 | if sentim_vb == 0: 591 | sentim_vb = sentin 592 | else: 593 | if (sentim_vb > 0 and sentin < 0) or ( 594 | sentim_vb < 0 and sentin > 0): 595 | sentim_vb = np.sign(sentin) * np.sign(sentim_vb) * ( 596 | abs(sentim_vb) + (1 - abs(sentim_vb)) * abs( 597 | sentin)) 598 | else: 599 | sentim_vb = np.sign(sentim_vb) * ( 600 | abs(sentim_vb) + (1 - abs(sentim_vb)) * abs( 601 | sentin)) 602 | listVerbs.append(FoundVBInlist) 603 | listVerbs_sentim.append(sentim_vb) 604 | listVerbs_app = [] 605 | if FoundNeg == "__not" and len(listVerbs) > 0: 606 | for modin in listVerbs: 607 | listVerbs_app.append(modin + "__not") 608 | listVerbs = listVerbs_app 609 | return listVerbs, listVerbs_sentim, minw, maxw 610 | 611 | 612 | def IsInterestingToken(t): 613 | ret = False 614 | if t.ent_type_ == "" or t.ent_type_ == 'ORG' or t.ent_type_ == 'GPE' or t.ent_type_ == 'PRODUCT' or t.ent_type_ == 'EVENT' or t.ent_type_ == 'LAW' or t.ent_type_ == 'MONEY' or t.ent_type_ == 'QUANTITY' or t.ent_type_ == 'LOC' or t.ent_type_ == 'NORP': 615 | ret = True 616 | return ret 617 | 618 | 619 | def NOUN_token_IE_parsing(t, singleNOUNs, singleCompoundedHITs, singleCompoundedHITs_toEXCLUDE,LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE,minw, maxw, verbtoskip=None, nountoskip=None, UseSenticNet=True, UseSentiWordNet=True, UseFigas=True): 620 | to_give_back = [] 621 | to_give_back_sentiment = [] 622 | ll = [x for x in t.lefts] 623 | rr = [x for x in t.rights] 624 | CompoundsOfSingleHit = findCompoundedHITsOfTerm(singleCompoundedHITs, t.lemma_.lower()) 625 | listVerbs = [] 626 | listVerbs_sentim = [] 627 | FoundVerb = None 628 | if t.head.pos_ == "VERB" and (not t.head is verbtoskip): 629 | FoundVerb = t.head 630 | minw = min(minw, FoundVerb.i) 631 | maxw = max(maxw, FoundVerb.i) 632 | lvin_n = [x for x in FoundVerb.lefts] 633 | rvin_n = [x for x in FoundVerb.rights] 634 | if lvin_n: 635 | for vin in lvin_n: 636 | lvin_inner = [x for x in vin.lefts] 637 | rvin_inner = [x for x in vin.rights] 638 | if (vin.dep_ == "xcomp" or vin.dep_ == "advcl") and vin.lemma_.lower() != t.lemma_.lower() and vin.lemma_.lower() != FoundVerb.lemma_.lower() and vin.pos_ == "VERB": 639 | if (not verbtoskip) or (vin.lemma_.lower() != verbtoskip.lemma_.lower()): 640 | minw = min(minw, vin.i) 641 | maxw = max(maxw, vin.i) 642 | list_verbs_app, list_verbs_sentim_app, minw, maxw = VERB_token_IE_parsing( 643 | vin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 644 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 645 | t, minw, maxw,nountoskip=nountoskip,previousverb=FoundVerb,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 646 | if list_verbs_app and len(list_verbs_app) > 0: 647 | 
listVerbs.extend(list_verbs_app) 648 | if list_verbs_sentim_app and len(list_verbs_sentim_app) > 0: 649 | listVerbs_sentim.extend(list_verbs_sentim_app) 650 | elif (vin.dep_ == "acomp" or vin.dep_ == "oprd") and vin.lemma_.lower() != t.lemma_.lower() and \ 651 | vin.lemma_.lower() != FoundVerb.lemma_.lower(): 652 | if lvin_inner: 653 | for vinvin in lvin_inner: 654 | if (vinvin.dep_ == "xcomp" or vinvin.dep_ == "advcl") and vinvin.lemma_.lower() != t.lemma_.lower() and vinvin.lemma_.lower() != FoundVerb.lemma_.lower() and vinvin.pos_ == "VERB": 655 | if (not verbtoskip) or (vinvin.lemma_.lower() != verbtoskip.lemma_.lower()): 656 | minw = min(minw, vinvin.i) 657 | maxw = max(maxw, vinvin.i) 658 | list_verbs_app, list_verbs_sentim_app, minw, maxw = VERB_token_IE_parsing( 659 | vinvin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 660 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 661 | t,minw,maxw,nountoskip=nountoskip,previousverb=FoundVerb,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 662 | if list_verbs_app and len(list_verbs_app) > 0: 663 | listVerbs.extend(list_verbs_app) 664 | if list_verbs_sentim_app and len(list_verbs_sentim_app) > 0: 665 | listVerbs_sentim.extend(list_verbs_sentim_app) 666 | if rvin_inner: 667 | for vinvin in rvin_inner: 668 | if (vinvin.dep_ == "xcomp" or vinvin.dep_ == "advcl") and vinvin.lemma_.lower() != t.lemma_.lower() and vinvin.lemma_.lower() != FoundVerb.lemma_.lower() and vinvin.pos_ == "VERB": 669 | if (not verbtoskip) or (vinvin.lemma_.lower() != verbtoskip.lemma_.lower()): 670 | minw = min(minw, vinvin.i) 671 | maxw = max(maxw, vinvin.i) 672 | list_verbs_app, list_verbs_sentim_app, minw, maxw = VERB_token_IE_parsing( 673 | vinvin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 674 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 675 | t,minw,maxw,nountoskip=nountoskip,previousverb=FoundVerb,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 676 | if list_verbs_app and len(list_verbs_app) > 0: 677 | listVerbs.extend(list_verbs_app) 678 | if list_verbs_sentim_app and len(list_verbs_sentim_app) > 0: 679 | listVerbs_sentim.extend(list_verbs_sentim_app) 680 | if rvin_n: 681 | for vin in rvin_n: 682 | lvin_inner = [x for x in vin.lefts] 683 | rvin_inner = [x for x in vin.rights] 684 | if (vin.dep_ == "xcomp" or vin.dep_ == "advcl") and vin.lemma_.lower() != t.lemma_.lower() and vin.lemma_.lower() != FoundVerb.lemma_.lower() and vin.pos_ == "VERB": 685 | if (not verbtoskip) or (vin.lemma_.lower() != verbtoskip.lemma_.lower()): 686 | minw = min(minw, vin.i) 687 | maxw = max(maxw, vin.i) 688 | list_verbs_app, list_verbs_sentim_app, minw, maxw = VERB_token_IE_parsing( 689 | vin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 690 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 691 | t, minw, maxw,nountoskip=nountoskip,previousverb=FoundVerb,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 692 | if list_verbs_app and len(list_verbs_app) > 0: 693 | listVerbs.extend(list_verbs_app) 694 | if list_verbs_sentim_app and len(list_verbs_sentim_app) > 0: 695 | listVerbs_sentim.extend(list_verbs_sentim_app) 696 | elif (vin.dep_ == "acomp" or vin.dep_ == "oprd") and vin.lemma_.lower() != t.lemma_.lower() and vin.lemma_.lower() != FoundVerb.lemma_.lower(): 697 | if lvin_inner: 698 | for vinvin in lvin_inner: 699 | if ( 700 | vinvin.dep_ 
== "xcomp" or vinvin.dep_ == "advcl") and vinvin.lemma_.lower() != t.lemma_.lower() and vinvin.lemma_.lower() != FoundVerb.lemma_.lower() and vinvin.pos_ == "VERB": 701 | if (not verbtoskip) or (vinvin.lemma_.lower() != verbtoskip.lemma_.lower()): 702 | minw = min(minw, vinvin.i) 703 | maxw = max(maxw, vinvin.i) 704 | list_verbs_app, list_verbs_sentim_app, minw, maxw = VERB_token_IE_parsing(vinvin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 705 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 706 | t,minw,maxw,nountoskip=nountoskip,previousverb=FoundVerb,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 707 | if list_verbs_app and len(list_verbs_app) > 0: 708 | listVerbs.extend(list_verbs_app) 709 | if list_verbs_sentim_app and len(list_verbs_sentim_app) > 0: 710 | listVerbs_sentim.extend(list_verbs_sentim_app) 711 | if rvin_inner: 712 | for vinvin in rvin_inner: 713 | if ( 714 | vinvin.dep_ == "xcomp" or vinvin.dep_ == "advcl") and vinvin.lemma_.lower() != t.lemma_.lower() and vinvin.lemma_.lower() != FoundVerb.lemma_.lower() and vinvin.pos_ == "VERB": 715 | if (not verbtoskip) or (vinvin.lemma_.lower() != verbtoskip.lemma_.lower()): 716 | minw = min(minw, vinvin.i) 717 | maxw = max(maxw, vinvin.i) 718 | list_verbs_app, list_verbs_sentim_app, minw, maxw = VERB_token_IE_parsing(vinvin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 719 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 720 | t,minw,maxw,nountoskip=nountoskip,previousverb=FoundVerb,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 721 | if list_verbs_app and len(list_verbs_app) > 0: 722 | listVerbs.extend(list_verbs_app) 723 | if list_verbs_sentim_app and len(list_verbs_sentim_app) > 0: 724 | listVerbs_sentim.extend(list_verbs_sentim_app) 725 | if (not verbtoskip) or (FoundVerb.lemma_.lower() != verbtoskip.lemma_.lower()): 726 | minw = min(minw, FoundVerb.i) 727 | maxw = max(maxw, FoundVerb.i) 728 | list_verbs_app, list_verbs_sentim_app, minw, maxw = VERB_token_IE_parsing(FoundVerb, singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 729 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 730 | t, minw=minw, maxw=maxw,nountoskip=nountoskip,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 731 | if list_verbs_app and len(list_verbs_app) > 0: 732 | listVerbs.extend(list_verbs_app) 733 | if list_verbs_sentim_app and len(list_verbs_sentim_app) > 0: 734 | listVerbs_sentim.extend(list_verbs_sentim_app) 735 | # ------------------------------------------------------------------------------------------------ 736 | listAMODs = [] 737 | listAMODs_sentim = [] 738 | FoundAMOD = None 739 | FoundNeg_left = None 740 | if ll: 741 | for xin in ll: 742 | lxin_n = [x for x in xin.lefts] 743 | rxin_n = [x for x in xin.rights] 744 | if (xin.dep_ == "neg"): 745 | FoundNeg_left = "__not" 746 | minw = min(minw, xin.i) 747 | maxw = max(maxw, xin.i) 748 | elif (xin.dep_ == "amod" and ( 749 | (xin.pos_ == "ADJ" and (xin.tag_ == "JJR" or xin.tag_ == "JJS" or xin.tag_ == "JJ")) or ( 750 | xin.pos_ == "VERB")) and xin.lemma_.lower() != t.lemma_.lower()): 751 | if (xin.lemma_.lower() in CompoundsOfSingleHit): 752 | continue 753 | FoundAMOD = xin 754 | FoundAMOD_sentiment = FeelIt(FoundAMOD.lemma_.lower(), FoundAMOD.pos_, FoundAMOD, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 755 | 
listAMODs_sentim.append(FoundAMOD_sentiment) 756 | FoundAMOD_name = FoundAMOD.lemma_.lower() + " ["+FoundAMOD.pos_+", "+FoundAMOD.tag_+" ("+str(FoundAMOD_sentiment)+ ")]" 757 | listAMODs.append(FoundAMOD_name) 758 | minw = min(minw, FoundAMOD.i) 759 | maxw = max(maxw, FoundAMOD.i) 760 | elif xin.dep_ == "acl" and xin.lemma_.lower() != t.lemma_.lower(): 761 | if ( 762 | xin.pos_ == "VERB"): 763 | minw = min(minw, xin.i) 764 | maxw = max(maxw, xin.i) 765 | iterated_list_VERB, iterated_list_VERB_sentiment, minw, maxw = VERB_token_IE_parsing(xin, singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 766 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 767 | t, minw,maxw,nountoskip=nountoskip,previousverb=FoundVerb,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 768 | if iterated_list_VERB and len(iterated_list_VERB) > 0: 769 | listAMODs.extend(iterated_list_VERB) 770 | if iterated_list_VERB_sentiment and len(iterated_list_VERB_sentiment) > 0: 771 | listAMODs_sentim.extend(iterated_list_VERB_sentiment) 772 | elif xin.dep_ == "prep" and xin.pos_ == "ADP" and xin.lemma_.lower() != t.lemma_.lower() and ( 773 | nountoskip is None): 774 | minw = min(minw, xin.i) 775 | maxw = max(maxw, xin.i) 776 | iterated_list_prep, iterated_list_prep_sentim, minw, maxw = PREP_token_IE_parsing(xin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 777 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 778 | minw=minw,maxw=maxw,FoundVerb=FoundVerb,t=t,nountoskip=nountoskip,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 779 | if iterated_list_prep and len(iterated_list_prep) > 0: 780 | listAMODs.extend(iterated_list_prep) 781 | if iterated_list_prep_sentim and len(iterated_list_prep_sentim) > 0: 782 | listAMODs_sentim.extend(iterated_list_prep_sentim) 783 | FoundNeg_right = None 784 | if rr: 785 | for xin in rr: 786 | lxin_n = [x for x in xin.lefts] 787 | rxin_n = [x for x in xin.rights] 788 | if (xin.dep_ == "neg"): 789 | FoundNeg_right = "__not" 790 | minw = min(minw, xin.i) 791 | maxw = max(maxw, xin.i) 792 | elif (xin.dep_ == "amod" and ( 793 | (xin.pos_ == "ADJ" and (xin.tag_ == "JJR" or xin.tag_ == "JJS" or xin.tag_ == "JJ")) or ( 794 | xin.pos_ == "VERB")) and xin.lemma_.lower() != t.lemma_.lower()): 795 | if (xin.lemma_.lower() in CompoundsOfSingleHit): 796 | continue 797 | FoundAMOD = xin 798 | FoundAMOD_sentiment = FeelIt(FoundAMOD.lemma_.lower(), FoundAMOD.pos_, FoundAMOD, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 799 | listAMODs_sentim.append(FoundAMOD_sentiment) 800 | FoundAMOD_name = FoundAMOD.lemma_.lower() + " ["+FoundAMOD.pos_+", "+FoundAMOD.tag_+" ("+str(FoundAMOD_sentiment)+ ")]" 801 | listAMODs.append(FoundAMOD_name) 802 | minw = min(minw, FoundAMOD.i) 803 | maxw = max(maxw, FoundAMOD.i) 804 | elif xin.dep_ == "acl" and xin.lemma_.lower() != t.lemma_.lower(): 805 | if xin.pos_ == "VERB": 806 | minw = min(minw, xin.i) 807 | maxw = max(maxw, xin.i) 808 | iterated_list_VERB, iterated_list_VERB_sentiment, minw, maxw = VERB_token_IE_parsing(xin,singleNOUNs,singleCompoundedHITs,singleCompoundedHITs_toEXCLUDE, 809 | LOCATION_SYNONYMS_FOR_HEURISTIC,VERBS_TO_KEEP,COMPUTE_OVERALL_SENTIMENT_SCORE, 810 | t, minw,maxw,nountoskip=nountoskip,previousverb=FoundVerb,UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas) 811 | if iterated_list_VERB and len(iterated_list_VERB) > 0: 812 | 
                        listAMODs.extend(iterated_list_VERB)
                    if iterated_list_VERB_sentiment and len(iterated_list_VERB_sentiment) > 0:
                        listAMODs_sentim.extend(iterated_list_VERB_sentiment)
            elif xin.dep_ == "prep" and xin.pos_ == "ADP" and xin.lemma_.lower() != t.lemma_.lower() and (nountoskip is None):
                minw = min(minw, xin.i)
                maxw = max(maxw, xin.i)
                iterated_list_prep, iterated_list_prep_sentim, minw, maxw = PREP_token_IE_parsing(
                    xin, singleNOUNs, singleCompoundedHITs, singleCompoundedHITs_toEXCLUDE,
                    LOCATION_SYNONYMS_FOR_HEURISTIC, VERBS_TO_KEEP, COMPUTE_OVERALL_SENTIMENT_SCORE,
                    minw=minw, maxw=maxw, FoundVerb=FoundVerb, t=t, nountoskip=nountoskip,
                    UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas)
                if iterated_list_prep and len(iterated_list_prep) > 0:
                    listAMODs.extend(iterated_list_prep)
                if iterated_list_prep_sentim and len(iterated_list_prep_sentim) > 0:
                    listAMODs_sentim.extend(iterated_list_prep_sentim)
    # Negation handling: if a "neg" dependent was found on either side, mark every collected modifier.
    listAMODs_app = []
    if (FoundNeg_left == "__not" or FoundNeg_right == "__not") and len(listAMODs) > 0:
        for modin in listAMODs:
            listAMODs_app.append(modin + "__not")
        listAMODs = listAMODs_app
    if len(listAMODs) > 0:
        to_give_back.extend(listAMODs)
    if len(listVerbs) > 0:
        to_give_back.extend(listVerbs)
    if len(listAMODs_sentim) > 0:
        to_give_back_sentiment.extend(listAMODs_sentim)
    if len(listVerbs_sentim) > 0:
        to_give_back_sentiment.extend(listVerbs_sentim)
    return to_give_back, to_give_back_sentiment, minw, maxw


# Heuristic tense detection: verb tags vote for future (MD), present (VBP/VBZ/VBG) or past (VBD/VBN),
# each vote weighted by the inverse distance from the extracted token.
def determine_tense_input(tagged, posextractedn):
    for ww in tagged:
        # treat "is/are going to <verb>" as a future marker by retagging the auxiliary as a modal
        if ww.pos_ == "VERB" and ww.dep_ == "aux" and (ww.tag_ == "VBP" or ww.tag_ == "VBZ") and ww.head.lower_ == "going":
            lll = [x for x in ww.head.rights]
            for xxx in lll:
                if xxx.pos_ == "VERB" and xxx.tag_ == "VB":
                    ww.tag_ = "MD"
    tense = {}
    future_words = [word for word in tagged if word.tag_ == "MD"]
    present_words = [word for word in tagged if word.tag_ in ["VBP", "VBZ", "VBG"]]
    pass_words = [word for word in tagged if word.tag_ in ["VBD", "VBN"]]
    inf_words = [word for word in tagged if word.tag_ in ["VB"]]
    valfuture = 0
    for word in future_words:
        valfuture = valfuture + 1 / abs(posextractedn - word.i)
    valpresent = 0
    for word in present_words:
        valpresent = valpresent + 1 / abs(posextractedn - word.i)
    valpass = 0
    for word in pass_words:
        valpass = valpass + 1 / abs(posextractedn - word.i)
    valinf = 0  # bare infinitives are weighted too, but they do not enter the returned dictionary
    for word in inf_words:
        valinf = valinf + 1 / abs(posextractedn - word.i)
    tense["future"] = valfuture
    tense["present"] = valpresent
    tense["past"] = valpass
    return (tense)


# Per-token filter: keep a NOUN only if it matches a token of interest (or one of its compounds)
# and passes the exclusion, location and tense filters; returns chunk, sentiment, span, sentence text and tense.
def keep_token_IE(t, singleNOUNs, singleCompoundedHITs, singleCompoundedHITs_toEXCLUDE, most_frequent_loc_DOC, LOCATION_SYNONYMS_FOR_HEURISTIC, VERBS_TO_KEEP, COMPUTE_OVERALL_SENTIMENT_SCORE, MOST_FREQ_LOC_HEURISTIC, UseSenticNet=True, UseSentiWordNet=True, UseFigas=True):
    to_give_back = []
    sentiment_to_give_back = []
    spantogiveback = []
    textsentencetogiveback = []
    tensetogiveback = []
    if t.is_alpha and not (t.is_space or t.is_punct or t.is_stop or t.like_num) and t.pos_ == "NOUN":
        CompoundsOfSingleHit = findCompoundedHITsOfTerm(singleCompoundedHITs, t.lemma_.lower())
        if (t.lemma_.lower() in singleNOUNs) or (CompoundsOfSingleHit):
            FoundCompound = None
            ll = [x for x in t.lefts]
            if not FoundCompound:
                if ll:
                    for xin in ll:
                        if ((xin.lemma_.lower() in CompoundsOfSingleHit)) and xin.lemma_.lower() != t.lemma_.lower():
                            FoundCompound = xin
                            break
            rr = [x for x in t.rights]
            if not FoundCompound:
                if rr:
                    for xin in rr:
                        if ((xin.lemma_.lower() in CompoundsOfSingleHit)) and xin.lemma_.lower() != t.lemma_.lower():
                            FoundCompound = xin
                            break
            if FoundCompound or t.lemma_.lower() in singleNOUNs:
                # Exclusion list: drop the hit if it appears inside one of the compounds to exclude.
                if singleCompoundedHITs_toEXCLUDE:
                    CompoundsOfSingleHitToExclude = findCompoundedHITsOfTerm(singleCompoundedHITs_toEXCLUDE, t.lemma_.lower())
                    if CompoundsOfSingleHitToExclude:
                        if ll:
                            for xin in ll:
                                if xin.lemma_.lower() in CompoundsOfSingleHitToExclude:
                                    return to_give_back, sentiment_to_give_back, spantogiveback, textsentencetogiveback, tensetogiveback
                        if rr:
                            for xin in rr:
                                if xin.lemma_.lower() in CompoundsOfSingleHitToExclude:
                                    return to_give_back, sentiment_to_give_back, spantogiveback, textsentencetogiveback, tensetogiveback
                # Location filter: keep the hit only if the sentence-level location agrees with the document-level one.
                if MOST_FREQ_LOC_HEURISTIC is True:
                    most_frequent_loc_SENTENCE = determine_location_heuristic(t.sent.ents, t.i, t, LOCATION_SYNONYMS_FOR_HEURISTIC)
                    if most_frequent_loc_DOC == LOCATION_SYNONYMS_FOR_HEURISTIC[0].lower():
                        if (most_frequent_loc_SENTENCE == LOCATION_SYNONYMS_FOR_HEURISTIC[0].lower() or most_frequent_loc_SENTENCE == "") == False:
                            return to_give_back, sentiment_to_give_back, spantogiveback, textsentencetogiveback, tensetogiveback
                    else:
                        if most_frequent_loc_SENTENCE != LOCATION_SYNONYMS_FOR_HEURISTIC[0].lower():
                            return to_give_back, sentiment_to_give_back, spantogiveback, textsentencetogiveback, tensetogiveback
                if FoundCompound:
                    minw = min(FoundCompound.i, t.i)
                    maxw = max(FoundCompound.i, t.i)
                else:
                    minw = t.i
                    maxw = t.i
                if COMPUTE_OVERALL_SENTIMENT_SCORE is True:
                    OSpolarity = FeelIt_OverallSentiment(t, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas)
                    toveralltestsentece__ = t.sent.text.replace(" ", "__")
                    to_give_back.append(toveralltestsentece__)
                    sentiment_to_give_back.append(OSpolarity)
                    minw = t.sent.start
                    maxw = t.sent.end - 1
                else:
                    to_give_back, sentiment_to_give_back, minw, maxw = NOUN_token_IE_parsing(
                        t, singleNOUNs, singleCompoundedHITs, singleCompoundedHITs_toEXCLUDE,
                        LOCATION_SYNONYMS_FOR_HEURISTIC, VERBS_TO_KEEP, COMPUTE_OVERALL_SENTIMENT_SCORE,
                        minw=minw, maxw=maxw, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas)
                # Extend the chunk boundaries to include auxiliary verbs attached to the boundary verbs.
                tryl = True
                while tryl == True:
                    if t.doc[minw].pos_ == "VERB":
                        tryl = False
                        for xis in t.doc[minw].lefts:
                            if (xis.dep_ == "aux" and xis.pos_ == "VERB"):
                                minw = xis.i
                                tryl = True
                    else:
                        tryl = False
                tryl = True
                while tryl == True:
                    if t.doc[maxw].pos_ == "VERB":
                        tryl = False
                        for xis in t.doc[maxw].rights:
                            if (xis.dep_ == "aux" and xis.pos_ == "VERB"):
                                maxw = xis.i
                                tryl = True
                    else:
                        tryl = False
                spansentence = t.doc[minw:(maxw + 1)]
                tensedict = determine_tense_input(spansentence, t.i)
                tense = "NaN"
                tupletense = max(tensedict.items(), key=operator.itemgetter(1))  # [0]
                if tupletense[1] > 0:
                    tense = tupletense[0]
                tensetogiveback = [str(tense)]
                if (tense in VERBS_TO_KEEP) == False:
                    return [], [], [], [], []
                if sentiment_to_give_back and len(sentiment_to_give_back) > 0:
                    sentim_noun = FeelIt(t.lemma_.lower(), t.pos_, t, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas)
                    if to_give_back and len(to_give_back) > 0:
                        if len(to_give_back) == len(sentiment_to_give_back):
                            listNoun_app = []
                            listSentim_app = []
                            if FoundCompound:
                                FoundNounInlist_ALLTOGETHER = "---" + FoundCompound.lemma_.lower() + " " + t.lemma_.lower() + "---"
                            else:
                                FoundNounInlist_ALLTOGETHER = "---" + t.lemma_.lower() + "---"
                            numberofvaluesent = 0
                            sumofvaluessent = 0
                            # Combine each modifier's polarity with the noun's polarity, flip the sign
                            # when the modifier was negated, then average the non-zero scores.
                            for ii in range(len(to_give_back)):
                                modin = to_give_back[ii]
                                sentin = sentiment_to_give_back[ii]
                                if FoundCompound:
                                    FoundNounInlist = "---" + FoundCompound.lemma_.lower() + " " + t.lemma_.lower() + "---" + modin
                                else:
                                    FoundNounInlist = "---" + t.lemma_.lower() + "---" + modin
                                listNoun_app.append(FoundNounInlist)
                                FoundNounInlist_ALLTOGETHER = FoundNounInlist_ALLTOGETHER + "+++" + modin
                                if sentin != 0 and sentim_noun != 0:
                                    sentin = np.sign(sentin) * np.sign(sentim_noun) * abs(sentin)
                                if "__not" in FoundNounInlist:
                                    sentin = (-1) * sentin
                                listSentim_app.append(sentin)
                                if sentin != 0:
                                    numberofvaluesent = numberofvaluesent + 1
                                    sumofvaluessent = sumofvaluessent + sentin
                            listNoun_app2 = []
                            listSentim_app2 = []
                            listNoun_app2.append(FoundNounInlist_ALLTOGETHER)
                            if numberofvaluesent > 0:
                                avgsent_ALLTOGETHER = sumofvaluessent / numberofvaluesent
                            else:
                                avgsent_ALLTOGETHER = 0
                            listSentim_app2.append(avgsent_ALLTOGETHER)
                            to_give_back = listNoun_app2
                            sentiment_to_give_back = listSentim_app2
                spantogiveback = [spansentence.text]
                textsentencetogiveback = [t.sent.text]
    return to_give_back, sentiment_to_give_back, spantogiveback, textsentencetogiveback, tensetogiveback


# Return the most frequent element of a list (an empty string if the list is empty).
def Most_Common(lista):
    country = ""
    if lista:
        data = Counter(lista)
        ordered_c = data.most_common()
        country = ordered_c[0][0]
        max_freq = ordered_c[0][1]
        for j in range(0, len(ordered_c)):
            if ordered_c[j][1] < max_freq:
                break
    return country


# Given the list of compound ToIs and a term, return the remainder of every compound containing that term.
def findCompoundedHITsOfTerm(vector, term):
    term = str(term)
    outArray = []
    for x in vector:
        if term.lower() in x.lower():
            compoundMinusTerm = x.lower().replace(term.lower(), "").strip()
            outArray.append(compoundMinusTerm)
    return outArray


# Pick the location entity (GPE/NORP/LOC/ORG) whose mentions lie closest to the extracted token,
# using an inverse-distance weighted vote over the sentence entities.
def determine_location_heuristic(doc_entities, posextractedn, t, LOCATION_SYNONYMS_FOR_HEURISTIC):
    most_probable_loc = ""
    tagged = []
    for loc in doc_entities:
        if loc.label_ == "GPE" or loc.label_ == "NORP" or loc.label_ == "LOC" or loc.label_ == "ORG":
            tagged.append(loc)
    if len(tagged) > 0:
        unique_loc_labels = []
        unique_loc_values = []
        for loc in tagged:
            x = loc.lemma_.lower()
            if (x in LOCATION_SYNONYMS_FOR_HEURISTIC) or (removearticles(x) in LOCATION_SYNONYMS_FOR_HEURISTIC):
                x = LOCATION_SYNONYMS_FOR_HEURISTIC[0].lower()
            if x not in unique_loc_labels:
                unique_loc_labels.append(x)
                valword = 0
                for word in tagged:
                    y = word.lemma_.lower()
                    if ((y in LOCATION_SYNONYMS_FOR_HEURISTIC) or (removearticles(y) in LOCATION_SYNONYMS_FOR_HEURISTIC)):
                        y = LOCATION_SYNONYMS_FOR_HEURISTIC[0].lower()
                    if y == x:
                        avgpostne = word.start + int((word.end - word.start) / 2)
                        dividendum = abs(posextractedn - avgpostne)
                        if dividendum == 0:
                            dividendum = 1
                        valword = valword + 1 / dividendum
                unique_loc_values.append(valword)
        maxvalue = max(unique_loc_values)
        indices_max = [i for i, x in enumerate(unique_loc_values) if x == maxvalue]
        most_probable_loc = unique_loc_labels[indices_max[0]]
        for wwind in indices_max:
            ww = unique_loc_labels[wwind]
            if ww == LOCATION_SYNONYMS_FOR_HEURISTIC[0].lower():
                most_probable_loc = ww
    return most_probable_loc


# Strip leading articles/conjunctions from a location string before matching it against the synonyms list.
def removearticles(text):
    removed = re.sub(r'\s+(a|an|and|the)(\s+)', ' ', " " + text + " ")  # raw string avoids the invalid-escape warning
    removed = re.sub(' +', ' ', removed)
    removed = removed.strip()
    return removed


# Run keep_token_IE over every token of a parsed document and collect the extracted chunks,
# their sentiment scores, spans, sentence texts and tenses.
def lemmatize_doc_IE_Sentiment(doc, singleNOUNs, singleCompoundedHITs, singleCompoundedHITs_toEXCLUDE, LOCATION_SYNONYMS_FOR_HEURISTIC, VERBS_TO_KEEP, COMPUTE_OVERALL_SENTIMENT_SCORE, MOST_FREQ_LOC_HEURISTIC, UseSenticNet=True, UseSentiWordNet=True, UseFigas=True):
    vect = []
    vect_sentiment = []
    vect_spans = []
    vect_text = []
    vect_tense = []
    if MOST_FREQ_LOC_HEURISTIC is True:
        locations = [loc.lemma_.lower() for loc in doc.ents if
                     (loc.label_ == "GPE" or loc.label_ == "NORP" or loc.label_ == "LOC" or loc.label_ == "ORG")]
        locations = [LOCATION_SYNONYMS_FOR_HEURISTIC[0].lower() if ((x in LOCATION_SYNONYMS_FOR_HEURISTIC) or (
            removearticles(x) in LOCATION_SYNONYMS_FOR_HEURISTIC)) else x for x in locations]
        most_frequent_loc = Most_Common(locations)
    else:
        most_frequent_loc = None
    sentencealreadyseen = ""
    for t in doc:
        vec_for_term, vec_for_sent, spansse, texttse, tensesse = keep_token_IE(
            t, singleNOUNs, singleCompoundedHITs, singleCompoundedHITs_toEXCLUDE, most_frequent_loc,
            LOCATION_SYNONYMS_FOR_HEURISTIC, VERBS_TO_KEEP, COMPUTE_OVERALL_SENTIMENT_SCORE, MOST_FREQ_LOC_HEURISTIC,
            UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas)
        if vec_for_term:
            if len(vec_for_term) > 0:
                if COMPUTE_OVERALL_SENTIMENT_SCORE == True:
                    # In overall-sentiment mode each sentence is scored only once.
                    thissentence = str(t.sent.text)
                    if (thissentence == sentencealreadyseen):
                        continue
                    else:
                        sentencealreadyseen = str(t.sent.text)
                        vect.extend(vec_for_term)
                        vect_sentiment.extend(vec_for_sent)
                        vect_spans.append(spansse)
                        vect_text.append(texttse)
                        vect_tense.append(tensesse)
                else:
                    vect.extend(vec_for_term)
                    vect_sentiment.extend(vec_for_sent)
                    vect_spans.append(spansse)
                    vect_text.append(texttse)
                    vect_tense.append(tensesse)
    return vect, vect_sentiment, vect_spans, vect_text, vect_tense


def CheckLeapYear(year):
    isleap = False
    if (year % 4) == 0:
        if (year % 100) == 0:
            if (year % 400) == 0:
                isleap = True
            else:
                isleap = False
        else:
            isleap = True
    else:
        isleap = False
    return isleap


# Public entry point: computes the aspect-based sentiment of each token of interest in each text
# and returns a pandas DataFrame with one row per extracted chunk.
def get_sentiment(text, include, exclude=None, location=None, tense=['past', 'present', 'future', 'NaN'], oss=False, UseSenticNet=True, UseSentiWordNet=True, UseFigas=True):
    # text = ['Today is a beautiful day', 'The economy is slowing down and it is a rainy day']
    # include = ['day', 'economy']
    # exclude=None
    # location=None
    # tense=['past', 'present', 'future', 'NaN']
    # oss=False
    toINCLUDE = include
    singleCompoundedHITs_toEXCLUDE = exclude
    LOCATION_SYNONYMS_FOR_HEURISTIC = location
    VERBS_TO_KEEP = tense
    COMPUTE_OVERALL_SENTIMENT_SCORE = oss
    for i in range(len(text)):
        text[i] = re.sub("\n \\n", " ", str(text[i]))

    if LOCATION_SYNONYMS_FOR_HEURISTIC and len(LOCATION_SYNONYMS_FOR_HEURISTIC) > 0:
        MOST_FREQ_LOC_HEURISTIC = True
    else:
        MOST_FREQ_LOC_HEURISTIC = False
    # Split the tokens of interest into single words and multi-word compounds.
    singleNOUNs = []
    singleCompoundedHITs = []
    for ii in toINCLUDE:
        if " " in ii:
            singleCompoundedHITs.append(ii)
        else:
            singleNOUNs.append(ii)

    currentDT = datetime.now()
    spacy_model_name_EN = 'en_core_web_lg'
    # from timeit import default_timer as timer
    # start = timer()
    # print("spaCy is loading the en_core_web_lg model ...")
    nlp_EN = spacy.load(spacy_model_name_EN)  ## this operation takes approximately 10 seconds
    # print(timer()-start)  ## elapsed time in seconds
    LA_target = 'en'
    docs_lemma = []
    docs_lemma_sentiment = []
    docsspans = []
    docstexttt = []
    docstense = []
    DF_ExtractionsSummary = []

    # Parse each text, extract the ToI-related chunks, and collect one summary row per chunk.
    for j in range(len(text)):
        nlp_COUNTRYdoc = nlp_EN(text[j])
        lemmatized_doc, lemmatized_doc_sent, spanss, texttt, tensesss = lemmatize_doc_IE_Sentiment(
            nlp_COUNTRYdoc, singleNOUNs, singleCompoundedHITs, singleCompoundedHITs_toEXCLUDE,
            LOCATION_SYNONYMS_FOR_HEURISTIC, VERBS_TO_KEEP, COMPUTE_OVERALL_SENTIMENT_SCORE, MOST_FREQ_LOC_HEURISTIC,
            UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas)
        docs_lemma.append(lemmatized_doc)
        docs_lemma_sentiment.append(lemmatized_doc_sent)
        docsspans.append(spanss)
        docstexttt.append(texttt)
        docstense.append(tensesss)
        for i in range(len(docstexttt[j])):
            includedNOUN = []
            check = (singleNOUNs + singleCompoundedHITs)
            for k in check:
                if k in str(docsspans[j][i]).lower():
                    includedNOUN.append(k)
            DF_ExtractionsSummary.append([j, docstexttt[j][i], docsspans[j][i], docs_lemma[j][i],
                                          docs_lemma_sentiment[j][i], docstense[j][i], includedNOUN])

    DF_ExtractionsSummary = pd.DataFrame(DF_ExtractionsSummary, columns=['Doc_id', 'Text', 'SpannedText', 'Chunk',
                                                                         'Sentiment', 'Tense', 'Include'])
    return DF_ExtractionsSummary


#####
#text = ['Unemployment is rising at high speed', 'The economy is slowing down and unemployment is booming']
#include = ['unemployment', 'economy']
#
#oss=True
#
#UseSenticNet=True
#UseSentiWordNet=True
#UseFigas=False
#
#resp = get_sentiment(text = text, include = include, oss=oss, UseSenticNet=UseSenticNet, UseSentiWordNet=UseSentiWordNet, UseFigas=UseFigas)
#print(resp.values)

#print("\nEND\n")
--------------------------------------------------------------------------------
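
For quick reference, a runnable version of the commented-out demo at the end of `python/get_sentiment.py` is sketched below. It is only a minimal sketch: it assumes the file is importable as a module named `get_sentiment` (for example, when run from the `python/` folder) and that the dependencies and the `en_core_web_lg` model are installed.

``` py
# Minimal usage sketch (assumes python/get_sentiment.py is on the import path).
from get_sentiment import get_sentiment

text = ['Unemployment is rising at high speed',
        'The economy is slowing down and unemployment is booming']
include = ['unemployment', 'economy']

# Fine-grained, aspect-based scores: one row per chunk related to a token of interest.
chunks = get_sentiment(text=text, include=include)
print(chunks[['Doc_id', 'Chunk', 'Sentiment', 'Tense', 'Include']])

# Overall sentence-level scores (oss=True), using SenticNet and SentiWordNet only,
# as in the commented-out demo at the bottom of the file.
overall = get_sentiment(text=text, include=include, oss=True,
                        UseSenticNet=True, UseSentiWordNet=True, UseFigas=False)
print(overall.values)
```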