├── .gitignore
├── README.md
├── datasets
│   ├── ABSA-SemEval2014
│   │   ├── Laptop_Train_v2.xml
│   │   ├── Laptops_Test_Data_PhaseA.xml
│   │   ├── Laptops_Test_Data_phaseB.xml
│   │   ├── README.txt
│   │   ├── Restaurants_Test_Data_PhaseA.xml
│   │   ├── Restaurants_Test_Data_phaseB.xml
│   │   ├── Restaurants_Train.xml
│   │   ├── Restaurants_Train_v2.xml
│   │   ├── SemEvalSchema.xsd
│   │   ├── baselinesystemdescription.pdf
│   │   ├── eval.jar
│   │   ├── laptops-trial.xml
│   │   ├── restaurants-trial.xml
│   │   ├── semeval14_absa_annotationguidelines.pdf
│   │   ├── semeval_base.py
│   │   └── submission-guidelines.pdf
│   ├── ABSA-SemEval2015
│   │   ├── ABSA-15_Laptops_Train_Data.xml
│   │   ├── ABSA-15_Restaurants_Train_Final.xml
│   │   ├── ABSA15_Hotels_Test.xml
│   │   ├── ABSA15_Laptops_Test.xml
│   │   ├── ABSA15_Restaurants_Test.xml
│   │   ├── absa-2015_laptops_trial.xml
│   │   ├── absa-2015_restaurants_trial.xml
│   │   └── guidelines
│   │       ├── SemEval2015_ABSA_Laptops_AnnotationGuidelines.pdf
│   │       └── semeval2015_absa_restaurants_annotationguidelines.pdf
│   ├── ABSA-SemEval2016
│   │   └── Training_Data
│   │       ├── ABSA16FR-RestaurantsTrain
│   │       │   ├── ABSA16FR-download.jar
│   │       │   ├── ABSA16FR_Restaurants_Train.xml
│   │       │   ├── ABSA16FR_Restaurants_guidelines.pdf
│   │       │   ├── ABSA16FR_Restaurants_index.txt
│   │       │   └── README.txt
│   │       ├── ABSA16_Laptops_Train_English_SB2.xml
│   │       ├── ABSA16_Laptops_Train_SB1_v2.xml
│   │       ├── ABSA16_Restaurants_Train_English_SB2.xml
│   │       ├── ABSA16_Restaurants_Train_SB1_v2.xml
│   │       ├── restaurants_dutch_training.xml
│   │       └── restaurants_dutch_training_textlevel.xml
│   ├── Aspect-based-Sentiment-Analysis-Dataset.ipynb
│   └── CR
│       ├── Apex AD2600 Progressive-scan DVD player.txt
│       ├── Canon G3.txt
│       ├── Creative Labs Nomad Jukebox Zen Xtra 40GB.txt
│       ├── Nikon coolpix 4300.txt
│       ├── Nokia 6610.txt
│       └── Readme.txt
├── libraries
│   ├── __init__.py
│   └── baselines.py
├── run.py
└── stanford_corenlp_python
    ├── LICENSE
    ├── README.md
    ├── __init__.py
    ├── client.py
    ├── corenlp.py
    ├── default.properties
    ├── jsonrpc.py
    └── progressbar.py

/.gitignore:
--------------------------------------------------------------------------------
1 | # 3rd party libs
2 | stanford_corenlp_python/stanford-corenlp-full
3 | stanford_corenlp_python/stanford-corenlp-full-2015-12-09/
4 | 
5 | # python bytecode files
6 | *.pyc
7 | 
8 | # zip file
9 | *.zip
10 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Aspect-based Sentiment Analysis
2 | 
3 | [Sentiment Analysis Datasets](https://www.w3.org/community/sentiment/wiki/Datasets): a collection of many different datasets for sentiment analysis
4 | 
5 | [TASS 2016](http://www.sepln.org/workshops/tass/2016/tass2016.php): a sentiment analysis workshop focused on Spanish
6 | 
7 | [Yelp Dataset](https://www.yelp.com/dataset_challenge): the Yelp reviews dataset
8 | 
--------------------------------------------------------------------------------
/datasets/ABSA-SemEval2014/README.txt:
--------------------------------------------------------------------------------
1 | Aspect Based Sentiment Analysis (ABSA)
2 | Task 4 of SemEval 2014
3 | -----------------------------------------------------
4 | 
5 | This folder contains scripts/code for:
6 | 
7 | A. Running the ABSA baselines.
8 | B. Evaluating the output of your system.
9 | C. Validating the XML file that you will submit to ABSA 2014.
10 | 
11 | 
12 | Running the Baselines
13 | -----------------------
14 | 
15 | The semeval_base.py script is an implementation of the baselines of SemEval Task 4 (Aspect Based Sentiment Analysis).
16 | A high level description of them can be found at the following address:
17 | 
18 | http://alt.qcri.org/semeval2014/task4/data/uploads/baselinesystemdescription.pdf
19 | 
20 | By running python semeval_base.py from your shell, a list of possible options will be displayed.
21 | (**Caution: We tested semeval_base.py only on Linux.)
22 | 
23 | Assuming that rest.xml and lap.xml are the training data for the restaurants and laptops
24 | domain, respectively, we recommend you run the baselines as follows:
25 | 
26 | -- restaurants
27 | 
28 | python semeval_base.py --train rest.xml --task 5
29 | 
30 | It reads the given data (rest.xml) and splits it into a training part (absa--train.xml) and a test part (absa--test.xml) using an 80:20 ratio.
31 | Then, it tags the sentences of the test part with the found aspect terms and categories and stores the result in absa--test.predicted-stageI.xml.
32 | absa--test.gold.xml contains the gold (correct) aspect terms and categories.
33 | 
34 | 
35 | python semeval_base.py --train rest.xml --task 6
36 | 
37 | It reads the given data (rest.xml) and splits it into a training part (absa--train.xml) and a test part (absa--test.xml) using an 80:20 ratio.
38 | Then, it finds the polarity of the aspect terms and categories of the test part and stores the result in absa--test.predicted-stageII.xml.
39 | absa--test.gold.xml contains the gold (correct) polarities.
40 | 
41 | -- laptops
42 | 
43 | python semeval_base.py --train lap.xml --task 1
44 | 
45 | It reads the given data (lap.xml), splits it into a training part (absa--train.xml) and a test part (absa--test.xml) using an 80:20 ratio.
46 | Then, it tags the sentences of the test part with the found aspect terms and stores the result in absa--test.predicted-aspect.xml.
47 | absa--test.gold.xml contains the gold (correct) aspect terms and categories.
48 | 
49 | python semeval_base.py --train lap.xml --task 3
50 | 
51 | It reads the given data (lap.xml), splits it into a training part (absa--train.xml) and a test part (absa--test.xml) using an 80:20 ratio.
52 | Then, it finds the polarity of the aspect terms of the test part and stores the result in absa--test.predicted-stageII.xml.
53 | absa--test.gold.xml contains the gold (correct) polarities.
54 | 
55 | 
56 | In all cases above, the baseline script calculates and displays evaluation scores (precision, recall, and F1 for aspect term and aspect category extraction; accuracy for aspect term and aspect category polarity detection).
57 | 
58 | 
59 | Evaluation
60 | -----------------------
61 | 
62 | java -cp ./eval.jar Main.Aspects test.xml ref.xml
63 | 
64 | It calculates and displays the precision, recall and F1 for aspect term and category extraction for a system that generated test.xml, comparing it to ref.xml, which contains
65 | the gold (correct) annotations. The same measures are also calculated and displayed by semeval_base.py.
66 | 
67 | java -cp ./eval.jar Main.Polarity test.xml ref.xml
68 | 
69 | In contrast to semeval_base.py, which calculates only the overall accuracy for the polarity detection task, the above command also calculates F1, precision and recall
70 | for all labels (positive|negative|neutral|conflict). As previously, test.xml is the file that the system generated and ref.xml
71 | is the one that contains the gold (correct) annotations.
72 | 
73 | 
74 | Submit your system
75 | -----------------------
76 | 
77 | The Aspect Based Sentiment Analysis task will run in two stages.
78 | 
79 | In the first stage, you will be provided with an XML file that will contain a set of sentences.
80 | If you want to participate in this stage, you have to return a file tagged with the aspect terms and categories in the same way they are tagged in the training data. 81 | 82 | In the second stage, we will provide you with the correct aspect terms and categories 83 | and you will have to find their polarity (positive|negative|neutral|conflict) and tag them as in the training data. 84 | 85 | 86 | Before uploading your results (for stage one or two), we highly recommend you validate (as shown below) the XML your system produced against the provided XSD schema (SemEvalSchema.xsd). 87 | This way you will verify that your XML output is well-formed and can be processed/parsed by our evaluation scripts. 88 | 89 | java -cp ./eval.jar Main.Valid test.xml SemEvalSchema.xsd 90 | 91 | 92 | 93 | 94 | -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2014/SemEvalSchema.xsd: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2014/baselinesystemdescription.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/davidsbatista/Aspect-Based-Sentiment-Analysis/1d9c8ec1131993d924e96676fa212db6b53cb870/datasets/ABSA-SemEval2014/baselinesystemdescription.pdf -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2014/eval.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/davidsbatista/Aspect-Based-Sentiment-Analysis/1d9c8ec1131993d924e96676fa212db6b53cb870/datasets/ABSA-SemEval2014/eval.jar -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2014/laptops-trial.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | I liked the aluminum body. 5 | 6 | 7 | 8 | 9 | 10 | Lightweight and the screen is beautiful! 11 | 12 | 13 | 14 | 15 | 16 | Buy it, love it, and I promise you won't regret it. 17 | 18 | 19 | From the build quality to the performance, everything about it has been sub-par from what I would have expected from Apple. 20 | 21 | 22 | 23 | 24 | 25 | 26 | pretty much everything else about the computer is good it just stops working out of no were. 27 | 28 | 29 | Originally bought it for my wife. 30 | 31 | 32 | It was truly a great computer costing less than one thousand bucks before tax. 33 | 34 | 35 | 36 | 37 | 38 | I bought this laptop on Saturday and am completely in love with it! 39 | 40 | 41 | If you don't like fingerprints, this might not be the laptop for you. 42 | 43 | 44 | Boots up fast and runs great! 45 | 46 | 47 | 48 | 49 | 50 | 51 | Call tech support, standard email the form and fax it back in to us. 52 | 53 | 54 | 55 | 56 | 57 | I did contact HP and share how unhappy I am. 58 | 59 | 60 | This is my second MacBook. 61 | 62 | 63 | The service I received from Toshiba went above and beyond the call of duty. 64 | 65 | 66 | 67 | 68 | 69 | I would recommend it just because of the internet speed probably because thats the only thing i really care about. 
70 | 71 | 72 | 73 | 74 | 75 | This is my 3rd Apple Laptop and first MacBook Pro. 76 | 77 | 78 | I have had this laptop for a few months now and i would say im pretty satisfied. 79 | 80 | 81 | The love part of my relationship with this laptop doesn't take very long. 82 | 83 | 84 | The screen shows great colors. 85 | 86 | 87 | 88 | 89 | 90 | Dells are ok, HPs aren't that good, but Macs or Fantastic. 91 | 92 | 93 | The battery life has not decreased since I bought it, so i'm thrilled with that. 94 | 95 | 96 | 97 | 98 | 99 | The price and features more than met my needs. 100 | 101 | 102 | 103 | 104 | 105 | 106 | the mouse buttons are hard to push. 107 | 108 | 109 | 110 | 111 | 112 | Just don't waste your time and money on this. 113 | 114 | 115 | And I'm still paying the bloody financing, for a product which didn't last me at least three years! 116 | 117 | 118 | Bought it to use mostly for oline classes. 119 | 120 | 121 | I purchased an HP right after my high school graduation. 122 | 123 | 124 | Me and my boyfriend bought the Gateway NV78 in nov of 09. 125 | 126 | 127 | This is the complete opposite to an ergonomic design. 128 | 129 | 130 | 131 | 132 | 133 | The technical service for dell is so 3rd world it might as well not even bother. 134 | 135 | 136 | 137 | 138 | 139 | The built in camera is very useful when chatting with other techs in remote buildings on our campus. 140 | 141 | 142 | 143 | 144 | 145 | Not super fancy, but not super expensive either. 146 | 147 | 148 | Keyboard good sized and wasy to use. 149 | 150 | 151 | 152 | 153 | 154 | Great wifi too. 155 | 156 | 157 | 158 | 159 | 160 | The Dell mini was the first Dell product that I had ever purchased. 161 | 162 | 163 | My Mac has gone from being a trusted friend to an adversary. 164 | 165 | 166 | I have been a mac user since the mid 90s. 167 | 168 | 169 | My HP is very heavy. 170 | 171 | 172 | big mistake! 173 | 174 | 175 | Best Buy was great as always and accepted the return and gave me another model 1764. 176 | 177 | 178 | I would buy this lap top over and over again! 179 | 180 | 181 | Games being the main issue. 182 | 183 | 184 | 185 | 186 | 187 | My previous purchases were with Dell and HP. 188 | 189 | 190 | The price is another driving influence that made me purchase this laptop. 191 | 192 | 193 | 194 | 195 | 196 | But see the macbook pro is different because it may have a huge price tag but it comes with the full software that you would actually need and most of it has free future updates. 197 | 198 | 199 | 200 | 201 | 202 | 203 | 204 | It's a great product for a great price! 205 | 206 | 207 | 208 | 209 | 210 | Excellent speed for processing data. 211 | 212 | 213 | 214 | 215 | 216 | All in all, I'm incredibly dissatisfied with this laptop, and with HP as a whole. 217 | 218 | 219 | In my opinion it was not as user friendly as I expected either. 220 | 221 | 222 | The Macbook arrived in a nice twin packing and sealed in the box, all the functions works great. 223 | 224 | 225 | 226 | 227 | 228 | 229 | The switchable graphic card is pretty sweet when you want gaming on the laptop. 230 | 231 | 232 | 233 | 234 | 235 | 236 | I would like at least a 4 hr. battery life. 237 | 238 | 239 | 240 | 241 | 242 | Looking online, many people are having the same problem. 243 | 244 | 245 | It has good speed and plenty of hard drive space. 246 | 247 | 248 | 249 | 250 | 251 | 252 | The driver updates don't fix the issue, very frustrating. 
253 | 254 | 255 | 256 | 257 | 258 | Dell wanted to charge us for everything everytime I called them with a problem. 259 | 260 | 261 | THIS HAS BEEN NOTHING BUT A HEADACHE SINCE WE PURCHASED IT. 262 | 263 | 264 | Im glad that it has such great features in it. 265 | 266 | 267 | 268 | 269 | 270 | it's just a great toy to have around. 271 | 272 | 273 | When this happened I would have to completely power off my computer and restart it. 274 | 275 | 276 | I always have used a tower home PC and jumped to the laptop and have been very satisfied with its performance. 277 | 278 | 279 | 280 | 281 | 282 | I asked how they would determine that since there are no scratches, dents or other signs of damage and was told that was the only way this type of damage could happen. 283 | 284 | 285 | I had finally reached my limit and broke down. 286 | 287 | 288 | and dell and best buy both refused to take it back after i only had it for 1 hour.... 289 | 290 | 291 | I burned my leg, after lifting it from my desk, and for less than 5 second putting it on my lap to clean my coffee table, so I can place it there. 292 | 293 | 294 | and its really cheap and you wont regret buying it. 295 | 296 | 297 | It was over rated! 298 | 299 | 300 | I dont understand how anyone can think this is a great product worth purchasing. 301 | 302 | 303 | My sister has the same Mac as me and she is in a band and uses GarageBand to record and edit. 304 | 305 | 306 | 307 | 308 | 309 | and looks very sexyy:D really the mac book pro is the best laptop specially for students in college if you are not caring about price. 310 | 311 | 312 | 313 | 314 | 315 | They definitely have a superior product! 316 | 317 | 318 | This is a great laptop and I would recommend it to anyone. 319 | 320 | 321 | It is very user friendly and not hard to figure out at all. 322 | 323 | 324 | They are wonderful, but very dangerous when it comes to emitting heat. 325 | 326 | 327 | Came fully loaded -good. 328 | 329 | 330 | It super shiny, so you can see the fingerprints easily. 331 | 332 | 333 | It's a great prodcut to handle basic computing needs. 334 | 335 | 336 | the laptop was really good and it goes really fast just the way i thought it would of run. 337 | 338 | 339 | It doesnt work worth a damn. 340 | 341 | 342 | It was definelty a smart move. 343 | 344 | 345 | Overall though, for the money spent it's a great deal. 346 | 347 | 348 | A little pricey but it is well, well worth it. 349 | 350 | 351 | It is meant to be PORTABLE. 352 | 353 | 354 | I am totally satisfied with my little toshie! 355 | 356 | 357 | Also, the extended warranty was a problem. 358 | 359 | 360 | 361 | 362 | 363 | My opinion of Sony has been dropping as fast as the stock market, given their horrible support, but this machine just caused another plunge. 364 | 365 | 366 | 367 | 368 | 369 | I waited and waited and no laptop. 370 | 371 | 372 | Everything is falling apart internally and externally. 373 | 374 | 375 | I also liked the glass screen. 376 | 377 | 378 | 379 | 380 | 381 | When I called Toshiba, they would not do anything and even tried to charge me $35 for the phone call, even though they didn't offer any technical support. 382 | 383 | 384 | 385 | 386 | 387 | I can actually get work done with this MAC, and not fight with it like my tired old PC laptop. 388 | 389 | 390 | If you're not wanting to be mobile, this is a good laptop to sit on a desk. 391 | 392 | 393 | I also travel with it and it never gives me any problems. 
394 | 395 | 396 | (Beware, their staff could send you back making you feel that only they know what a computer is. 397 | 398 | 399 | 400 | 401 | 402 | Other Thoughts: Do not purchase this product. 403 | 404 | 405 | The computer was delivered as promised. 406 | 407 | 408 | The only thing that I don't like about my mac is that sometimes there are programs that I want to be able to run and I am not able to. 409 | 410 | 411 | 412 | 413 | 414 | Wireless has not been a issue for me, like some others have meantioned. 415 | 416 | 417 | 418 | 419 | 420 | MacBook Notebooks quickly die out because of their short battery life, as well as the many background programs that run without the user's knowlede. 421 | 422 | 423 | 424 | 425 | 426 | 427 | All for such a great price. 428 | 429 | 430 | 431 | 432 | 433 | -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2014/semeval14_absa_annotationguidelines.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/davidsbatista/Aspect-Based-Sentiment-Analysis/1d9c8ec1131993d924e96676fa212db6b53cb870/datasets/ABSA-SemEval2014/semeval14_absa_annotationguidelines.pdf -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2014/semeval_base.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | ''' 4 | **Baseline methods for the 4th task of SemEval 2014** 5 | 6 | Run a task from the terminal:: 7 | >>> python baselines.py -t file -m taskNum 8 | 9 | or, import within python. E.g., for Aspect Term Extraction:: 10 | from baselines import * 11 | corpus = Corpus(ET.parse(trainfile).getroot().findall('sentence')) 12 | unseen = Corpus(ET.parse(testfile).getroot().findall('sentence')) 13 | b1 = BaselineAspectExtractor(corpus) 14 | predicted = b1.tag(unseen.corpus) 15 | corpus.write_out('%s--test.predicted-aspect.xml'%domain_name, predicted, short=False) 16 | 17 | Similarly, for Aspect Category Detection, Aspect Term Polarity Estimation, and Aspect Category Polarity Estimation. 18 | ''' 19 | 20 | __author__ = "J. Pavlopoulos" 21 | __credits__ = "J. Pavlopoulos, D. Galanis, I. Androutsopoulos" 22 | __license__ = "GPL" 23 | __version__ = "1.0.1" 24 | __maintainer__ = "John Pavlopoulos" 25 | __email__ = "annis@aueb.gr" 26 | 27 | try: 28 | import xml.etree.ElementTree as ET, getopt, logging, sys, random, re, copy 29 | from xml.sax.saxutils import escape 30 | except: 31 | sys.exit('Some package is missing... 
Perhaps ?') 32 | 33 | logging.basicConfig(level=logging.INFO) 34 | logger = logging.getLogger(__name__) 35 | 36 | # Stopwords, imported from NLTK (v 2.0.4) 37 | stopwords = set( 38 | ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 39 | 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 40 | 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are', 'was', 41 | 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 42 | 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 43 | 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 44 | 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 45 | 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 46 | 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now']) 47 | 48 | 49 | def fd(counts): 50 | '''Given a list of occurrences (e.g., [1,1,1,2]), return a dictionary of frequencies (e.g., {1:3, 2:1}.)''' 51 | d = {} 52 | for i in counts: d[i] = d[i] + 1 if i in d else 1 53 | return d 54 | 55 | 56 | freq_rank = lambda d: sorted(d, key=d.get, reverse=True) 57 | '''Given a map, return ranked the keys based on their values.''' 58 | 59 | 60 | def fd2(counts): 61 | '''Given a list of 2-uplets (e.g., [(a,pos), (a,pos), (a,neg), ...]), form a dict of frequencies of specific items (e.g., {a:{pos:2, neg:1}, ...}).''' 62 | d = {} 63 | for i in counts: 64 | # If the first element of the 2-uplet is not in the map, add it. 65 | if i[0] in d: 66 | if i[1] in d[i[0]]: 67 | d[i[0]][i[1]] += 1 68 | else: 69 | d[i[0]][i[1]] = 1 70 | else: 71 | d[i[0]] = {i[1]: 1} 72 | return d 73 | 74 | 75 | def validate(filename): 76 | '''Validate an XML file, w.r.t. the format given in the 4th task of **SemEval '14**.''' 77 | elements = ET.parse(filename).getroot().findall('sentence') 78 | aspects = [] 79 | for e in elements: 80 | for eterms in e.findall('aspectTerms'): 81 | if eterms is not None: 82 | for a in eterms.findall('aspectTerm'): 83 | aspects.append(Aspect('', '', []).create(a).term) 84 | return elements, aspects 85 | 86 | 87 | fix = lambda text: escape(text.encode('utf8')).replace('\"','"') 88 | '''Simple fix for writing out text.''' 89 | 90 | # Dice coefficient 91 | def dice(t1, t2, stopwords=[]): 92 | tokenize = lambda t: set([w for w in t.split() if (w not in stopwords)]) 93 | t1, t2 = tokenize(t1), tokenize(t2) 94 | return 2. * len(t1.intersection(t2)) / (len(t1) + len(t2)) 95 | 96 | 97 | class Category: 98 | '''Category objects contain the term and polarity (i.e., pos, neg, neu, conflict) of the category (e.g., food, price, etc.) 
of a sentence.''' 99 | 100 | def __init__(self, term='', polarity=''): 101 | self.term = term 102 | self.polarity = polarity 103 | 104 | def create(self, element): 105 | self.term = element.attrib['category'] 106 | self.polarity = element.attrib['polarity'] 107 | return self 108 | 109 | def update(self, term='', polarity=''): 110 | self.term = term 111 | self.polarity = polarity 112 | 113 | 114 | class Aspect: 115 | '''Aspect objects contain the term (e.g., battery life) and polarity (i.e., pos, neg, neu, conflict) of an aspect.''' 116 | 117 | def __init__(self, term, polarity, offsets): 118 | self.term = term 119 | self.polarity = polarity 120 | self.offsets = offsets 121 | 122 | def create(self, element): 123 | self.term = element.attrib['term'] 124 | self.polarity = element.attrib['polarity'] 125 | self.offsets = {'from': str(element.attrib['from']), 'to': str(element.attrib['to'])} 126 | return self 127 | 128 | def update(self, term='', polarity=''): 129 | self.term = term 130 | self.polarity = polarity 131 | 132 | 133 | class Instance: 134 | '''An instance is a sentence, modeled out of XML (pre-specified format, based on the 4th task of SemEval 2014). 135 | It contains the text, the aspect terms, and any aspect categories.''' 136 | 137 | def __init__(self, element): 138 | self.text = element.find('text').text 139 | self.id = element.get('id') 140 | self.aspect_terms = [Aspect('', '', offsets={'from': '', 'to': ''}).create(e) for es in 141 | element.findall('aspectTerms') for e in es if 142 | es is not None] 143 | self.aspect_categories = [Category(term='', polarity='').create(e) for es in element.findall('aspectCategories') 144 | for e in es if 145 | es is not None] 146 | 147 | def get_aspect_terms(self): 148 | return [a.term.lower() for a in self.aspect_terms] 149 | 150 | def get_aspect_categories(self): 151 | return [c.term.lower() for c in self.aspect_categories] 152 | 153 | def add_aspect_term(self, term, polarity='', offsets={'from': '', 'to': ''}): 154 | a = Aspect(term, polarity, offsets) 155 | self.aspect_terms.append(a) 156 | 157 | def add_aspect_category(self, term, polarity=''): 158 | c = Category(term, polarity) 159 | self.aspect_categories.append(c) 160 | 161 | 162 | class Corpus: 163 | '''A corpus contains instances, and is useful for training algorithms or splitting to train/test files.''' 164 | 165 | def __init__(self, elements): 166 | self.corpus = [Instance(e) for e in elements] 167 | self.size = len(self.corpus) 168 | self.aspect_terms_fd = fd([a for i in self.corpus for a in i.get_aspect_terms()]) 169 | self.top_aspect_terms = freq_rank(self.aspect_terms_fd) 170 | self.texts = [t.text for t in self.corpus] 171 | 172 | def echo(self): 173 | print '%d instances\n%d distinct aspect terms' % (len(self.corpus), len(self.top_aspect_terms)) 174 | print 'Top aspect terms: %s' % (', '.join(self.top_aspect_terms[:10])) 175 | 176 | def clean_tags(self): 177 | for i in range(len(self.corpus)): 178 | self.corpus[i].aspect_terms = [] 179 | 180 | def split(self, threshold=0.8, shuffle=False): 181 | '''Split to train/test, based on a threshold. 
Turn on shuffling for randomizing the elements beforehand.''' 182 | clone = copy.deepcopy(self.corpus) 183 | if shuffle: random.shuffle(clone) 184 | train = clone[:int(threshold * self.size)] 185 | test = clone[int(threshold * self.size):] 186 | return train, test 187 | 188 | def write_out(self, filename, instances, short=True): 189 | with open(filename, 'w') as o: 190 | o.write('\n') 191 | for i in instances: 192 | o.write('\t\n' % (i.id)) 193 | o.write('\t\t%s\n' % fix(i.text)) 194 | o.write('\t\t\n') 195 | if not short: 196 | for a in i.aspect_terms: 197 | o.write('\t\t\t\n' % ( 198 | fix(a.term), a.polarity, a.offsets['from'], a.offsets['to'])) 199 | o.write('\t\t\n') 200 | o.write('\t\t\n') 201 | if not short: 202 | for c in i.aspect_categories: 203 | o.write('\t\t\t\n' % (fix(c.term), c.polarity)) 204 | o.write('\t\t\n') 205 | o.write('\t\n') 206 | o.write('') 207 | 208 | 209 | class BaselineAspectExtractor(): 210 | '''Extract the aspects from a text. 211 | Use the aspect terms from the train data, to tag any new (i.e., unseen) instances.''' 212 | 213 | def __init__(self, corpus): 214 | self.candidates = [a.lower() for a in corpus.top_aspect_terms] 215 | 216 | def find_offsets_quickly(self, term, text): 217 | start = 0 218 | while True: 219 | start = text.find(term, start) 220 | if start == -1: return 221 | yield start 222 | start += len(term) 223 | 224 | def find_offsets(self, term, text): 225 | offsets = [(i, i + len(term)) for i in list(self.find_offsets_quickly(term, text))] 226 | return offsets 227 | 228 | def tag(self, test_instances): 229 | clones = [] 230 | for i in test_instances: 231 | i_ = copy.deepcopy(i) 232 | i_.aspect_terms = [] 233 | for c in set(self.candidates): 234 | if c in i_.text: 235 | offsets = self.find_offsets(' ' + c + ' ', i.text) 236 | for start, end in offsets: i_.add_aspect_term(term=c, 237 | offsets={'from': str(start + 1), 'to': str(end - 1)}) 238 | clones.append(i_) 239 | return clones 240 | 241 | 242 | class BaselineCategoryDetector(): 243 | '''Detect the category (or categories) of an instance. 
244 | For any new (i.e., unseen) instance, fetch the k-closest instances from the train data, and vote for the number of categories and the categories themselves.''' 245 | 246 | def __init__(self, corpus): 247 | self.corpus = corpus 248 | 249 | # Fetch k-neighbors (i.e., similar texts), using the Dice coefficient, and vote for #categories and category values 250 | def fetch_k_nn(self, text, k=5, multi=False): 251 | neighbors = dict([(i, dice(text, n, stopwords)) for i, n in enumerate(self.corpus.texts)]) 252 | ranked = freq_rank(neighbors) 253 | topk = [self.corpus.corpus[i] for i in ranked[:k]] 254 | num_of_cats = 1 if not multi else int(sum([len(i.aspect_categories) for i in topk]) / float(k)) 255 | cats = freq_rank(fd([c for i in topk for c in i.get_aspect_categories()])) 256 | categories = [cats[i] for i in range(num_of_cats)] 257 | return categories 258 | 259 | def tag(self, test_instances): 260 | clones = [] 261 | for i in test_instances: 262 | i_ = copy.deepcopy(i) 263 | i_.aspect_categories = [Category(term=c) for c in self.fetch_k_nn(i.text)] 264 | clones.append(i_) 265 | return clones 266 | 267 | 268 | class BaselineStageI(): 269 | '''Stage I: Aspect Term Extraction and Aspect Category Detection.''' 270 | 271 | def __init__(self, b1, b2): 272 | self.b1 = b1 273 | self.b2 = b2 274 | 275 | def tag(self, test_instances): 276 | clones = [] 277 | for i in test_instances: 278 | i_ = copy.deepcopy(i) 279 | i_.aspect_categories, i_.aspect_terms = [], [] 280 | for a in set(self.b1.candidates): 281 | offsets = self.b1.find_offsets(' ' + a + ' ', i_.text) 282 | for start, end in offsets: 283 | i_.add_aspect_term(term=a, offsets={'from': str(start + 1), 'to': str(end - 1)}) 284 | for c in self.b2.fetch_k_nn(i_.text): 285 | i_.aspect_categories.append(Category(term=c)) 286 | clones.append(i_) 287 | return clones 288 | 289 | 290 | class BaselineAspectPolarityEstimator(): 291 | '''Estimate the polarity of an instance's aspects. 292 | This is a majority baseline. 293 | Form the tuples from the train data, and measure frequencies. 294 | Then, given a new instance, vote for the polarities of the aspect terms (given).''' 295 | 296 | def __init__(self, corpus): 297 | self.corpus = corpus 298 | self.fd = fd2([(a.term, a.polarity) for i in self.corpus.corpus for a in i.aspect_terms]) 299 | self.major = freq_rank(fd([a.polarity for i in self.corpus.corpus for a in i.aspect_terms]))[0] 300 | 301 | # Fetch k-neighbors (i.e., similar texts), using the Dice coefficient, and vote for aspect's polarity 302 | def k_nn(self, text, aspect, k=5): 303 | neighbors = dict([(i, dice(text, next.text, stopwords)) for i, next in enumerate(self.corpus.corpus) if 304 | aspect in next.get_aspect_terms()]) 305 | ranked = freq_rank(neighbors) 306 | topk = [self.corpus.corpus[i] for i in ranked[:k]] 307 | return freq_rank(fd([a.polarity for i in topk for a in i.aspect_terms])) 308 | 309 | def majority(self, text, aspect): 310 | if aspect not in self.fd: 311 | return self.major 312 | else: 313 | polarities = self.k_nn(text, aspect, k=5) 314 | if polarities: 315 | return polarities[0] 316 | else: 317 | return self.major 318 | 319 | def tag(self, test_instances): 320 | clones = [] 321 | for i in test_instances: 322 | i_ = copy.deepcopy(i) 323 | for j in i_.aspect_terms: j.polarity = self.majority(i_.text, j.term) 324 | clones.append(i_) 325 | return clones 326 | 327 | 328 | class BaselineAspectCategoryPolarityEstimator(): 329 | '''Estimate the polarity of an instance's category (or categories). 
330 | This is a majority baseline. 331 | Form the tuples from the train data, and measure frequencies. 332 | Then, given a new instance, vote for the polarities of the categories (given).''' 333 | 334 | def __init__(self, corpus): 335 | self.corpus = corpus 336 | self.fd = fd2([(c.term, c.polarity) for i in self.corpus.corpus for c in i.aspect_categories]) 337 | 338 | # Fetch k-neighbors (i.e., similar texts), using the Dice coefficient, and vote for aspect's polarity 339 | def k_nn(self, text, k=5): 340 | neighbors = dict([(i, dice(text, next.text, stopwords)) for i, next in enumerate(self.corpus.corpus)]) 341 | ranked = freq_rank(neighbors) 342 | topk = [self.corpus.corpus[i] for i in ranked[:k]] 343 | return freq_rank(fd([c.polarity for i in topk for c in i.aspect_categories])) 344 | 345 | def majority(self, text): 346 | return self.k_nn(text)[0] 347 | 348 | def tag(self, test_instances): 349 | clones = [] 350 | for i in test_instances: 351 | i_ = copy.deepcopy(i) 352 | for j in i_.aspect_categories: 353 | j.polarity = self.majority(i_.text) 354 | clones.append(i_) 355 | return clones 356 | 357 | 358 | class BaselineStageII(): 359 | '''Stage II: Aspect Term and Aspect Category Polarity Estimation. 360 | Terms and categories are assumed given.''' 361 | 362 | # Baselines 3 and 4 are assumed given. 363 | def __init__(self, b3, b4): 364 | self.b3 = b3 365 | self.b4 = b4 366 | 367 | # Tag sentences with aspects and categories with their polarities 368 | def tag(self, test_instances): 369 | clones = [] 370 | for i in test_instances: 371 | i_ = copy.deepcopy(i) 372 | for j in i_.aspect_terms: j.polarity=self.b3.majority(i_.text, j.term) 373 | for j in i_.aspect_categories: j.polarity = self.b4.majority(i_.text) 374 | clones.append(i_) 375 | return clones 376 | 377 | 378 | class Evaluate(): 379 | '''Evaluation methods, per subtask of the 4th task of SemEval '14.''' 380 | 381 | def __init__(self, correct, predicted): 382 | self.size = len(correct) 383 | self.correct = correct 384 | self.predicted = predicted 385 | 386 | # Aspect Extraction (no offsets considered) 387 | def aspect_extraction(self, b=1): 388 | common, relevant, retrieved = 0., 0., 0. 389 | for i in range(self.size): 390 | cor = [a.offsets for a in self.correct[i].aspect_terms] 391 | pre = [a.offsets for a in self.predicted[i].aspect_terms] 392 | common += len([a for a in pre if a in cor]) 393 | retrieved += len(pre) 394 | relevant += len(cor) 395 | p = common / retrieved if retrieved > 0 else 0. 396 | r = common / relevant 397 | f1 = (1 + (b ** 2)) * p * r / ((p * b ** 2) + r) if p > 0 and r > 0 else 0. 398 | return p, r, f1, common, retrieved, relevant 399 | 400 | # Aspect Category Detection 401 | def category_detection(self, b=1): 402 | common, relevant, retrieved = 0., 0., 0. 403 | for i in range(self.size): 404 | cor = self.correct[i].get_aspect_categories() 405 | # Use set to avoid duplicates (i.e., two times the same category) 406 | pre = set(self.predicted[i].get_aspect_categories()) 407 | common += len([c for c in pre if c in cor]) 408 | retrieved += len(pre) 409 | relevant += len(cor) 410 | p = common / retrieved if retrieved > 0 else 0. 411 | r = common / relevant 412 | f1 = (1 + b ** 2) * p * r / ((p * b ** 2) + r) if p > 0 and r > 0 else 0. 413 | return p, r, f1, common, retrieved, relevant 414 | 415 | def aspect_polarity_estimation(self, b=1): 416 | common, relevant, retrieved = 0., 0., 0. 
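# Accuracy below is micro-averaged over all gold aspect-term slots: each predicted
# polarity is compared positionally with the gold polarity of the same aspect term.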
417 | for i in range(self.size): 418 | cor = [a.polarity for a in self.correct[i].aspect_terms] 419 | pre = [a.polarity for a in self.predicted[i].aspect_terms] 420 | common += sum([1 for j in range(len(pre)) if pre[j] == cor[j]]) 421 | retrieved += len(pre) 422 | acc = common / retrieved 423 | return acc, common, retrieved 424 | 425 | def aspect_category_polarity_estimation(self, b=1): 426 | common, relevant, retrieved = 0., 0., 0. 427 | for i in range(self.size): 428 | cor = [a.polarity for a in self.correct[i].aspect_categories] 429 | pre = [a.polarity for a in self.predicted[i].aspect_categories] 430 | common += sum([1 for j in range(len(pre)) if pre[j] == cor[j]]) 431 | retrieved += len(pre) 432 | acc = common / retrieved 433 | return acc, common, retrieved 434 | 435 | 436 | def main(argv=None): 437 | # Parse the input 438 | opts, args = getopt.getopt(argv, "hg:dt:om:k:", ["help", "grammar", "train=", "task=", "test="]) 439 | trainfile, testfile, task = None, None, 1 440 | use_msg = 'Use as:\n">>> python baselines.py --train file.xml --task 1|2|3|4(|5|6)"\n\nThis will parse a train set, examine whether is valid, split to train and test (80/20 %), write the new train, test and unseen test files, perform ABSA for task 1, 2, 3, or 4 (5 and 6 perform jointly tasks 1 & 2, and 3 & 4, respectively), and write out a file with the predictions.' 441 | if len(opts) == 0: sys.exit(use_msg) 442 | for opt, arg in opts: 443 | if opt in ("-h", "--help"): 444 | sys.exit(use_msg) 445 | elif opt in ('-t', "--train"): 446 | trainfile = arg 447 | elif opt in ('-m', "--task"): 448 | task = int(arg) 449 | elif opt in ('-k', "--test"): 450 | testfile = arg 451 | 452 | # Examine if the file is in proper XML format for further use. 453 | print 'Validating the file...' 454 | try: 455 | elements, aspects = validate(trainfile) 456 | print 'PASSED! This corpus has: %d sentences, %d aspect term occurrences, and %d distinct aspect terms.' % ( 457 | len(elements), len(aspects), len(list(set(aspects)))) 458 | except: 459 | print "Unexpected error:", sys.exc_info()[0] 460 | raise 461 | 462 | # Get the corpus and split into train/test. 463 | corpus = Corpus(ET.parse(trainfile).getroot().findall('sentence')) 464 | domain_name = 'laptops' if 'laptop' in trainfile else ('restaurants' if 'restau' in trainfile else 'absa') 465 | if testfile: 466 | traincorpus = corpus 467 | seen = Corpus(ET.parse(testfile).getroot().findall('sentence')) 468 | else: 469 | train, seen = corpus.split() 470 | # Store train/test files and clean up the test files (no aspect terms or categories are present); then, parse back the files back. 471 | corpus.write_out('%s--train.xml' % domain_name, train, short=False) 472 | traincorpus = Corpus(ET.parse('%s--train.xml' % domain_name).getroot().findall('sentence')) 473 | corpus.write_out('%s--test.gold.xml' % domain_name, seen, short=False) 474 | seen = Corpus(ET.parse('%s--test.gold.xml' % domain_name).getroot().findall('sentence')) 475 | 476 | corpus.write_out('%s--test.xml' % domain_name, seen.corpus) 477 | unseen = Corpus(ET.parse('%s--test.xml' % domain_name).getroot().findall('sentence')) 478 | 479 | # Perform the tasks, asked by the user and print the files with the predicted responses. 480 | if task == 1: 481 | b1 = BaselineAspectExtractor(traincorpus) 482 | print 'Extracting aspect terms...' 
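# Tag the stripped test sentences with the aspect terms harvested from the training
# split, write the predictions out, and score them against the gold annotations.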
483 | predicted = b1.tag(unseen.corpus) 484 | corpus.write_out('%s--test.predicted-aspect.xml' % domain_name, predicted, short=False) 485 | print 'P = %f -- R = %f -- F1 = %f (#correct: %d, #retrieved: %d, #relevant: %d)' % Evaluate(seen.corpus, 486 | predicted).aspect_extraction() 487 | if task == 2: 488 | print 'Detecting aspect categories...' 489 | b2 = BaselineCategoryDetector(traincorpus) 490 | predicted = b2.tag(unseen.corpus) 491 | print 'P = %f -- R = %f -- F1 = %f (#correct: %d, #retrieved: %d, #relevant: %d)' % Evaluate(seen.corpus, 492 | predicted).category_detection() 493 | corpus.write_out('%s--test.predicted-category.xml' % domain_name, predicted, short=False) 494 | if task == 3: 495 | print 'Estimating aspect term polarity...' 496 | b3 = BaselineAspectPolarityEstimator(traincorpus) 497 | predicted = b3.tag(seen.corpus) 498 | corpus.write_out('%s--test.predicted-aspectPolar.xml' % domain_name, predicted, short=False) 499 | print 'Accuracy = %f, #Correct/#All: %d/%d' % Evaluate(seen.corpus, predicted).aspect_polarity_estimation() 500 | if task == 4: 501 | print 'Estimating aspect category polarity...' 502 | b4 = BaselineAspectCategoryPolarityEstimator(traincorpus) 503 | predicted = b4.tag(seen.corpus) 504 | print 'Accuracy = %f, #Correct/#All: %d/%d' % Evaluate(seen.corpus, 505 | predicted).aspect_category_polarity_estimation() 506 | corpus.write_out('%s--test.predicted-categoryPolar.xml' % domain_name, predicted, short=False) 507 | # Perform tasks 1 & 2, and output an XML file with the predictions 508 | if task == 5: 509 | print 'Task 1 & 2: Aspect Term and Category Detection' 510 | b1 = BaselineAspectExtractor(traincorpus) 511 | b2 = BaselineCategoryDetector(traincorpus) 512 | b12 = BaselineStageI(b1, b2) 513 | predicted = b12.tag(unseen.corpus) 514 | corpus.write_out('%s--test.predicted-stageI.xml' % domain_name, predicted, short=False) 515 | print 'Task 1: P = %f -- R = %f -- F1 = %f (#correct: %d, #retrieved: %d, #relevant: %d)' % Evaluate( 516 | seen.corpus, predicted).aspect_extraction() 517 | print 'Task 2: P = %f -- R = %f -- F1 = %f (#correct: %d, #retrieved: %d, #relevant: %d)' % Evaluate( 518 | seen.corpus, predicted).category_detection() 519 | # Perform tasks 3 & 4, and output an XML file with the predictions 520 | if task == 6: 521 | print 'Aspect Term and Category Polarity Estimation' 522 | b3 = BaselineAspectPolarityEstimator(traincorpus) 523 | b4 = BaselineAspectCategoryPolarityEstimator(traincorpus) 524 | b34 = BaselineStageII(b3, b4) 525 | predicted = b34.tag(seen.corpus) 526 | corpus.write_out('%s--test.predicted-stageII.xml' % domain_name, predicted, short=False) 527 | print 'Task 3: Accuracy = %f (#Correct/#All: %d/%d)' % Evaluate(seen.corpus, 528 | predicted).aspect_polarity_estimation() 529 | print 'Task 4: Accuracy = %f (#Correct/#All: %d/%d)' % Evaluate(seen.corpus, 530 | predicted).aspect_category_polarity_estimation() 531 | 532 | 533 | if __name__ == "__main__": main(sys.argv[1:]) 534 | -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2014/submission-guidelines.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/davidsbatista/Aspect-Based-Sentiment-Analysis/1d9c8ec1131993d924e96676fa212db6b53cb870/datasets/ABSA-SemEval2014/submission-guidelines.pdf -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2015/absa-2015_laptops_trial.xml: 
-------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | Being a PC user my whole life.... 7 | 8 | 9 | This computer is absolutely AMAZING!!! 10 | 11 | 12 | 13 | 14 | 15 | 10 plus hours of battery... 16 | 17 | 18 | 19 | 20 | 21 | super fast processor and really nice graphics card.. 22 | 23 | 24 | 25 | 26 | 27 | 28 | and plenty of storage with 250 gb(though I will upgrade this and the ram..) 29 | 30 | 31 | 32 | 33 | 34 | This computer is really fast and I'm shocked as to how easy it is to get used to... 35 | 36 | 37 | 38 | 39 | 40 | 41 | I've only had mine a day but I'm already used to it... 42 | 43 | 44 | 45 | 46 | 47 | MACS ARE AMAZING!!! 48 | 49 | 50 | GET THIS COMPUTER FOR PORTABILITY AND FAST PROCESSING!!! 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | It's fast and has excellent battery life. 62 | 63 | 64 | 65 | 66 | 67 | 68 | The screen shows great colors. 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | From the moment I opened the box to the present it has been a great joy. 79 | 80 | 81 | 82 | 83 | 84 | It is always reliable, never bugged and responds well. 85 | 86 | 87 | 88 | 89 | 90 | 91 | I love the operating system and the preloaded software. 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | Well, I have to say since I bought my Mac, I won't ever go back to any Windows. 103 | 104 | 105 | It's solid. 106 | 107 | 108 | 109 | 110 | 111 | Love the stability of the Mac software and operating system. 112 | 113 | 114 | 115 | 116 | 117 | 118 | The only downfall is a lot of the software I have won't work with Mac and iWork is not worth the price of it. 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | It seems to be incompatible with everything else. 127 | 128 | 129 | 130 | 131 | 132 | But the machine is awesome and iLife is great and I love Snow Leopard X. 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | Ever since I bought this laptop, so far I've experience nothing but constant break downs of the laptop and bad customer services I received over the phone with toshiba customer services hotlines. 145 | 146 | 147 | 148 | 149 | 150 | 151 | I constantly had to send my laptop in for services every 3 months and it always seems to be the same problem that they said they had already fixed. 152 | 153 | 154 | 155 | 156 | 157 | 158 | Toshiba customer services will indirectly deal with your problems by constantly tranferring you from one country to another, and I am not kidding you, I called different hours of the day and you'll get someone else from another country trying to get you to tell them your life story all over again, since they make it sound like they don't have your history list of your calls right in front of them. 159 | 160 | 161 | 162 | 163 | 164 | It's a long and tirring process that after a while it seems like their game plan was to wear you out so you would want to give up on contacting them. 165 | 166 | 167 | 168 | 169 | 170 | And at one point, they blame me for installing a bad memory stick when I upgrade my memory for the first time during my purchase of the laptop (I bought the memory stick they recomended). 
171 | 172 | 173 | 174 | 175 | 176 | Long story short, since I experience so many problems with my laptop every since I bought it from day one, I didn't ask for a new laptop or a refund of what I pay for a crapy laptop, but just an extension of my laptop warranty for another year, they made a big deal of out that and after so many calls and complaints about their products and services, they finally gave in. 177 | 178 | 179 | 180 | 181 | 182 | 183 | Was this product worth my time and money to ever want to purchase another products that is toshiba or relating to toshiba? 184 | 185 | 186 | 187 | 188 | 189 | 190 | Probably not ever again. 191 | 192 | 193 | I'll rather be out of date then spend more money on toshiba. 194 | 195 | 196 | 197 | 198 | 199 | Remember to do your research first before consider buying a toshiba product. 200 | 201 | 202 | 203 | 204 | 205 | 206 | 207 | 208 | 209 | Purchased as a gift for a friend. 210 | 211 | 212 | The service I received from Toshiba went above and beyond the call of duty. 213 | 214 | 215 | 216 | 217 | 218 | My friend reports the notebook is astonishing in performance, picture quality, and ease of use. 219 | 220 | 221 | 222 | 223 | 224 | 225 | 226 | It is extremely portable and easily connects to WIFI at the library and elsewhere. 227 | 228 | 229 | 230 | 231 | 232 | 233 | Just what the doctor ordered. 234 | 235 | 236 | 237 | 238 | 239 | 240 | 241 | 242 | 243 | the key bindings take a little getting used to, but have loved the Macbook Pro. 244 | 245 | 246 | 247 | 248 | 249 | 250 | Delivery was early too. 251 | 252 | 253 | 254 | 255 | 256 | 257 | 258 | 259 | 260 | Most everything is fine with this machine: speed, capacity, build. 261 | 262 | 263 | 264 | 265 | 266 | 267 | 268 | 269 | The only thing I don't understand is that the resolution of the screen isn't high enough for some pages, such as Yahoo!Mail. 270 | 271 | 272 | 273 | 274 | 275 | Yes, I have it on the highest available setting. 276 | 277 | 278 | 279 | 280 | 281 | 282 | Plain and simple, it(laptop) runs great and loads fast. 283 | 284 | 285 | 286 | 287 | 288 | Easy to carry, can be taken anywhere, can be hooked up to printers,headsets. 289 | 290 | 291 | 292 | 293 | 294 | 295 | Love that it doesn't take up space like a regular computer. 296 | 297 | 298 | 299 | 300 | 301 | 302 | This computer gets very hot, before shutting down. 303 | 304 | 305 | 306 | 307 | 308 | It is not ideal for children because of the temp. 309 | 310 | 311 | 312 | 313 | 314 | 315 | I Contacted HP  about this situation and was given excuses, without results. 316 | 317 | 318 | 319 | 320 | 321 | They didn't even try to assist me or even offer a replacement. 322 | 323 | 324 | 325 | 326 | 327 | I will never purchase a HP again ever. 328 | 329 | 330 | 331 | 332 | 333 | 334 | -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2015/absa-2015_restaurants_trial.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | Judging from previous posts this used to be a good place, but not any longer. 7 | 8 | 9 | 10 | 11 | 12 | We, there were four of us, arrived at noon - the place was empty - and the staff acted like we were imposing on them and they were very rude. 13 | 14 | 15 | 16 | 17 | 18 | They never brought us complimentary noodles, ignored repeated requests for sugar, and threw our dishes on the table. 19 | 20 | 21 | 22 | 23 | 24 | The food was lousy - too sweet or too salty and the portions tiny. 
25 | 26 | 27 | 28 | 29 | 30 | 31 | After all that, they complained to me about the small tip. 32 | 33 | 34 | 35 | 36 | 37 | Avoid this place! 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | Every time in New York I make it a point to visit Restaurant Saul on Smith Street. 48 | 49 | 50 | 51 | 52 | 53 | Everything is always cooked to perfection, the service is excellent, the decor cool and understated. 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | I had the duck breast special on my last visit and it was incredible. 62 | 63 | 64 | 65 | 66 | 67 | Can't wait wait for my next visit. 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | We ate outside at Haru's Sake bar because Haru's restaurant next door was overflowing. 78 | 79 | 80 | What's the difference between the two? 81 | 82 | 83 | Their sake list was extensive, but we were looking for Purple Haze, which wasn't listed but made for us upon request! 84 | 85 | 86 | 87 | 88 | 89 | 90 | The spicy tuna roll was unusually good and the rock shrimp tempura was awesome, great appetizer to share! 91 | 92 | 93 | 94 | 95 | 96 | 97 | We went around 9:30 on a Friday and it had died down a bit by then so the service was great! 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | we love th pink pony. 108 | 109 | 110 | 111 | 112 | 113 | THe perfect spot. 114 | 115 | 116 | 117 | 118 | 119 | Food-awesome. 120 | 121 | 122 | 123 | 124 | 125 | Service- friendly and attentive. 126 | 127 | 128 | 129 | 130 | 131 | Ambiance- relaxed and stylish. 132 | 133 | 134 | 135 | 136 | 137 | Don't judge this place prima facie, you have to try it to believe it, a home away from home for the literate heart. 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | This place has got to be the best japanese restaurant in the new york area. 148 | 149 | 150 | 151 | 152 | 153 | I had a great experience. 154 | 155 | 156 | 157 | 158 | 159 | Food is great. 160 | 161 | 162 | 163 | 164 | 165 | Service is top notch. 166 | 167 | 168 | 169 | 170 | 171 | I have been going back again and again. 172 | 173 | 174 | 175 | 176 | 177 | 178 | 179 | 180 | 181 | Just went here for my girlfriends 23rd bday. 182 | 183 | 184 | If you've ever been along the river in Weehawken you have an idea of the top of view the chart house has to offer. 185 | 186 | 187 | 188 | 189 | 190 | Add to that great service and great food at a reasonable price and you have yourself the beginning of a great evening. 191 | 192 | 193 | 194 | 195 | 196 | 197 | 198 | The lava cake dessert was incredible and I recommend it. 199 | 200 | 201 | 202 | 203 | 204 | 205 | 206 | 207 | 208 | I have never eaten in the restaurant, however, upon reading the reviews I got take out last week. 209 | 210 | 211 | IT WAS HORRIBLE. 212 | 213 | 214 | 215 | 216 | 217 | The pizza was delivered cold and the cheese wasn't even fully melted! 218 | 219 | 220 | 221 | 222 | 223 | 224 | It looked like shredded cheese partly done - still in strips. 225 | 226 | 227 | 228 | 229 | 230 | I have eaten at many pizza places around NYC and this is hands down the worst. 231 | 232 | 233 | 234 | 235 | 236 | 237 | This is a fun restaurant to go to. 238 | 239 | 240 | 241 | 242 | 243 | The pizza is yummy and I like the atmoshpere. 244 | 245 | 246 | 247 | 248 | 249 | 250 | But the pizza is way to expensive. 251 | 252 | 253 | 254 | 255 | 256 | A large is $20, and toppings are about $3 each. 257 | 258 | 259 | 260 | 261 | 262 | 263 | Planet Thailand has always been a hit with me , I go there usually for the sushi, which is great , the thai food is excellent too . 
264 | 265 | 266 | 267 | 268 | 269 | 270 | 271 | With the great variety on the menu , I eat here often and never get bored . 272 | 273 | 274 | 275 | 276 | 277 | The atmosphere isn't the greatest , but I suppose that's how they keep the prices down . 278 | 279 | 280 | 281 | 282 | 283 | 284 | It's all about the food !! 285 | 286 | 287 | 288 | 289 | 290 | 291 | 292 | 293 | 294 | Moules were excellent, lobster ravioli was VERY salty! 295 | 296 | 297 | 298 | 299 | 300 | 301 | Took my mom for Mother's Day, and the maitre d' was pretty rude. 302 | 303 | 304 | 305 | 306 | 307 | Told us to sit anywhere, and when we sat he said the table was reserved. 308 | 309 | 310 | 311 | 312 | 313 | Stepped on my foot on the SECOND time he reached over me to adjust lighting. 314 | 315 | 316 | 317 | 318 | 319 | Tiny dessert was $8.00...just plain overpriced for what it is. 320 | 321 | 322 | 323 | 324 | 325 | 326 | 327 | 328 | -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2015/guidelines/SemEval2015_ABSA_Laptops_AnnotationGuidelines.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/davidsbatista/Aspect-Based-Sentiment-Analysis/1d9c8ec1131993d924e96676fa212db6b53cb870/datasets/ABSA-SemEval2015/guidelines/SemEval2015_ABSA_Laptops_AnnotationGuidelines.pdf -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2015/guidelines/semeval2015_absa_restaurants_annotationguidelines.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/davidsbatista/Aspect-Based-Sentiment-Analysis/1d9c8ec1131993d924e96676fa212db6b53cb870/datasets/ABSA-SemEval2015/guidelines/semeval2015_absa_restaurants_annotationguidelines.pdf -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2016/Training_Data/ABSA16FR-RestaurantsTrain/ABSA16FR-download.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/davidsbatista/Aspect-Based-Sentiment-Analysis/1d9c8ec1131993d924e96676fa212db6b53cb870/datasets/ABSA-SemEval2016/Training_Data/ABSA16FR-RestaurantsTrain/ABSA16FR-download.jar -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2016/Training_Data/ABSA16FR-RestaurantsTrain/ABSA16FR_Restaurants_guidelines.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/davidsbatista/Aspect-Based-Sentiment-Analysis/1d9c8ec1131993d924e96676fa212db6b53cb870/datasets/ABSA-SemEval2016/Training_Data/ABSA16FR-RestaurantsTrain/ABSA16FR_Restaurants_guidelines.pdf -------------------------------------------------------------------------------- /datasets/ABSA-SemEval2016/Training_Data/ABSA16FR-RestaurantsTrain/README.txt: -------------------------------------------------------------------------------- 1 | 2 | == SemEval-2016 Task 5 Aspect-Based Sentiment Analysis (ABSA) task for French == 3 | 4 | Training data for Subtask 1 (Sentence-level ABSA): annotated reviews for the Restaurant domain 5 | ---------------------------------------------------------------------------------------------- 6 | 7 | 8 | The French review annotations are distributed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Licence (https://creativecommons.org/licenses/by-nc-nd/4.0/). 
9 | The software is distributed under licence CeCILL (http://www.cecill.info/licences/Licence_CeCILL_V2.1-en.html)
10 | 
11 | 
12 | ------------------------
13 | Contents of this package
14 | ------------------------
15 | 
16 | The release package contains five files:
17 | 
18 | - ABSA16FR_Restaurants_Train.xml: an XML file compliant with the ABSA dataset format, containing review ids and opinion annotations. The <text> tags in the XML file are empty for copyright reasons.
19 | - ABSA16FR-download.jar: a jar file for downloading the reviews from the Web and filling the <text> tags in the XML file with the corresponding sentences.
20 | - ABSA16FR_Restaurants_index.txt: a mapping between review ids and their URLs. The number of sentences in a review is given at the end of each line.
21 | - ABSA16FR_Restaurants_guidelines.pdf: guidelines for French restaurant review annotation
22 | - README.txt: this file
23 | 
24 | 
25 | -------------------------
26 | How to obtain the dataset
27 | -------------------------
28 | 
29 | Requirements: java >= 1.5
30 | 
31 | Open a terminal, move to the directory containing these files, and run:
32 | 
33 | java -jar ABSA16FR-download.jar ABSA16FR_Restaurants_index.txt ABSA16FR_Restaurants_Train.xml
34 | 
35 | This will start the download and filling-in process and will write the output to a file named ABSA16FR_Restaurants_Train-withcontent.xml.
36 | This file is the complete French training dataset.
37 | 
38 | Please note that:
39 | - as a courtesy to the web site we get the reviews from, there is a short waiting time between two downloads. As a result, completing the process can take a while.
40 | - after each download, a <text> tag in the output file is filled in. You can interrupt the process at any moment without losing the downloaded content. Just choose 'K' when the "Should we [O]verwrite, [K]eep already downloaded content or [C]ancel? [O/K/C]" question appears the next time you run the process and it will continue at the point it stopped. To overwrite the already downloaded content choose 'O'.
41 | 
42 | Please report any issues (runtime errors, download errors or non coherent data) to the French ABSA-2016 organizers.
43 | 
44 | 
45 | -------
46 | Credits
47 | -------
48 | - Reviews are extracted from HTML with HTMLCleaner.
49 | - Sentence splitting is done with the OpenNLP toolkit.
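A quick way to check the result of the download step above is to count how many
sentences were actually filled in. A minimal sketch (assuming the standard ABSA XML
layout, in which each sentence holds its content in a <text> element, and the
output filename written by the downloader):

    import xml.etree.ElementTree as ET

    # Output file produced by ABSA16FR-download.jar (see the command above).
    tree = ET.parse('ABSA16FR_Restaurants_Train-withcontent.xml')

    filled, empty = 0, 0
    for text in tree.getroot().iter('text'):
        # A sentence counts as "filled" once the downloader has written its content.
        if text.text and text.text.strip():
            filled += 1
        else:
            empty += 1

    print('%d sentences filled, %d still empty' % (filled, empty))

If any sentences remain empty, re-running the downloader and choosing 'K' resumes
the process without re-fetching the content already downloaded.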
50 | -------------------------------------------------------------------------------- /datasets/Aspect-based-Sentiment-Analysis-Dataset.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 17, 6 | "metadata": { 7 | "collapsed": true 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "import re\n", 12 | "import os\n", 13 | "import random\n", 14 | "import numpy as np\n", 15 | "\n", 16 | "from collections import namedtuple" 17 | ] 18 | }, 19 | { 20 | "cell_type": "markdown", 21 | "metadata": {}, 22 | "source": [ 23 | "## Customer Review" 24 | ] 25 | }, 26 | { 27 | "cell_type": "code", 28 | "execution_count": 10, 29 | "metadata": {}, 30 | "outputs": [ 31 | { 32 | "name": "stdout", 33 | "output_type": "stream", 34 | "text": [ 35 | "Apex AD2600 Progressive-scan DVD player.txt\n", 36 | "Canon G3.txt\n", 37 | "Creative Labs Nomad Jukebox Zen Xtra 40GB.txt\n", 38 | "Nikon coolpix 4300.txt\n", 39 | "Nokia 6610.txt\n", 40 | "1727\n" 41 | ] 42 | } 43 | ], 44 | "source": [ 45 | "# CR: http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html\n", 46 | "\n", 47 | "CustomerReview = namedtuple(\"Customer_Review\", \"aspects sentence\")\n", 48 | "data = []\n", 49 | "\n", 50 | "for filename in os.listdir('CR/'):\n", 51 | " if filename=='Readme.txt':\n", 52 | " continue\n", 53 | " print(filename)\n", 54 | " with open('CR/'+filename,'r') as f_input:\n", 55 | " for line in f_input:\n", 56 | " if line.startswith(\"*\"):\n", 57 | " continue\n", 58 | " # select only lines with an opinion over a feature of the product\n", 59 | " m = re.match(r'.*\\[(.[0-9])\\].*##.*',line)\n", 60 | " if m:\n", 61 | " sentence = line.split('##')[1].strip()\n", 62 | " aspects_string = line.split('##')[0]\n", 63 | " aspects = aspects_string.split(\",\")\n", 64 | " data.append(CustomerReview(aspects, sentence))\n", 65 | "print(len(data))" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 11, 71 | "metadata": { 72 | "scrolled": false 73 | }, 74 | "outputs": [ 75 | { 76 | "name": "stdout", 77 | "output_type": "stream", 78 | "text": [ 79 | "['player[+2]', 'sound[-1]']\n", 80 | "['look[+3]', 'panel button layout[+3]', 'feature[+2]']\n", 81 | "['forward[+2]', ' rewind[+2]']\n", 82 | "['play[+2]', ' dvd-r[+1]']\n", 83 | "['remote control[-1]', ' mp3 filename[-1]']\n", 84 | "['play[-2]', ' disney movie[-2]']\n", 85 | "['player[+1]', ' look[+3]']\n", 86 | "['customer service[-3]', ' technical support[-3]']\n", 87 | "['dvd[-2]', ' read[-2]', ' play[-2]']\n", 88 | "['picture quality[+2]', 'feature[+2]']\n", 89 | "['look[+2]', ' feature[+2]']\n", 90 | "['play[-2]', ' dvd[-2]']\n", 91 | "['jpeg slideshow[+2]', 'mpeg1[+1]']\n", 92 | "['picture[-2]', ' player[-3][p]']\n", 93 | "['play[-2]', ' disc[-2]']\n", 94 | "['play[-2]', ' dvd[-2]']\n", 95 | "['picture[-2]', ' player[-2][p]']\n", 96 | "['dvd[-2]', ' play[-2]']\n", 97 | "['dvd[-2]', ' play[-2]']\n", 98 | "['format[+2][u]', 'progressive scan[+2]']\n", 99 | "['play[+2]', ' different file[+2]']\n", 100 | "['play[+2]', ' mpeg[+2]']\n", 101 | "['no disc[-2]', ' screen[-3][u]']\n", 102 | "['player[+3]', ' format[+2]']\n", 103 | "['progressive scan[+2]', 'remote[+3]']\n", 104 | "['play[-2]', ' dvd[-2]', ' no disc[-2]']\n", 105 | "['look[+2][u]', 'panel[+2]']\n", 106 | "['dvd[-2]', ' picture[-2]', ' play[-2]']\n", 107 | "['picture[-2]', ' sound[-2]']\n", 108 | "['dvd[-2]', ' read[-2]', ' no disc[-2]']\n", 109 | "['progressive scan player[+2]', 'price[+2]']\n", 110 | "['dvd[+2]', 
'sound[+2]']\n", 111 | "['read[+2]', ' svcd[+2]', ' vbr mp3 cd[+2]']\n", 112 | "['disc[-1]', ' recognize[-2]']\n", 113 | "['use[+2]', ' size[+2]', 'look[+2]']\n", 114 | "['player[-3][u]', 'customer service[-3]']\n", 115 | "['design[+2]', 'player[+2]']\n", 116 | "['read[-2]', ' dvd[-2]']\n", 117 | "['run[+2]', ' dvd[+2]']\n", 118 | "['sound[+3]', 'picture clarity[+3]']\n", 119 | "['case[+2]', 'price[+2][u]']\n", 120 | "['size[+1]', 'machine[+2]']\n", 121 | "['picture[+2]', 'sound[+2]']\n", 122 | "['read[-2]', ' disc[-2]']\n", 123 | "['set up[+1]', 'player[-2]']\n", 124 | "['feature[+2]', 'reliability[-2]']\n", 125 | "['unit[+3]', ' format[+2]']\n", 126 | "['play[-2]', ' dvd[-2]']\n", 127 | "['look[+2]', 'feature[+2]']\n", 128 | "['player[-3][p]', ' motor[-2]', ' disc[-2]']\n", 129 | "['ad-1600[-3]', ' ad-1220[-2]']\n", 130 | "['size[+2]', 'weight[+2]']\n", 131 | "['player[-2]', ' no disc[-2]']\n", 132 | "['freeze[-2]', ' player[-2]']\n", 133 | "['dvd[-3]', 'cd[-3]', ' no disc[-2]']\n", 134 | "['size[+1]', 'remote layout[+1]']\n", 135 | "['noise[-1]', ' external display[-2]']\n", 136 | "['look[+2]', 'feature[+2]']\n", 137 | "['dvd[-3]', ' read[-2]']\n", 138 | "['dvd disc[-3]', ' read[-3]']\n", 139 | "['read[+2]', ' cd audio disc[+2]']\n", 140 | "['jpeg[-2]', 'dvd[-2]']\n", 141 | "['play[+2]', ' cd[+2]', ' mp3[+2]', ' jpeg[+2]']\n", 142 | "['play[-2]', ' windows media[-2]', ' divx rip[-2]']\n", 143 | "['read[-2]', ' dvd[-2]', ' vcd[-2]']\n", 144 | "['play[+2]', ' dvd-r[+2]']\n", 145 | "['build quality[+2]', 'picture[+2]', 'sound[+2]']\n", 146 | "['look[+3][u]', ' player[+1][p]']\n", 147 | "['sound[+1]', 'weight[+1]']\n", 148 | "['work[+3]', ' player[+2][u]']\n", 149 | "['play[+2]', ' dvd[+2]']\n", 150 | "['camera[+2]', ' use[+2]', ' feature[+1]']\n", 151 | "['picture quality[+3]', ' use[+1]', ' option[+1]']\n", 152 | "['speed[+2]', 'picture quality[+2]', 'function[+2]']\n", 153 | "['exposure control[+2]', 'auto setting[+2]']\n", 154 | "['size[-2][u]', 'weight[-2][u]']\n", 155 | "['feature[+2]', ' ']\n", 156 | "['optical zoom[+2]', 'digital zoom[+1]']\n", 157 | "['lens cap[-1]', 'viewfinder[-1]']\n", 158 | "['menu[+1]', 'button[+1]']\n", 159 | "['zoom[+2]', 'lense[+2]']\n", 160 | "['picture[+3]', ' control[+2]', ' battery[+2]', ' software[+2]']\n", 161 | "['photo quality[+2]', ' auto mode[+2]']\n", 162 | "['control[+2]', 'auto mode[+2]']\n", 163 | "['feel[+2]', 'weight[+2]']\n", 164 | "['picture[+2]', ' auto mode[+2]']\n", 165 | "['color[+2]', 'picture[+2]', 'white balance[+2]']\n", 166 | "['flash photo[-3]', 'noise[-2]']\n", 167 | "['lag time[-2]', 'flash[-2]']\n", 168 | "['g3[+3]', 'software[-2]']\n", 169 | "['manual function[+2]', 'picture quality[+3]']\n", 170 | "['viewfinder[-1]', 'lcd[+2]', 'camera[+3]']\n", 171 | "['automode[+2]', 'manual mode[+2]']\n", 172 | "['photo[+3]', ' use[+3]', ' software[+2]']\n", 173 | "['focus[-2]', ' shoot[-2]']\n", 174 | "['control[+2]', 'menu[+2]']\n", 175 | "['optical zoom[+3]', 'viewfinder[+1]']\n", 176 | "['design[+2]', ' feature[+2]', 'use[+1][u]', 'battery[+2]']\n", 177 | "['use[+1][u]', 'design[+1][u]']\n", 178 | "['use[+3]', 'control[+2]']\n", 179 | "['option[+1}', ' control[+1]']\n", 180 | "['macro[+2]', ' auto mode[+2]']\n", 181 | "['picture[+3]', 'learning curve[+1]']\n", 182 | "['picture[+2]', 'feel[+1][u]']\n", 183 | "['lens[+2]', 'optical zoom[+1]']\n", 184 | "['camera[+3]', 'use[+1]', 'photo[+3]']\n", 185 | "['quality[+2]', 'lens[+2]']\n", 186 | "['weight[-1]', 'camera[+2]']\n", 187 | "['size[+2]', 'weight[+2]', 'navigational system[+2]', ' 
sound[+3]']\n", 188 | "['battery[+2]', 'leather case[+2]']\n", 189 | "['battery life[+2]', 'price[+2]']\n", 190 | "['sound[+3]', ' headphone[-2]']\n", 191 | "['battery[+2]', 'construction[+2]', 'size[+2]', 'weight[+2]']\n", 192 | "['size[+1][u]', 'sound[+2]', 'price[+2]']\n", 193 | "['player[+]', ' price[+3]']\n", 194 | "['player[+3]', 'use[+3]', ' software[+2]', 'sound[+3]']\n", 195 | "['setup[-3]', 'interface[-3]']\n", 196 | "['earphone[-1]', 'software[-1]']\n", 197 | "['screen[+2]', 'switch[-2]']\n", 198 | "['headphone[-1]', ' sound[+3]']\n", 199 | "['deal[+3]', ' mp3 player[+3]']\n", 200 | "['player[+3][p]', 'look[+2][u]']\n", 201 | "['sound[+3]', 'earphone[-3]']\n", 202 | "['size[-2][u]', 'weight[-2][u]', 'look[-1]', 'folder structure[-1]']\n", 203 | "['interface[+3][p]', 'software[+3][p]']\n", 204 | "['sound quality[+2]', 'look[+3]', 'screen[+2]', ' battery[+3]']\n", 205 | "['size[+1][u]', 'weight[+1]']\n", 206 | "['product[+2][u]', 'price[+3]']\n", 207 | "['setup[2]', ' transfer[+2]']\n", 208 | "['use[+3]', ' playback quality[+3]', 'price[+2]']\n", 209 | "['size[-1]', 'weight[-1]']\n", 210 | "['price[+2]', 'battery[+2]']\n", 211 | "['price[+3]', 'feature[+3]']\n", 212 | "['battery life[+2]', 'online music service[+2]', 'pc compatibility[+2]']\n", 213 | "['sound[+2]', 'battery life[+2]', 'price[+2][u]']\n", 214 | "['style[+1]', 'size[+1][u]', 'control[+1]']\n", 215 | "['player[+2][p]', 'button[+2]']\n", 216 | "['player[+3]', 'software[-2]']\n", 217 | "['look[+3]', 'size[+2][u]', 'weight[+2][u]']\n", 218 | "['sound quality[+3]', 'size[+2]', 'price[+2]']\n", 219 | "['sound[+2]', ' price[+2]', ' player[-1]']\n", 220 | "['button[+3]', 'interface[+3]']\n", 221 | "['click buttons[+3]', ' display[+2]']\n", 222 | "['size[+1][u]', 'weight[+1][u]', 'look[+2]', 'display[+2]']\n", 223 | "['player[+2]', 'sound[+3]']\n", 224 | "['software[+3]', 'player[+3]']\n", 225 | "['navigation[+2]', 'scroll[-2]']\n", 226 | "['price[+2][u]', ' player[+2]']\n", 227 | "['sound[+3]', 'size[-1][u]']\n", 228 | "['weight[-1][u]', 'battery life[+1]']\n", 229 | "['tag[-1]', 'software[-1]']\n", 230 | "['control[-2]', 'look[-2]']\n", 231 | "['screen[+3]', 'equilizer[+2]']\n", 232 | "['size[-1][u]', 'weight[+1]', 'software[-1]']\n", 233 | "['price[+2]', 'sound[+2]']\n", 234 | "['weight[-2][u]', 'size[-2][u]']\n", 235 | "['price[+2]', 'capacity[+2]']\n", 236 | "['design[+2]', 'interface[+1]']\n", 237 | "['player[+2]', 'software[-2]', ' rip[-2]', ' transfer[-2]']\n", 238 | "['battery[+2]', 'sound[+3]']\n", 239 | "['software[-2]', 'case[-2]']\n", 240 | "['rip[+2]', ' quality[+2]']\n", 241 | "['playlist[+1]', 'cd rip[+1]']\n", 242 | "['sound[+2]', 'volume[+2]']\n", 243 | "['player[-3]', 'software[-2]']\n", 244 | "['player[+2]', 'sound quality[+2]']\n", 245 | "['navigation[+2]', 'sync[+2]']\n", 246 | "['sound[+2]', 'power output[+2]']\n", 247 | "['user interface[+2]', 'navigation[+2]']\n", 248 | "['look[+2]', 'build[+2]']\n", 249 | "['fm[-1]', 'voice recording[-1]']\n", 250 | "['sound[+2]', 'interface[+2]', 'battery[+2]', 'software[+2]', ' wake up[+2]', 'play mode[+2]']\n", 251 | "['plug and play[-2]', 'id3[-2]', 'fm[-1]', 'recording[-1]']\n", 252 | "['size[-2][u]', 'weight[-2][u]', 'wheel[-3]']\n", 253 | "['product[+3]', ' sound[+3]', ' use[+2]']\n", 254 | "['menue[-3]', 'control[-3]']\n", 255 | "['storage[+3]', 'navigation[+2]', 'playlist[+2]', 'battery[+2]']\n", 256 | "['battery life[-2]', 'manual[-2]', 'lock up[-2]', 'replacement battery[-2]']\n", 257 | "['construction[-2]', 'support[-2]', 'scroll wheel[-2]', 'headphone 
jack[-2]']\n", 258 | "['sound quality[+2]', 'battery life[+2]', 'price[+2]']\n", 259 | "['size[+1]', 'value[+1]', 'design[+1]', 'software[-2]']\n", 260 | "['sound[+2]', 'earbud[-2]']\n", 261 | "['value[+2]', 'use[+2]']\n", 262 | "['sound quality[+2]', 'volume[+2]']\n", 263 | "['sound[+2]', 'battery life[+3]', 'battery[+2]', 'storage[+1]', 'screen[+2]', ' firmware[+1]', 'price[+2]']\n", 264 | "['warranty[-2]', 'freeze up[-1]', 'navigation wheel[-2]']\n", 265 | "['size[+3]', 'design[+3]']\n", 266 | "['sound[+3]', 'battery life[-1]']\n", 267 | "['control[-2]', 'software[-2]']\n", 268 | "['picture[+3]', ' macro[+3]']\n", 269 | "['auto focus[+2]', 'scene mode[+2]']\n", 270 | "['camera[+2][p]', ' use[+1][u]', ' feature[+2]']\n", 271 | "['auto mode[+1]', 'scene mode[+2]']\n", 272 | "['macro mode[+3]', 'picture[+3]']\n", 273 | "['camera[+3]', ' use[+1]']\n", 274 | "['camera[+2]', ' picture[+2]', ' close-up shooting[+3]']\n", 275 | "['camera[+2]', 'customer service[-2]']\n", 276 | "['picture[+3]', 'delay[+1]']\n", 277 | "['auto mode[+2]', ' manual mode[+2]', ' scene mode[+2]']\n", 278 | "['digital zoom[+2]', 'optical zoom[+2]']\n", 279 | "['touchup[+2]', ' redeye[+2]']\n", 280 | "['use[+1][u]', 'quality[+2]', 'size[+1]']\n", 281 | "['picture[+2]', 'ease of use[+2]']\n", 282 | "['picture[+2]', 'print[+2]']\n", 283 | "['4mp[+2]', 'optical zoom[+2]']\n", 284 | "['software[+3]', ' online service[+2]']\n", 285 | "['camera[+3]', ' feature[+2]']\n", 286 | "['picture quality[+2]', 'function[+2]']\n", 287 | "['camera[+3]', 'picture[+2]']\n", 288 | "['picture[+2]', 'indoor shot[+1]']\n", 289 | "['autofocus[+1]', 'scene mode[+1]', 'manual mode[+1]']\n", 290 | "['use[+1]', ' accessory[+2]']\n", 291 | "['picture quality[+3]', 'feature[+3]']\n", 292 | "['design[+2]', 'construction[+2]', 'optic[+2]']\n", 293 | "['picture quality[+3]', 'movie[+1]']\n", 294 | "['use[+1]', 'feature[+2]', ' camera[+2]']\n", 295 | "['weight[+2]', ' picture[+2][u]']\n", 296 | "['photo quality[+3]', 'print[+2]']\n", 297 | "['size[+2][u]', 'control[+2]']\n", 298 | "['price[+2][u]', 'learn[+2]', 'image[+3]']\n", 299 | "['auto mode[+2]', 'scene mode[+2]', 'manual mode[+2]']\n", 300 | "['camera[+3]', ' print quality[+3]']\n", 301 | "['closeup mode[+2]', ' battery[+2]']\n", 302 | "['phone[+3]', ' work[+2]']\n", 303 | "['speaker phone[+2]', 'radio[+2]', 'infrared[+2]']\n", 304 | "['sprint plan[-2]', 'sprint customer service[-3]']\n", 305 | "['size[+1][u]', ' sturdy[+2]']\n", 306 | "['game[+2]', 'pim[+2]', 'radio[+2]']\n", 307 | "['sound volume[+1]', ' ear[-2]']\n", 308 | "['ringtone[+1]', 'background[+1]', 'screensaver[+1]', 'memory[-2]']\n", 309 | "['battery life[+2]', 'size[+2]', 'volume[-1][u]']\n", 310 | "['radio[+2]', ' radio[-1]']\n", 311 | "['phone[+2]', 'warranty[-2]']\n", 312 | "['sound quality[+2]', ' fm[+1]', ' earpiece[+1]']\n", 313 | "['size[-2][u]', ' operate[-2][u]', ' button[-2]']\n", 314 | "['speakerphone[+3]', ' reception[+2]']\n", 315 | "['speakerphone[+3]', 'radio[+3]']\n", 316 | "['size[+2]', 'weight[+2]']\n", 317 | "['bluetooth[-1]', 'high speed internet[-1]']\n", 318 | "['ringing tone[+3]', 'radio[+2]']\n", 319 | "['phone[+2]', 'size[+2]']\n", 320 | "['reception[+3]', 'sound quality[+3]']\n", 321 | "['size[+2]', 'weight[+2]']\n", 322 | "['voice dialing[-1]', 'headset jack[-1]']\n", 323 | "['weight[+1]', ' design[+1]']\n", 324 | "['picture[-1]', ' ringtone[-1]']\n", 325 | "['quality[+3]', 'durability[+3]']\n", 326 | "['screen[+1]', ' ring tone[+1]']\n", 327 | "['weight[+1]', 'phone[+2]']\n", 328 | "['phone[+2]', ' 
menu[+2]']\n",
329 | "['wallpaper[+1]', 'tune[+2]']\n",
330 | "['battery life[+2][u]', 'reception[+2]', ' application[+2]']\n",
331 | "['size[+1][u]', 'game[+1]', 'ringtone[+1]']\n",
332 | "['phone[+2]', 'gsm[-1]']\n",
333 | "['phone[+2]', 'screen[+2]', 'ergonomics[+2]', 'size[+1][u]']\n",
334 | "['size[+1][u]', 'weight[+1]', 'design[+3]']\n",
335 | "['color screen[+1]', 'ringtone[+1]']\n",
336 | "['sound[-2]', 'volume[-2]']\n",
337 | "['phone[+2]', ' feature[+2]']\n",
338 | "['weight[+2][u]', 'battery life[+2]']\n",
339 | "['tone[+1]', 'wallpaper[+1]', 'application[+1]']\n",
340 | "['message[+1]', 'picture sharing[+1]']\n",
341 | "['size[+2]', 'feature[+2]']\n",
342 | "['phone book[+2]', 'speakerphone[+2]']\n",
343 | "['battery life[+2]', 'radio[+2]', 'signal[+2]', 'speakerphone[+2]', 'application[+1]']\n",
344 | "['size[+1]', 'speakerphone[+1]', 'plan[+1]']\n",
345 | "['phone[+3]', ' look[+2]']\n",
346 | "['phone[+2]', 'use[+1]', 'network[+2]']\n",
347 | "['service[+1]', 'ringtone[+2]']\n",
348 | "['size[+2][u]', ' look{+1]']\n",
349 | "['screen[+2]', 'sound[+2]']\n",
350 | "['voice quality[+3]', 'reception[+2]']\n",
351 | "['screen[+2]', 'command[+2]']\n",
352 | "['resolution[+2]', ' color[+2]']\n",
353 | "['gprs[-1]', 't-zone[+2]']\n",
354 | "['menu[+2]', ' feature[+2]']\n",
355 | "['phone[+2]', ' feature[+2]']\n",
356 | "['design[+2]', 'screen[+2]']\n",
357 | "['weight[+2]', 'signal[+2]']\n"
358 | ]
359 | }
360 | ],
361 | "source": [
362 | "for msg in data:\n",
363 | " if len(msg.aspects) > 1:\n",
364 | " print(msg.aspects)"
365 | ]
366 | }
367 | ],
368 | "metadata": {
369 | "kernelspec": {
370 | "display_name": "Python 3",
371 | "language": "python",
372 | "name": "python3"
373 | },
374 | "language_info": {
375 | "codemirror_mode": {
376 | "name": "ipython",
377 | "version": 3
378 | },
379 | "file_extension": ".py",
380 | "mimetype": "text/x-python",
381 | "name": "python",
382 | "nbconvert_exporter": "python",
383 | "pygments_lexer": "ipython3",
384 | "version": "3.6.2"
385 | }
386 | },
387 | "nbformat": 4,
388 | "nbformat_minor": 2
389 | }
390 |
-------------------------------------------------------------------------------- /datasets/CR/Readme.txt: --------------------------------------------------------------------------------
1 | *****************************************************************************
2 | * Annotated by: Minqing Hu and Bing Liu, 2004.
3 | * Department of Computer Science
4 | * University of Illinois at Chicago
5 | *
6 | * Contact: Bing Liu, liub@cs.uic.edu
7 | * http://www.cs.uic.edu/~liub
8 | *****************************************************************************
9 |
10 | Readme file
11 |
12 | This folder contains annotated customer reviews of 5 products.
13 |
14 | 1. digital camera: Canon G3
15 | 2. digital camera: Nikon coolpix 4300
16 | 3. cellular phone: Nokia 6610
17 | 4. mp3 player: Creative Labs Nomad Jukebox Zen Xtra 40GB
18 | 5. dvd player: Apex AD2600 Progressive-scan DVD player
19 |
20 | All the reviews were from amazon.com. They were used in the following
21 | two papers:
22 |
23 | Minqing Hu and Bing Liu. "Mining and summarizing customer reviews".
24 | Proceedings of the ACM SIGKDD International Conference on
25 | Knowledge Discovery & Data Mining (KDD-04), 2004.
26 |
27 | Minqing Hu and Bing Liu. "Mining Opinion Features in Customer
28 | Reviews." Proceedings of the Nineteenth National Conference on
29 | Artificial Intelligence (AAAI-2004), 2004.
30 |
31 | Our project homepage: http://www.cs.uic.edu/~liub/FBS/FBS.html
32 |
33 |
34 |
35 | Symbols used in the annotated reviews:
36 |
37 | [t]: the title of the review; each [t] tag starts a review.
38 | We did not use the title information in our papers.
39 | xxxx[+|-n]: xxxx is a product feature.
40 | [+n]: Positive opinion, n is the opinion strength: 3 strongest,
41 | and 1 weakest. Note that the strength is quite subjective.
42 | You may want to ignore it and consider only + and -.
43 | [-n]: Negative opinion.
44 | ## : start of each sentence. Each line is a sentence.
45 | [u] : the feature did not appear in the sentence.
46 | [p] : the feature did not appear in the sentence; pronoun resolution is needed.
47 | [s] : suggestion or recommendation.
48 | [cc]: comparison with a competing product from a different brand.
49 | [cs]: comparison with a competing product from the same brand.
50 |
51 |
52 | Finally, tagging is a hard task. Errors and inconsistencies are inevitable.
53 | If you see some problems, please let us know. We also welcome your comments.
54 |
55 |
56 |
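As a rough illustration (not part of the original release), the line format described
above can be parsed with a few lines of Python. This is a minimal sketch: the
hypothetical parse_line helper assumes only the feature[+n]/feature[-n] convention and
the ## sentence separator documented in this Readme, and it ignores the [u], [p], [s],
[cc] and [cs] flags:

    import re

    def parse_line(line):
        # Title lines ([t]) start a new review and carry no opinions.
        if line.startswith('[t]') or '##' not in line:
            return None
        annotations, sentence = line.split('##', 1)
        # e.g. "player[+2][u], sound[-1]" -> [('player', 2), ('sound', -1)]
        aspects = re.findall(r'([^,\[\]]+)\[([+-]\d)\]', annotations)
        return [(f.strip(), int(s)) for f, s in aspects], sentence.strip()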
-------------------------------------------------------------------------------- /libraries/__init__.py: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/davidsbatista/Aspect-Based-Sentiment-Analysis/1d9c8ec1131993d924e96676fa212db6b53cb870/libraries/__init__.py
-------------------------------------------------------------------------------- /libraries/baselines.py: --------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 |
3 | '''
4 | **Baseline methods for the 4th task of SemEval 2014**
5 |
6 | Run a task from the terminal::
7 | >>> python baselines.py -t file -m taskNum
8 |
9 | or, import within python. E.g., for Aspect Term Extraction::
10 | from baselines import *
11 | corpus = Corpus(ET.parse(trainfile).getroot().findall('sentence'))
12 | unseen = Corpus(ET.parse(testfile).getroot().findall('sentence'))
13 | b1 = BaselineAspectExtractor(corpus)
14 | predicted = b1.tag(unseen.corpus)
15 | corpus.write_out('%s--test.predicted-aspect.xml'%domain_name, predicted, short=False)
16 |
17 | Similarly, for Aspect Category Detection, Aspect Term Polarity Estimation, and Aspect Category Polarity Estimation.
18 | '''
19 |
20 | __author__ = "J. Pavlopoulos"
21 | __credits__ = "J. Pavlopoulos, D. Galanis, I. Androutsopoulos"
22 | __license__ = "GPL"
23 | __version__ = "1.0.1"
24 | __maintainer__ = "John Pavlopoulos"
25 | __email__ = "annis@aueb.gr"
26 |
27 | try:
28 | import xml.etree.ElementTree as ET, getopt, logging, sys, random, re, copy
29 | from xml.sax.saxutils import escape
30 | except:
31 | sys.exit('Some package is missing...')
32 |
33 | logging.basicConfig(level=logging.INFO)
34 | logger = logging.getLogger(__name__)
35 |
36 | # Stopwords, imported from NLTK (v 2.0.4)
37 | stopwords = set(
38 | ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves',
39 | 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their',
40 | 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are', 'was',
41 | 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the',
42 | 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against',
43 | 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in',
44 | 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why',
45 | 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only',
46 | 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now'])
47 |
48 |
49 | def fd(counts):
50 | '''Given a list of occurrences (e.g., [1,1,1,2]), return a dictionary of frequencies (e.g., {1:3, 2:1}).'''
51 | d = {}
52 | for i in counts: d[i] = d[i] + 1 if i in d else 1
53 | return d
54 |
55 |
56 | freq_rank = lambda d: sorted(d, key=d.get, reverse=True)
57 | '''Given a map, return the keys ranked by their values.'''
58 |
59 |
60 | def fd2(counts):
61 | '''Given a list of 2-tuples (e.g., [(a,pos), (a,pos), (a,neg), ...]), form a dict of frequencies of specific items (e.g., {a:{pos:2, neg:1}, ...}).'''
62 | d = {}
63 | for i in counts:
64 | # If the first element of the 2-tuple is not in the map, add it.
65 | if i[0] in d:
66 | if i[1] in d[i[0]]:
67 | d[i[0]][i[1]] += 1
68 | else:
69 | d[i[0]][i[1]] = 1
70 | else:
71 | d[i[0]] = {i[1]: 1}
72 | return d
73 |
74 |
75 | def validate(filename):
76 | '''Validate an XML file, w.r.t. the format given in the 4th task of **SemEval '14**.'''
77 | elements = ET.parse(filename).getroot().findall('sentence')
78 | aspects = []
79 | for e in elements:
80 | for eterms in e.findall('aspectTerms'):
81 | if eterms is not None:
82 | for a in eterms.findall('aspectTerm'):
83 | aspects.append(Aspect('', '', []).create(a).term)
84 | return elements, aspects
85 |
86 |
87 | fix = lambda text: escape(text.encode('utf8')).replace('\"', '&quot;')
88 | '''Escape XML special characters and quotes, for writing text out as XML.'''
89 |
90 | # Dice coefficient
91 | def dice(t1, t2, stopwords=[]):
92 | tokenize = lambda t: set([w for w in t.split() if (w not in stopwords)])
93 | t1, t2 = tokenize(t1), tokenize(t2)
94 | return 2. * len(t1.intersection(t2)) / (len(t1) + len(t2))
95 |
96 |
97 | class Category:
98 | '''Category objects contain the term and polarity (i.e., pos, neg, neu, conflict) of the category (e.g., food, price, etc.)
of a sentence.''' 99 | 100 | def __init__(self, term='', polarity=''): 101 | self.term = term 102 | self.polarity = polarity 103 | 104 | def create(self, element): 105 | self.term = element.attrib['category'] 106 | self.polarity = element.attrib['polarity'] 107 | return self 108 | 109 | def update(self, term='', polarity=''): 110 | self.term = term 111 | self.polarity = polarity 112 | 113 | 114 | class Aspect: 115 | '''Aspect objects contain the term (e.g., battery life) and polarity (i.e., pos, neg, neu, conflict) of an aspect.''' 116 | 117 | def __init__(self, term, polarity, offsets): 118 | self.term = term 119 | self.polarity = polarity 120 | self.offsets = offsets 121 | 122 | def create(self, element): 123 | self.term = element.attrib['term'] 124 | self.polarity = element.attrib['polarity'] 125 | self.offsets = {'from': str(element.attrib['from']), 'to': str(element.attrib['to'])} 126 | return self 127 | 128 | def update(self, term='', polarity=''): 129 | self.term = term 130 | self.polarity = polarity 131 | 132 | 133 | class Instance: 134 | '''An instance is a sentence, modeled out of XML (pre-specified format, based on the 4th task of SemEval 2014). 135 | It contains the text, the aspect terms, and any aspect categories.''' 136 | 137 | def __init__(self, element): 138 | self.text = element.find('text').text 139 | self.id = element.get('id') 140 | self.aspect_terms = [Aspect('', '', offsets={'from': '', 'to': ''}).create(e) for es in 141 | element.findall('aspectTerms') for e in es if 142 | es is not None] 143 | self.aspect_categories = [Category(term='', polarity='').create(e) for es in element.findall('aspectCategories') 144 | for e in es if 145 | es is not None] 146 | 147 | def get_aspect_terms(self): 148 | return [a.term.lower() for a in self.aspect_terms] 149 | 150 | def get_aspect_categories(self): 151 | return [c.term.lower() for c in self.aspect_categories] 152 | 153 | def add_aspect_term(self, term, polarity='', offsets={'from': '', 'to': ''}): 154 | a = Aspect(term, polarity, offsets) 155 | self.aspect_terms.append(a) 156 | 157 | def add_aspect_category(self, term, polarity=''): 158 | c = Category(term, polarity) 159 | self.aspect_categories.append(c) 160 | 161 | 162 | class Corpus: 163 | '''A corpus contains instances, and is useful for training algorithms or splitting to train/test files.''' 164 | 165 | def __init__(self, elements): 166 | self.corpus = [Instance(e) for e in elements] 167 | self.size = len(self.corpus) 168 | self.aspect_terms_fd = fd([a for i in self.corpus for a in i.get_aspect_terms()]) 169 | self.top_aspect_terms = freq_rank(self.aspect_terms_fd) 170 | self.texts = [t.text for t in self.corpus] 171 | 172 | def echo(self): 173 | print '%d instances\n%d distinct aspect terms' % (len(self.corpus), len(self.top_aspect_terms)) 174 | print 'Top aspect terms: %s' % (', '.join(self.top_aspect_terms[:10])) 175 | 176 | def clean_tags(self): 177 | for i in range(len(self.corpus)): 178 | self.corpus[i].aspect_terms = [] 179 | 180 | def split(self, threshold=0.8, shuffle=False): 181 | '''Split to train/test, based on a threshold. 
Turn on shuffling to randomize the elements beforehand.'''
182 | clone = copy.deepcopy(self.corpus)
183 | if shuffle: random.shuffle(clone)
184 | train = clone[:int(threshold * self.size)]
185 | test = clone[int(threshold * self.size):]
186 | return train, test
187 |
188 | def write_out(self, filename, instances, short=True):
189 | with open(filename, 'w') as o:
190 | o.write('<sentences>\n')
191 | for i in instances:
192 | o.write('\t<sentence id="%s">\n' % (i.id))
193 | o.write('\t\t<text>%s</text>\n' % fix(i.text))
194 | o.write('\t\t<aspectTerms>\n')
195 | if not short:
196 | for a in i.aspect_terms:
197 | o.write('\t\t\t<aspectTerm term="%s" polarity="%s" from="%s" to="%s"/>\n' % (
198 | fix(a.term), a.polarity, a.offsets['from'], a.offsets['to']))
199 | o.write('\t\t</aspectTerms>\n')
200 | o.write('\t\t<aspectCategories>\n')
201 | if not short:
202 | for c in i.aspect_categories:
203 | o.write('\t\t\t<aspectCategory category="%s" polarity="%s"/>\n' % (fix(c.term), c.polarity))
204 | o.write('\t\t</aspectCategories>\n')
205 | o.write('\t</sentence>\n')
206 | o.write('</sentences>')
207 |
208 |
209 | class BaselineAspectExtractor():
210 | '''Extract the aspects from a text.
211 | Use the aspect terms from the train data to tag any new (i.e., unseen) instances.'''
212 |
213 | def __init__(self, corpus):
214 | self.candidates = [a.lower() for a in corpus.top_aspect_terms]
215 |
216 | def find_offsets_quickly(self, term, text):
217 | start = 0
218 | while True:
219 | start = text.find(term, start)
220 | if start == -1: return
221 | yield start
222 | start += len(term)
223 |
224 | def find_offsets(self, term, text):
225 | offsets = [(i, i + len(term)) for i in list(self.find_offsets_quickly(term, text))]
226 | return offsets
227 |
228 | def tag(self, test_instances):
229 | clones = []
230 | for i in test_instances:
231 | i_ = copy.deepcopy(i)
232 | i_.aspect_terms = []
233 | for c in set(self.candidates):
234 | if c in i_.text:
235 | offsets = self.find_offsets(' ' + c + ' ', i.text)
236 | for start, end in offsets: i_.add_aspect_term(term=c,
237 | offsets={'from': str(start + 1), 'to': str(end - 1)})
238 | clones.append(i_)
239 | return clones
240 |
241 |
242 | class BaselineCategoryDetector():
243 | '''Detect the category (or categories) of an instance.
244 | For any new (i.e., unseen) instance, fetch the k-closest instances from the train data, and vote for the number of categories and the categories themselves.''' 245 | 246 | def __init__(self, corpus): 247 | self.corpus = corpus 248 | 249 | # Fetch k-neighbors (i.e., similar texts), using the Dice coefficient, and vote for #categories and category values 250 | def fetch_k_nn(self, text, k=5, multi=False): 251 | neighbors = dict([(i, dice(text, n, stopwords)) for i, n in enumerate(self.corpus.texts)]) 252 | ranked = freq_rank(neighbors) 253 | topk = [self.corpus.corpus[i] for i in ranked[:k]] 254 | num_of_cats = 1 if not multi else int(sum([len(i.aspect_categories) for i in topk]) / float(k)) 255 | cats = freq_rank(fd([c for i in topk for c in i.get_aspect_categories()])) 256 | categories = [cats[i] for i in range(num_of_cats)] 257 | return categories 258 | 259 | def tag(self, test_instances): 260 | clones = [] 261 | for i in test_instances: 262 | i_ = copy.deepcopy(i) 263 | i_.aspect_categories = [Category(term=c) for c in self.fetch_k_nn(i.text)] 264 | clones.append(i_) 265 | return clones 266 | 267 | 268 | class BaselineStageI(): 269 | '''Stage I: Aspect Term Extraction and Aspect Category Detection.''' 270 | 271 | def __init__(self, b1, b2): 272 | self.b1 = b1 273 | self.b2 = b2 274 | 275 | def tag(self, test_instances): 276 | clones = [] 277 | for i in test_instances: 278 | i_ = copy.deepcopy(i) 279 | i_.aspect_categories, i_.aspect_terms = [], [] 280 | for a in set(self.b1.candidates): 281 | offsets = self.b1.find_offsets(' ' + a + ' ', i_.text) 282 | for start, end in offsets: 283 | i_.add_aspect_term(term=a, offsets={'from': str(start + 1), 'to': str(end - 1)}) 284 | for c in self.b2.fetch_k_nn(i_.text): 285 | i_.aspect_categories.append(Category(term=c)) 286 | clones.append(i_) 287 | return clones 288 | 289 | 290 | class BaselineAspectPolarityEstimator(): 291 | '''Estimate the polarity of an instance's aspects. 292 | This is a majority baseline. 293 | Form the tuples from the train data, and measure frequencies. 294 | Then, given a new instance, vote for the polarities of the aspect terms (given).''' 295 | 296 | def __init__(self, corpus): 297 | self.corpus = corpus 298 | self.fd = fd2([(a.term, a.polarity) for i in self.corpus.corpus for a in i.aspect_terms]) 299 | self.major = freq_rank(fd([a.polarity for i in self.corpus.corpus for a in i.aspect_terms]))[0] 300 | 301 | # Fetch k-neighbors (i.e., similar texts), using the Dice coefficient, and vote for aspect's polarity 302 | def k_nn(self, text, aspect, k=5): 303 | neighbors = dict([(i, dice(text, next.text, stopwords)) for i, next in enumerate(self.corpus.corpus) if 304 | aspect in next.get_aspect_terms()]) 305 | ranked = freq_rank(neighbors) 306 | topk = [self.corpus.corpus[i] for i in ranked[:k]] 307 | return freq_rank(fd([a.polarity for i in topk for a in i.aspect_terms])) 308 | 309 | def majority(self, text, aspect): 310 | if aspect not in self.fd: 311 | return self.major 312 | else: 313 | polarities = self.k_nn(text, aspect, k=5) 314 | if polarities: 315 | return polarities[0] 316 | else: 317 | return self.major 318 | 319 | def tag(self, test_instances): 320 | clones = [] 321 | for i in test_instances: 322 | i_ = copy.deepcopy(i) 323 | for j in i_.aspect_terms: j.polarity = self.majority(i_.text, j.term) 324 | clones.append(i_) 325 | return clones 326 | 327 | 328 | class BaselineAspectCategoryPolarityEstimator(): 329 | '''Estimate the polarity of an instance's category (or categories). 
330 | This is a majority baseline.
331 | Form the tuples from the train data, and measure frequencies.
332 | Then, given a new instance, vote for the polarities of the categories (given).'''
333 |
334 | def __init__(self, corpus):
335 | self.corpus = corpus
336 | self.fd = fd2([(c.term, c.polarity) for i in self.corpus.corpus for c in i.aspect_categories])
337 |
338 | # Fetch k-neighbors (i.e., similar texts), using the Dice coefficient, and vote for the category's polarity
339 | def k_nn(self, text, k=5):
340 | neighbors = dict([(i, dice(text, next.text, stopwords)) for i, next in enumerate(self.corpus.corpus)])
341 | ranked = freq_rank(neighbors)
342 | topk = [self.corpus.corpus[i] for i in ranked[:k]]
343 | return freq_rank(fd([c.polarity for i in topk for c in i.aspect_categories]))
344 |
345 | def majority(self, text):
346 | return self.k_nn(text)[0]
347 |
348 | def tag(self, test_instances):
349 | clones = []
350 | for i in test_instances:
351 | i_ = copy.deepcopy(i)
352 | for j in i_.aspect_categories:
353 | j.polarity = self.majority(i_.text)
354 | clones.append(i_)
355 | return clones
356 |
357 |
358 | class BaselineStageII():
359 | '''Stage II: Aspect Term and Aspect Category Polarity Estimation.
360 | Terms and categories are assumed given.'''
361 |
362 | # Baselines 3 and 4 are assumed given.
363 | def __init__(self, b3, b4):
364 | self.b3 = b3
365 | self.b4 = b4
366 |
367 | # Tag sentences with aspects and categories with their polarities
368 | def tag(self, test_instances):
369 | clones = []
370 | for i in test_instances:
371 | i_ = copy.deepcopy(i)
372 | for j in i_.aspect_terms: j.polarity=self.b3.majority(i_.text, j.term)
373 | for j in i_.aspect_categories: j.polarity = self.b4.majority(i_.text)
374 | clones.append(i_)
375 | return clones
376 |
377 |
378 | class Evaluate():
379 | '''Evaluation methods, per subtask of the 4th task of SemEval '14.'''
380 |
381 | def __init__(self, correct, predicted):
382 | self.size = len(correct)
383 | self.correct = correct
384 | self.predicted = predicted
385 |
386 | # Aspect Extraction (predicted terms are matched to gold terms by their offsets)
387 | def aspect_extraction(self, b=1):
388 | common, relevant, retrieved = 0., 0., 0.
389 | for i in range(self.size):
390 | cor = [a.offsets for a in self.correct[i].aspect_terms]
391 | pre = [a.offsets for a in self.predicted[i].aspect_terms]
392 | common += len([a for a in pre if a in cor])
393 | retrieved += len(pre)
394 | relevant += len(cor)
395 | p = common / retrieved if retrieved > 0 else 0.
396 | r = common / relevant
397 | f1 = (1 + (b ** 2)) * p * r / ((p * b ** 2) + r) if p > 0 and r > 0 else 0.
398 | return p, r, f1, common, retrieved, relevant
399 |
400 | # Aspect Category Detection
401 | def category_detection(self, b=1):
402 | common, relevant, retrieved = 0., 0., 0.
403 | for i in range(self.size):
404 | cor = self.correct[i].get_aspect_categories()
405 | # Use set to avoid duplicates (i.e., two times the same category)
406 | pre = set(self.predicted[i].get_aspect_categories())
407 | common += len([c for c in pre if c in cor])
408 | retrieved += len(pre)
409 | relevant += len(cor)
410 | p = common / retrieved if retrieved > 0 else 0.
411 | r = common / relevant
412 | f1 = (1 + b ** 2) * p * r / ((p * b ** 2) + r) if p > 0 and r > 0 else 0.
413 | return p, r, f1, common, retrieved, relevant
414 |
415 | def aspect_polarity_estimation(self, b=1):
416 | common, relevant, retrieved = 0., 0., 0.
417 | for i in range(self.size):
418 | cor = [a.polarity for a in self.correct[i].aspect_terms]
419 | pre = [a.polarity for a in self.predicted[i].aspect_terms]
420 | common += sum([1 for j in range(len(pre)) if pre[j] == cor[j]])
421 | retrieved += len(pre)
422 | acc = common / retrieved
423 | return acc, common, retrieved
424 |
425 | def aspect_category_polarity_estimation(self, b=1):
426 | common, relevant, retrieved = 0., 0., 0.
427 | for i in range(self.size):
428 | cor = [a.polarity for a in self.correct[i].aspect_categories]
429 | pre = [a.polarity for a in self.predicted[i].aspect_categories]
430 | common += sum([1 for j in range(len(pre)) if pre[j] == cor[j]])
431 | retrieved += len(pre)
432 | acc = common / retrieved
433 | return acc, common, retrieved
434 |
435 |
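# A minimal usage sketch (kept as comments so importing this module stays
# side-effect free; the file names are hypothetical), mirroring the module
# docstring above: parse a training file and a gold test file, tag the test
# instances with the frequency-based extractor, and score them with Evaluate:
#
#   corpus = Corpus(ET.parse('rest--train.xml').getroot().findall('sentence'))
#   gold = Corpus(ET.parse('rest--test.gold.xml').getroot().findall('sentence'))
#   predicted = BaselineAspectExtractor(corpus).tag(gold.corpus)
#   p, r, f1, common, retrieved, relevant = Evaluate(gold.corpus, predicted).aspect_extraction()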
436 | def main(argv=None):
437 | # Parse the input
438 | opts, args = getopt.getopt(argv, "hg:dt:om:k:", ["help", "grammar", "train=", "task=", "test="])
439 | trainfile, testfile, task = None, None, 1
440 | use_msg = 'Use as:\n">>> python baselines.py --train file.xml --task 1|2|3|4(|5|6)"\n\nThis will parse a train set, examine whether it is valid, split it into train and test parts (80/20%), write the new train, test and unseen test files, perform ABSA for task 1, 2, 3, or 4 (5 and 6 perform jointly tasks 1 & 2, and 3 & 4, respectively), and write out a file with the predictions.'
441 | if len(opts) == 0: sys.exit(use_msg)
442 | for opt, arg in opts:
443 | if opt in ("-h", "--help"):
444 | sys.exit(use_msg)
445 | elif opt in ('-t', "--train"):
446 | trainfile = arg
447 | elif opt in ('-m', "--task"):
448 | task = int(arg)
449 | elif opt in ('-k', "--test"):
450 | testfile = arg
451 |
452 | # Examine if the file is in proper XML format for further use.
453 | print 'Validating the file...'
454 | try:
455 | elements, aspects = validate(trainfile)
456 | print 'PASSED! This corpus has: %d sentences, %d aspect term occurrences, and %d distinct aspect terms.' % (
457 | len(elements), len(aspects), len(list(set(aspects))))
458 | except:
459 | print "Unexpected error:", sys.exc_info()[0]
460 | raise
461 |
462 | # Get the corpus and split into train/test.
463 | corpus = Corpus(ET.parse(trainfile).getroot().findall('sentence'))
464 | domain_name = 'laptops' if 'laptop' in trainfile else ('restaurants' if 'restau' in trainfile else 'absa')
465 | if testfile:
466 | traincorpus = corpus
467 | seen = Corpus(ET.parse(testfile).getroot().findall('sentence'))
468 | else:
469 | train, seen = corpus.split()
470 | # Store train/test files and clean up the test files (no aspect terms or categories are present); then parse the files back.
471 | corpus.write_out('%s--train.xml' % domain_name, train, short=False)
472 | traincorpus = Corpus(ET.parse('%s--train.xml' % domain_name).getroot().findall('sentence'))
473 | corpus.write_out('%s--test.gold.xml' % domain_name, seen, short=False)
474 | seen = Corpus(ET.parse('%s--test.gold.xml' % domain_name).getroot().findall('sentence'))
475 |
476 | corpus.write_out('%s--test.xml' % domain_name, seen.corpus)
477 | unseen = Corpus(ET.parse('%s--test.xml' % domain_name).getroot().findall('sentence'))
478 |
479 | # Perform the tasks asked by the user, and print the files with the predicted responses.
480 | if task == 1:
481 | b1 = BaselineAspectExtractor(traincorpus)
482 | print 'Extracting aspect terms...'
483 | predicted = b1.tag(unseen.corpus)
484 | corpus.write_out('%s--test.predicted-aspect.xml' % domain_name, predicted, short=False)
485 | print 'P = %f -- R = %f -- F1 = %f (#correct: %d, #retrieved: %d, #relevant: %d)' % Evaluate(seen.corpus,
486 | predicted).aspect_extraction()
487 | if task == 2:
488 | print 'Detecting aspect categories...'
489 | b2 = BaselineCategoryDetector(traincorpus)
490 | predicted = b2.tag(unseen.corpus)
491 | print 'P = %f -- R = %f -- F1 = %f (#correct: %d, #retrieved: %d, #relevant: %d)' % Evaluate(seen.corpus,
492 | predicted).category_detection()
493 | corpus.write_out('%s--test.predicted-category.xml' % domain_name, predicted, short=False)
494 | if task == 3:
495 | print 'Estimating aspect term polarity...'
496 | b3 = BaselineAspectPolarityEstimator(traincorpus)
497 | predicted = b3.tag(seen.corpus)
498 | corpus.write_out('%s--test.predicted-aspectPolar.xml' % domain_name, predicted, short=False)
499 | print 'Accuracy = %f, #Correct/#All: %d/%d' % Evaluate(seen.corpus, predicted).aspect_polarity_estimation()
500 | if task == 4:
501 | print 'Estimating aspect category polarity...'
502 | b4 = BaselineAspectCategoryPolarityEstimator(traincorpus)
503 | predicted = b4.tag(seen.corpus)
504 | print 'Accuracy = %f, #Correct/#All: %d/%d' % Evaluate(seen.corpus,
505 | predicted).aspect_category_polarity_estimation()
506 | corpus.write_out('%s--test.predicted-categoryPolar.xml' % domain_name, predicted, short=False)
507 | # Perform tasks 1 & 2, and output an XML file with the predictions
508 | if task == 5:
509 | print 'Task 1 & 2: Aspect Term and Category Detection'
510 | b1 = BaselineAspectExtractor(traincorpus)
511 | b2 = BaselineCategoryDetector(traincorpus)
512 | b12 = BaselineStageI(b1, b2)
513 | predicted = b12.tag(unseen.corpus)
514 | corpus.write_out('%s--test.predicted-stageI.xml' % domain_name, predicted, short=False)
515 | print 'Task 1: P = %f -- R = %f -- F1 = %f (#correct: %d, #retrieved: %d, #relevant: %d)' % Evaluate(
516 | seen.corpus, predicted).aspect_extraction()
517 | print 'Task 2: P = %f -- R = %f -- F1 = %f (#correct: %d, #retrieved: %d, #relevant: %d)' % Evaluate(
518 | seen.corpus, predicted).category_detection()
519 | # Perform tasks 3 & 4, and output an XML file with the predictions
520 | if task == 6:
521 | print 'Aspect Term and Category Polarity Estimation'
522 | b3 = BaselineAspectPolarityEstimator(traincorpus)
523 | b4 = BaselineAspectCategoryPolarityEstimator(traincorpus)
524 | b34 = BaselineStageII(b3, b4)
525 | predicted = b34.tag(seen.corpus)
526 | corpus.write_out('%s--test.predicted-stageII.xml' % domain_name, predicted, short=False)
527 | print 'Task 3: Accuracy = %f (#Correct/#All: %d/%d)' % Evaluate(seen.corpus,
528 | predicted).aspect_polarity_estimation()
529 | print 'Task 4: Accuracy = %f (#Correct/#All: %d/%d)' % Evaluate(seen.corpus,
530 | predicted).aspect_category_polarity_estimation()
531 |
532 |
533 | if __name__ == "__main__": main(sys.argv[1:])
-------------------------------------------------------------------------------- /run.py: --------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | # -*- coding: utf-8 -*-
3 |
4 | import xml.etree.ElementTree as ET
5 | from libraries.baselines import Corpus
6 |
7 | from stanford_corenlp_python import jsonrpc
8 | from simplejson import loads
9 |
10 |
11 | def process_semeval_2015():
12 | # the training set is composed of the train and trial datasets
13 | corpora = dict()
14 | corpora['restaurants'] = dict()
15 | train_filename =
'datasets/ABSA-SemEval2015/ABSA-15_Restaurants_Train_Final.xml'
16 | trial_filename = 'datasets/ABSA-SemEval2015/absa-2015_restaurants_trial.xml'
17 |
18 | reviews = ET.parse(train_filename).getroot().findall('Review') + \
19 | ET.parse(trial_filename).getroot().findall('Review')
20 |
21 | sentences = []
22 | for r in reviews:
23 | sentences += r.find('sentences').getchildren()
24 |
25 | # TODO: parser is not loading aspect words and opinions
26 | corpus = Corpus(sentences)
27 | print corpus.size
28 |
29 |
30 | def process_semeval_2014():
31 | # the training set is composed of the train and trial datasets
32 | corpora = dict()
33 | corpora['restaurants'] = dict()
34 | train_filename = 'datasets/ABSA-SemEval2014/Restaurants_Train_v2.xml'
35 | trial_filename = 'datasets/ABSA-SemEval2014/restaurants-trial.xml'
36 | corpus = Corpus(ET.parse(train_filename).getroot().findall('sentence') +
37 | ET.parse(trial_filename).getroot().findall('sentence'))
38 | corpora['restaurants']['trainset'] = dict()
39 | corpora['restaurants']['trainset']['corpus'] = corpus
40 | return corpora
41 |
42 |
43 | def main():
44 | # TODO: start corenlp server "python corenlp.py"
45 |
46 | # interface for the Stanford CoreNLP server
47 | server = jsonrpc.ServerProxy(jsonrpc.JsonRpc20(),
48 | jsonrpc.TransportTcpIp(addr=("127.0.0.1",
49 | 8080)))
50 |
51 | result = loads(server.parse("Hello world. It is so beautiful"))
52 | print "Result", result
53 |
54 | corpora = process_semeval_2014()
55 | train_restaurants = corpora['restaurants']['trainset']['corpus']
56 |
57 | for s in train_restaurants.corpus:
58 | print s.text
59 |
60 | """
61 | print train_restaurants.size
62 | print train_restaurants.aspect_terms_fd
63 | """
64 |
65 | if __name__ == '__main__':
66 | main()
67 |
-------------------------------------------------------------------------------- /stanford_corenlp_python/LICENSE: --------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE
2 | Version 2, June 1991
3 |
4 | Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
5 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
6 | Everyone is permitted to copy and distribute verbatim copies
7 | of this license document, but changing it is not allowed.
8 |
9 | Preamble
10 |
11 | The licenses for most software are designed to take away your
12 | freedom to share and change it. By contrast, the GNU General Public
13 | License is intended to guarantee your freedom to share and change free
14 | software--to make sure the software is free for all its users. This
15 | General Public License applies to most of the Free Software
16 | Foundation's software and to any other program whose authors commit to
17 | using it. (Some other Free Software Foundation software is covered by
18 | the GNU Lesser General Public License instead.) You can apply it to
19 | your programs, too.
20 |
21 | When we speak of free software, we are referring to freedom, not
22 | price. Our General Public Licenses are designed to make sure that you
23 | have the freedom to distribute copies of free software (and charge for
24 | this service if you wish), that you receive source code or can get it
25 | if you want it, that you can change the software or use pieces of it
26 | in new free programs; and that you know you can do these things.
27 |
28 | To protect your rights, we need to make restrictions that forbid
29 | anyone to deny you these rights or to ask you to surrender the rights.
30 | These restrictions translate to certain responsibilities for you if you 31 | distribute copies of the software, or if you modify it. 32 | 33 | For example, if you distribute copies of such a program, whether 34 | gratis or for a fee, you must give the recipients all the rights that 35 | you have. You must make sure that they, too, receive or can get the 36 | source code. And you must show them these terms so they know their 37 | rights. 38 | 39 | We protect your rights with two steps: (1) copyright the software, and 40 | (2) offer you this license which gives you legal permission to copy, 41 | distribute and/or modify the software. 42 | 43 | Also, for each author's protection and ours, we want to make certain 44 | that everyone understands that there is no warranty for this free 45 | software. If the software is modified by someone else and passed on, we 46 | want its recipients to know that what they have is not the original, so 47 | that any problems introduced by others will not reflect on the original 48 | authors' reputations. 49 | 50 | Finally, any free program is threatened constantly by software 51 | patents. We wish to avoid the danger that redistributors of a free 52 | program will individually obtain patent licenses, in effect making the 53 | program proprietary. To prevent this, we have made it clear that any 54 | patent must be licensed for everyone's free use or not licensed at all. 55 | 56 | The precise terms and conditions for copying, distribution and 57 | modification follow. 58 | 59 | GNU GENERAL PUBLIC LICENSE 60 | TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 61 | 62 | 0. This License applies to any program or other work which contains 63 | a notice placed by the copyright holder saying it may be distributed 64 | under the terms of this General Public License. The "Program", below, 65 | refers to any such program or work, and a "work based on the Program" 66 | means either the Program or any derivative work under copyright law: 67 | that is to say, a work containing the Program or a portion of it, 68 | either verbatim or with modifications and/or translated into another 69 | language. (Hereinafter, translation is included without limitation in 70 | the term "modification".) Each licensee is addressed as "you". 71 | 72 | Activities other than copying, distribution and modification are not 73 | covered by this License; they are outside its scope. The act of 74 | running the Program is not restricted, and the output from the Program 75 | is covered only if its contents constitute a work based on the 76 | Program (independent of having been made by running the Program). 77 | Whether that is true depends on what the Program does. 78 | 79 | 1. You may copy and distribute verbatim copies of the Program's 80 | source code as you receive it, in any medium, provided that you 81 | conspicuously and appropriately publish on each copy an appropriate 82 | copyright notice and disclaimer of warranty; keep intact all the 83 | notices that refer to this License and to the absence of any warranty; 84 | and give any other recipients of the Program a copy of this License 85 | along with the Program. 86 | 87 | You may charge a fee for the physical act of transferring a copy, and 88 | you may at your option offer warranty protection in exchange for a fee. 89 | 90 | 2. 
You may modify your copy or copies of the Program or any portion 91 | of it, thus forming a work based on the Program, and copy and 92 | distribute such modifications or work under the terms of Section 1 93 | above, provided that you also meet all of these conditions: 94 | 95 | a) You must cause the modified files to carry prominent notices 96 | stating that you changed the files and the date of any change. 97 | 98 | b) You must cause any work that you distribute or publish, that in 99 | whole or in part contains or is derived from the Program or any 100 | part thereof, to be licensed as a whole at no charge to all third 101 | parties under the terms of this License. 102 | 103 | c) If the modified program normally reads commands interactively 104 | when run, you must cause it, when started running for such 105 | interactive use in the most ordinary way, to print or display an 106 | announcement including an appropriate copyright notice and a 107 | notice that there is no warranty (or else, saying that you provide 108 | a warranty) and that users may redistribute the program under 109 | these conditions, and telling the user how to view a copy of this 110 | License. (Exception: if the Program itself is interactive but 111 | does not normally print such an announcement, your work based on 112 | the Program is not required to print an announcement.) 113 | 114 | These requirements apply to the modified work as a whole. If 115 | identifiable sections of that work are not derived from the Program, 116 | and can be reasonably considered independent and separate works in 117 | themselves, then this License, and its terms, do not apply to those 118 | sections when you distribute them as separate works. But when you 119 | distribute the same sections as part of a whole which is a work based 120 | on the Program, the distribution of the whole must be on the terms of 121 | this License, whose permissions for other licensees extend to the 122 | entire whole, and thus to each and every part regardless of who wrote it. 123 | 124 | Thus, it is not the intent of this section to claim rights or contest 125 | your rights to work written entirely by you; rather, the intent is to 126 | exercise the right to control the distribution of derivative or 127 | collective works based on the Program. 128 | 129 | In addition, mere aggregation of another work not based on the Program 130 | with the Program (or with a work based on the Program) on a volume of 131 | a storage or distribution medium does not bring the other work under 132 | the scope of this License. 133 | 134 | 3. 
You may copy and distribute the Program (or a work based on it, 135 | under Section 2) in object code or executable form under the terms of 136 | Sections 1 and 2 above provided that you also do one of the following: 137 | 138 | a) Accompany it with the complete corresponding machine-readable 139 | source code, which must be distributed under the terms of Sections 140 | 1 and 2 above on a medium customarily used for software interchange; or, 141 | 142 | b) Accompany it with a written offer, valid for at least three 143 | years, to give any third party, for a charge no more than your 144 | cost of physically performing source distribution, a complete 145 | machine-readable copy of the corresponding source code, to be 146 | distributed under the terms of Sections 1 and 2 above on a medium 147 | customarily used for software interchange; or, 148 | 149 | c) Accompany it with the information you received as to the offer 150 | to distribute corresponding source code. (This alternative is 151 | allowed only for noncommercial distribution and only if you 152 | received the program in object code or executable form with such 153 | an offer, in accord with Subsection b above.) 154 | 155 | The source code for a work means the preferred form of the work for 156 | making modifications to it. For an executable work, complete source 157 | code means all the source code for all modules it contains, plus any 158 | associated interface definition files, plus the scripts used to 159 | control compilation and installation of the executable. However, as a 160 | special exception, the source code distributed need not include 161 | anything that is normally distributed (in either source or binary 162 | form) with the major components (compiler, kernel, and so on) of the 163 | operating system on which the executable runs, unless that component 164 | itself accompanies the executable. 165 | 166 | If distribution of executable or object code is made by offering 167 | access to copy from a designated place, then offering equivalent 168 | access to copy the source code from the same place counts as 169 | distribution of the source code, even though third parties are not 170 | compelled to copy the source along with the object code. 171 | 172 | 4. You may not copy, modify, sublicense, or distribute the Program 173 | except as expressly provided under this License. Any attempt 174 | otherwise to copy, modify, sublicense or distribute the Program is 175 | void, and will automatically terminate your rights under this License. 176 | However, parties who have received copies, or rights, from you under 177 | this License will not have their licenses terminated so long as such 178 | parties remain in full compliance. 179 | 180 | 5. You are not required to accept this License, since you have not 181 | signed it. However, nothing else grants you permission to modify or 182 | distribute the Program or its derivative works. These actions are 183 | prohibited by law if you do not accept this License. Therefore, by 184 | modifying or distributing the Program (or any work based on the 185 | Program), you indicate your acceptance of this License to do so, and 186 | all its terms and conditions for copying, distributing or modifying 187 | the Program or works based on it. 188 | 189 | 6. Each time you redistribute the Program (or any work based on the 190 | Program), the recipient automatically receives a license from the 191 | original licensor to copy, distribute or modify the Program subject to 192 | these terms and conditions. 
You may not impose any further 193 | restrictions on the recipients' exercise of the rights granted herein. 194 | You are not responsible for enforcing compliance by third parties to 195 | this License. 196 | 197 | 7. If, as a consequence of a court judgment or allegation of patent 198 | infringement or for any other reason (not limited to patent issues), 199 | conditions are imposed on you (whether by court order, agreement or 200 | otherwise) that contradict the conditions of this License, they do not 201 | excuse you from the conditions of this License. If you cannot 202 | distribute so as to satisfy simultaneously your obligations under this 203 | License and any other pertinent obligations, then as a consequence you 204 | may not distribute the Program at all. For example, if a patent 205 | license would not permit royalty-free redistribution of the Program by 206 | all those who receive copies directly or indirectly through you, then 207 | the only way you could satisfy both it and this License would be to 208 | refrain entirely from distribution of the Program. 209 | 210 | If any portion of this section is held invalid or unenforceable under 211 | any particular circumstance, the balance of the section is intended to 212 | apply and the section as a whole is intended to apply in other 213 | circumstances. 214 | 215 | It is not the purpose of this section to induce you to infringe any 216 | patents or other property right claims or to contest validity of any 217 | such claims; this section has the sole purpose of protecting the 218 | integrity of the free software distribution system, which is 219 | implemented by public license practices. Many people have made 220 | generous contributions to the wide range of software distributed 221 | through that system in reliance on consistent application of that 222 | system; it is up to the author/donor to decide if he or she is willing 223 | to distribute software through any other system and a licensee cannot 224 | impose that choice. 225 | 226 | This section is intended to make thoroughly clear what is believed to 227 | be a consequence of the rest of this License. 228 | 229 | 8. If the distribution and/or use of the Program is restricted in 230 | certain countries either by patents or by copyrighted interfaces, the 231 | original copyright holder who places the Program under this License 232 | may add an explicit geographical distribution limitation excluding 233 | those countries, so that distribution is permitted only in or among 234 | countries not thus excluded. In such case, this License incorporates 235 | the limitation as if written in the body of this License. 236 | 237 | 9. The Free Software Foundation may publish revised and/or new versions 238 | of the General Public License from time to time. Such new versions will 239 | be similar in spirit to the present version, but may differ in detail to 240 | address new problems or concerns. 241 | 242 | Each version is given a distinguishing version number. If the Program 243 | specifies a version number of this License which applies to it and "any 244 | later version", you have the option of following the terms and conditions 245 | either of that version or of any later version published by the Free 246 | Software Foundation. If the Program does not specify a version number of 247 | this License, you may choose any version ever published by the Free Software 248 | Foundation. 249 | 250 | 10. 
If you wish to incorporate parts of the Program into other free 251 | programs whose distribution conditions are different, write to the author 252 | to ask for permission. For software which is copyrighted by the Free 253 | Software Foundation, write to the Free Software Foundation; we sometimes 254 | make exceptions for this. Our decision will be guided by the two goals 255 | of preserving the free status of all derivatives of our free software and 256 | of promoting the sharing and reuse of software generally. 257 | 258 | NO WARRANTY 259 | 260 | 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY 261 | FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN 262 | OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES 263 | PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED 264 | OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 265 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS 266 | TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE 267 | PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, 268 | REPAIR OR CORRECTION. 269 | 270 | 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING 271 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR 272 | REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, 273 | INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING 274 | OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED 275 | TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY 276 | YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER 277 | PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE 278 | POSSIBILITY OF SUCH DAMAGES. 279 | 280 | END OF TERMS AND CONDITIONS 281 | 282 | How to Apply These Terms to Your New Programs 283 | 284 | If you develop a new program, and you want it to be of the greatest 285 | possible use to the public, the best way to achieve this is to make it 286 | free software which everyone can redistribute and change under these terms. 287 | 288 | To do so, attach the following notices to the program. It is safest 289 | to attach them to the start of each source file to most effectively 290 | convey the exclusion of warranty; and each file should have at least 291 | the "copyright" line and a pointer to where the full notice is found. 292 | 293 | <one line to give the program's name and a brief idea of what it does.> 294 | Copyright (C) <year> <name of author> 295 | 296 | This program is free software; you can redistribute it and/or modify 297 | it under the terms of the GNU General Public License as published by 298 | the Free Software Foundation; either version 2 of the License, or 299 | (at your option) any later version. 300 | 301 | This program is distributed in the hope that it will be useful, 302 | but WITHOUT ANY WARRANTY; without even the implied warranty of 303 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 304 | GNU General Public License for more details. 305 | 306 | You should have received a copy of the GNU General Public License along 307 | with this program; if not, write to the Free Software Foundation, Inc., 308 | 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. 309 | 310 | Also add information on how to contact you by electronic and paper mail. 
311 | 312 | If the program is interactive, make it output a short notice like this 313 | when it starts in an interactive mode: 314 | 315 | Gnomovision version 69, Copyright (C) year name of author 316 | Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. 317 | This is free software, and you are welcome to redistribute it 318 | under certain conditions; type `show c' for details. 319 | 320 | The hypothetical commands `show w' and `show c' should show the appropriate 321 | parts of the General Public License. Of course, the commands you use may 322 | be called something other than `show w' and `show c'; they could even be 323 | mouse-clicks or menu items--whatever suits your program. 324 | 325 | You should also get your employer (if you work as a programmer) or your 326 | school, if any, to sign a "copyright disclaimer" for the program, if 327 | necessary. Here is a sample; alter the names: 328 | 329 | Yoyodyne, Inc., hereby disclaims all copyright interest in the program 330 | `Gnomovision' (which makes passes at compilers) written by James Hacker. 331 | 332 | <signature of Ty Coon>, 1 April 1989 333 | Ty Coon, President of Vice 334 | 335 | This General Public License does not permit incorporating your program into 336 | proprietary programs. If your program is a subroutine library, you may 337 | consider it more useful to permit linking proprietary applications with the 338 | library. If this is what you want to do, use the GNU Lesser General 339 | Public License instead of this License. 340 | -------------------------------------------------------------------------------- /stanford_corenlp_python/README.md: -------------------------------------------------------------------------------- 1 | # Python interface to Stanford Core NLP tools v3.4.1 2 | 3 | This is a Python wrapper for the Stanford NLP Group's Java-based [CoreNLP tools](http://nlp.stanford.edu/software/corenlp.shtml). It can either be imported as a module or run as a JSON-RPC server. Because it uses many large trained models (requiring 3GB RAM on 64-bit machines and usually a few minutes loading time), most applications will probably want to run it as a server. 4 | 5 | 6 | * Python interface to Stanford CoreNLP tools: tagging, phrase-structure parsing, dependency parsing, [named-entity recognition](http://en.wikipedia.org/wiki/Named-entity_recognition), and [coreference resolution](http://en.wikipedia.org/wiki/Coreference). 7 | * Runs a JSON-RPC server that wraps the Java server and outputs JSON. 8 | * Outputs parse trees which can be used by [nltk](http://nltk.googlecode.com/svn/trunk/doc/howto/tree.html). 9 | 10 | 11 | It depends on [pexpect](http://www.noah.org/wiki/pexpect) and includes and uses code from [jsonrpc](http://www.simple-is-better.org/rpc/) and [python-progressbar](http://code.google.com/p/python-progressbar/). 12 | 13 | It runs the Stanford CoreNLP jar in a separate process, communicates with the java process using its command-line interface, and makes assumptions about the output of the parser in order to parse it into a Python dict object and transfer it using JSON. The parser will break if the output changes significantly, but it has been tested on **Core NLP tools version 3.4.1** released 2014-08-27. 14 | 15 | ## Download and Usage 16 | 17 | To use this program you must [download](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpack the compressed file containing Stanford's CoreNLP package. 
By default, `corenlp.py` looks for the Stanford Core NLP folder as a subdirectory of where the script is being run. In other words: 18 | 19 | sudo pip install pexpect unidecode 20 | git clone git://github.com/dasmith/stanford-corenlp-python.git 21 | cd stanford-corenlp-python 22 | wget http://nlp.stanford.edu/software/stanford-corenlp-full-2014-08-27.zip 23 | unzip stanford-corenlp-full-2014-08-27.zip 24 | 25 | Then launch the server: 26 | 27 | python corenlp.py 28 | 29 | Optionally, you can specify a host or port: 30 | 31 | python corenlp.py -H 0.0.0.0 -p 3456 32 | 33 | That will run a public JSON-RPC server on port 3456. 34 | 35 | Assuming you are running on port 8080, the code in `client.py` shows an example parse: 36 | 37 | import jsonrpc 38 | from simplejson import loads 39 | server = jsonrpc.ServerProxy(jsonrpc.JsonRpc20(), 40 | jsonrpc.TransportTcpIp(addr=("127.0.0.1", 8080))) 41 | 42 | result = loads(server.parse("Hello world!  It is so beautiful.")) 43 | print "Result", result 44 | 45 | That returns a dictionary containing the keys `sentences` and `coref`. The key `sentences` contains a list with one dictionary per sentence; each dictionary contains `parsetree`, `text`, `tuples` (the dependencies), and `words` (information about parts of speech, recognized named entities, etc.): 46 | 47 | {u'sentences': [{u'parsetree': u'(ROOT (S (VP (NP (INTJ (UH Hello)) (NP (NN world)))) (. !)))', 48 | u'text': u'Hello world!', 49 | u'tuples': [[u'dep', u'world', u'Hello'], 50 | [u'root', u'ROOT', u'world']], 51 | u'words': [[u'Hello', 52 | {u'CharacterOffsetBegin': u'0', 53 | u'CharacterOffsetEnd': u'5', 54 | u'Lemma': u'hello', 55 | u'NamedEntityTag': u'O', 56 | u'PartOfSpeech': u'UH'}], 57 | [u'world', 58 | {u'CharacterOffsetBegin': u'6', 59 | u'CharacterOffsetEnd': u'11', 60 | u'Lemma': u'world', 61 | u'NamedEntityTag': u'O', 62 | u'PartOfSpeech': u'NN'}], 63 | [u'!', 64 | {u'CharacterOffsetBegin': u'11', 65 | u'CharacterOffsetEnd': u'12', 66 | u'Lemma': u'!', 67 | u'NamedEntityTag': u'O', 68 | u'PartOfSpeech': u'.'}]]}, 69 | {u'parsetree': u'(ROOT (S (NP (PRP It)) (VP (VBZ is) (ADJP (RB so) (JJ beautiful))) (. .)))', 70 | u'text': u'It is so beautiful.', 71 | u'tuples': [[u'nsubj', u'beautiful', u'It'], 72 | [u'cop', u'beautiful', u'is'], 73 | [u'advmod', u'beautiful', u'so'], 74 | [u'root', u'ROOT', u'beautiful']], 75 | u'words': [[u'It', 76 | {u'CharacterOffsetBegin': u'14', 77 | u'CharacterOffsetEnd': u'16', 78 | u'Lemma': u'it', 79 | u'NamedEntityTag': u'O', 80 | u'PartOfSpeech': u'PRP'}], 81 | [u'is', 82 | {u'CharacterOffsetBegin': u'17', 83 | u'CharacterOffsetEnd': u'19', 84 | u'Lemma': u'be', 85 | u'NamedEntityTag': u'O', 86 | u'PartOfSpeech': u'VBZ'}], 87 | [u'so', 88 | {u'CharacterOffsetBegin': u'20', 89 | u'CharacterOffsetEnd': u'22', 90 | u'Lemma': u'so', 91 | u'NamedEntityTag': u'O', 92 | u'PartOfSpeech': u'RB'}], 93 | [u'beautiful', 94 | {u'CharacterOffsetBegin': u'23', 95 | u'CharacterOffsetEnd': u'32', 96 | u'Lemma': u'beautiful', 97 | u'NamedEntityTag': u'O', 98 | u'PartOfSpeech': u'JJ'}], 99 | [u'.', 100 | {u'CharacterOffsetBegin': u'32', 101 | u'CharacterOffsetEnd': u'33', 102 | u'Lemma': u'.', 103 | u'NamedEntityTag': u'O', 104 | u'PartOfSpeech': u'.'}]]}], 105 | u'coref': [[[[u'It', 1, 0, 0, 1], [u'Hello world', 0, 1, 0, 2]]]]} 106 | 107 | To use it in a regular script (useful for debugging), load the module instead: 108 | 109 | from corenlp import * 110 | corenlp = StanfordCoreNLP() # wait a few minutes... 
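    # parse() returns a JSON *string* (see parse() in corenlp.py); to get a
    # dict back, wrap it in loads() from simplejson, as in the client example above: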
111 | corenlp.parse("Parse this sentence.") 112 | 113 | The server, `StanfordCoreNLP()`, takes an optional argument `corenlp_path` which specifies the path to the jar files. In this repository the default is `StanfordCoreNLP(corenlp_path="stanford-corenlp-full/")` (see `corenlp.py`). 114 | 115 | ## Coreference Resolution 116 | 117 | The library supports [coreference resolution](http://en.wikipedia.org/wiki/Coreference), which means pronouns can be "dereferenced." If an entry in the `coref` list is `[u'Hello world', 0, 1, 0, 2]`, the numbers mean: 118 | 119 | * 0 = The reference appears in the 0th sentence (e.g. "Hello world") 120 | * 1 = The 2nd token, "world", is the [headword](http://en.wikipedia.org/wiki/Head_%28linguistics%29) of the mention 121 | * 0 = 'Hello world' begins at the 0th token in the sentence 122 | * 2 = 'Hello world' ends before the 2nd token in the sentence. 123 | 124 | 135 | 136 | 137 | ## Questions 138 | 139 | **Stanford CoreNLP tools require a large amount of free memory**. Java 5+ uses about 50% more RAM on 64-bit machines than 32-bit machines. 32-bit machine users can lower the memory requirements by changing `-Xmx3g` to `-Xmx2g` or even less. 140 | If pexpect times out while loading models, check to make sure you have enough memory and can run the server alone without your kernel killing the java process: 141 | 142 | java -cp stanford-corenlp-3.4.1.jar:stanford-corenlp-3.4.1-models.jar:xom.jar:joda-time.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -props default.properties 143 | 144 | You can reach me, Dustin Smith, by sending a message on GitHub or through email (contact information is available [on my webpage](http://web.media.mit.edu/~dustin)). 145 | 146 | 147 | # License & Contributors 148 | 149 | This is free and open source software and has benefited from the contribution and feedback of others. Like Stanford's CoreNLP tools, it is covered under the [GNU General Public License v2+](http://www.gnu.org/licenses/gpl-2.0.html), which in short means that modifications to this program must maintain the same free and open source distribution policy. 150 | 151 | I gratefully welcome bug fixes and new features. If you have forked this repository, please submit a [pull request](https://help.github.com/articles/using-pull-requests/) so others can benefit from your contributions. This project has already benefited from contributions from these members of the open source community: 152 | 153 | * [Emilio Monti](https://github.com/emilmont) 154 | * [Justin Cheng](https://github.com/jcccf) 155 | * Abhaya Agarwal 156 | 157 | *Thank you!* 158 | 159 | ## Related Projects 160 | 161 | Maintainers of the Core NLP library at Stanford keep an [updated list of wrappers and extensions](http://nlp.stanford.edu/software/corenlp.shtml#Extensions). See Brendan O'Connor's [stanford_corenlp_pywrapper](https://github.com/brendano/stanford_corenlp_pywrapper) for a different approach more suited to batch processing. 
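## Worked Example: Reading a `coref` Entry

The following is a minimal sketch of how to apply the index conventions described under "Coreference Resolution" above; it assumes `result` is the dictionary returned by the earlier `server.parse("Hello world!  It is so beautiful.")` example:

    # unpack the mention [u'Hello world', 0, 1, 0, 2]:
    # (text, sentence index, head token index, start token, end token)
    mention, sent_i, head_i, start, end = result['coref'][0][0][1]
    # each entry in 'words' is a [token, attributes] pair, so w[0] is the token
    tokens = [w[0] for w in result['sentences'][sent_i]['words']]
    print tokens[start:end]   # [u'Hello', u'world']
    print tokens[head_i]      # u'world', the headword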
162 | -------------------------------------------------------------------------------- /stanford_corenlp_python/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/davidsbatista/Aspect-Based-Sentiment-Analysis/1d9c8ec1131993d924e96676fa212db6b53cb870/stanford_corenlp_python/__init__.py -------------------------------------------------------------------------------- /stanford_corenlp_python/client.py: -------------------------------------------------------------------------------- 1 | import json 2 | from jsonrpc import ServerProxy, JsonRpc20, TransportTcpIp 3 | from pprint import pprint 4 | 5 | class StanfordNLP: 6 | def __init__(self): 7 | self.server = ServerProxy(JsonRpc20(), 8 | TransportTcpIp(addr=("127.0.0.1", 8080))) 9 | 10 | def parse(self, text): 11 | return json.loads(self.server.parse(text)) 12 | 13 | nlp = StanfordNLP() 14 | result = nlp.parse("Hello world! It is so beautiful.") 15 | pprint(result) 16 | 17 | from nltk.tree import Tree 18 | tree = Tree.parse(result['sentences'][0]['parsetree']) 19 | pprint(tree) 20 | -------------------------------------------------------------------------------- /stanford_corenlp_python/corenlp.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | # 3 | # corenlp - Python interface to Stanford Core NLP tools 4 | # Copyright (c) 2014 Dustin Smith 5 | # https://github.com/dasmith/stanford-corenlp-python 6 | # 7 | # This program is free software; you can redistribute it and/or 8 | # modify it under the terms of the GNU General Public License 9 | # as published by the Free Software Foundation; either version 2 10 | # of the License, or (at your option) any later version. 11 | # 12 | # This program is distributed in the hope that it will be useful, 13 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 14 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 15 | # GNU General Public License for more details. 16 | # 17 | # You should have received a copy of the GNU General Public License 18 | # along with this program; if not, write to the Free Software 19 | # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 20 | 21 | import json 22 | import optparse 23 | import os, re, sys, time, traceback 24 | import jsonrpc, pexpect 25 | from progressbar import ProgressBar, Fraction 26 | import logging 27 | 28 | 29 | VERBOSE = True 30 | 31 | STATE_START, STATE_TEXT, STATE_WORDS, STATE_TREE, STATE_DEPENDENCY, STATE_COREFERENCE = 0, 1, 2, 3, 4, 5 32 | WORD_PATTERN = re.compile('\[([^\]]+)\]') 33 | CR_PATTERN = re.compile(r"\((\d*),(\d)*,\[(\d*),(\d*)\]\) -> \((\d*),(\d)*,\[(\d*),(\d*)\]\), that is: \"(.*)\" -> \"(.*)\"") 34 | 35 | # initialize logger 36 | logging.basicConfig(level=logging.INFO) 37 | logger = logging.getLogger(__name__) 38 | 39 | 40 | def remove_id(word): 41 | """Removes the numeric suffix from the parsed recognized words: e.g. 'word-2' > 'word' """ 42 | return word.count("-") == 0 and word or word[0:word.rindex("-")] 43 | 44 | 45 | def parse_bracketed(s): 46 | '''Parse word features [abc=... def = ...] 
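    For example (a sketch of the input after WORD_PATTERN has stripped the
    enclosing brackets): "Text=Hello PartOfSpeech=UH" -> ('Hello', {'PartOfSpeech': 'UH'})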
47 | Also manages to parse out features that have XML within them 48 | ''' 49 | word = None 50 | attrs = {} 51 | temp = {} 52 | # Substitute XML tags, to replace them later 53 | for i, tag in enumerate(re.findall(r"(<[^<>]+>.*<\/[^<>]+>)", s)): 54 | temp["^^^%d^^^" % i] = tag 55 | s = s.replace(tag, "^^^%d^^^" % i) 56 | # Load key-value pairs, substituting as necessary 57 | for attr, val in re.findall(r"([^=\s]*)=([^=\s]*)", s): 58 | if val in temp: 59 | val = temp[val] 60 | if attr == 'Text': 61 | word = val 62 | else: 63 | attrs[attr] = val 64 | return (word, attrs) 65 | 66 | 67 | def parse_parser_results(text): 68 | """ This is the nasty bit of code to interact with the command-line 69 | interface of the CoreNLP tools. Takes a string of the parser results 70 | and then returns a Python list of dictionaries, one for each parsed 71 | sentence. 72 | """ 73 | results = {"sentences": []} 74 | state = STATE_START 75 | for line in text.encode('utf-8').split("\n"): 76 | line = line.strip() 77 | 78 | if line.startswith("Sentence #"): 79 | sentence = {'words':[], 'parsetree':[], 'dependencies':[]} 80 | results["sentences"].append(sentence) 81 | state = STATE_TEXT 82 | 83 | elif state == STATE_TEXT: 84 | sentence['text'] = line 85 | state = STATE_WORDS 86 | 87 | elif state == STATE_WORDS: 88 | if not line.startswith("[Text="): 89 | raise Exception('Parse error. Could not find "[Text=" in: %s' % line) 90 | for s in WORD_PATTERN.findall(line): 91 | sentence['words'].append(parse_bracketed(s)) 92 | state = STATE_TREE 93 | 94 | elif state == STATE_TREE: 95 | if len(line) == 0: 96 | state = STATE_DEPENDENCY 97 | sentence['parsetree'] = " ".join(sentence['parsetree']) 98 | else: 99 | sentence['parsetree'].append(line) 100 | 101 | elif state == STATE_DEPENDENCY: 102 | if len(line) == 0: 103 | state = STATE_COREFERENCE 104 | else: 105 | split_entry = re.split("\(|, ", line[:-1]) 106 | if len(split_entry) == 3: 107 | rel, left, right = map(lambda x: remove_id(x), split_entry) 108 | sentence['dependencies'].append(tuple([rel,left,right])) 109 | 110 | elif state == STATE_COREFERENCE: 111 | if "Coreference set" in line: 112 | if 'coref' not in results: 113 | results['coref'] = [] 114 | coref_set = [] 115 | results['coref'].append(coref_set) 116 | else: 117 | for src_i, src_pos, src_l, src_r, sink_i, sink_pos, sink_l, sink_r, src_word, sink_word in CR_PATTERN.findall(line): 118 | src_i, src_pos, src_l, src_r = int(src_i)-1, int(src_pos)-1, int(src_l)-1, int(src_r)-1 119 | sink_i, sink_pos, sink_l, sink_r = int(sink_i)-1, int(sink_pos)-1, int(sink_l)-1, int(sink_r)-1 120 | coref_set.append(((src_word, src_i, src_pos, src_l, src_r), (sink_word, sink_i, sink_pos, sink_l, sink_r))) 121 | 122 | return results 123 | 124 | 125 | class StanfordCoreNLP(object): 126 | """ 127 | Command-line interaction with Stanford's CoreNLP java utilities. 128 | Can be run as a JSON-RPC server or imported as a module. 129 | """ 130 | def __init__(self, corenlp_path=None): 131 | """ 132 | Checks the location of the jar files. 133 | Spawns the server as a process. 
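        Note: the jar names listed below assume a CoreNLP 3.6.0 distribution;
        if you unpacked another release (e.g. the 3.4.1 build the README was
        tested against), adjust the `jars` list and `corenlp_path` accordingly.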
134 | """ 135 | jars = ["stanford-corenlp-3.6.0.jar", 136 | "stanford-corenlp-3.6.0-models.jar", 137 | "joda-time.jar", 138 | "xom.jar", 139 | "jollyday.jar", 140 | "slf4j-api.jar"] 141 | 142 | # if CoreNLP libraries are in a different directory, 143 | # change the corenlp_path variable to point to them 144 | if not corenlp_path: 145 | #corenlp_path = "./stanford-corenlp-full-2014-08-27/" 146 | corenlp_path = "stanford-corenlp-full/" 147 | 148 | java_path = "java" 149 | classname = "edu.stanford.nlp.pipeline.StanfordCoreNLP" 150 | # include the properties file, so you can change defaults 151 | # but any changes in output format will break parse_parser_results() 152 | props = "-props default.properties" 153 | 154 | # add and check classpaths 155 | jars = [corenlp_path + jar for jar in jars] 156 | for jar in jars: 157 | if not os.path.exists(jar): 158 | logger.error("Error! Cannot locate %s" % jar) 159 | sys.exit(1) 160 | 161 | # spawn the server 162 | start_corenlp = "%s -Xmx1800m -cp %s %s %s" % (java_path, ':'.join(jars), classname, props) 163 | if VERBOSE: 164 | logger.debug(start_corenlp) 165 | self.corenlp = pexpect.spawn(start_corenlp) 166 | 167 | # show progress bar while loading the models 168 | widgets = ['Loading Models: ', Fraction()] 169 | pbar = ProgressBar(widgets=widgets, maxval=5, force_update=True).start() 170 | self.corenlp.expect("done.", timeout=20) # Load pos tagger model (~5sec) 171 | pbar.update(1) 172 | self.corenlp.expect("done.", timeout=200) # Load NER-all classifier (~33sec) 173 | pbar.update(2) 174 | self.corenlp.expect("done.", timeout=600) # Load NER-muc classifier (~60sec) 175 | pbar.update(3) 176 | self.corenlp.expect("done.", timeout=600) # Load CoNLL classifier (~50sec) 177 | pbar.update(4) 178 | self.corenlp.expect("done.", timeout=200) # Loading PCFG (~3sec) 179 | pbar.update(5) 180 | self.corenlp.expect("Entering interactive shell.") 181 | pbar.finish() 182 | 183 | def _parse(self, text): 184 | """ 185 | This is the core interaction with the parser. 186 | 187 | It returns a Python data-structure, while the parse() 188 | function returns a JSON object 189 | """ 190 | # clean up anything leftover 191 | while True: 192 | try: 193 | self.corenlp.read_nonblocking (4000, 0.3) 194 | except pexpect.TIMEOUT: 195 | break 196 | 197 | self.corenlp.sendline(text) 198 | 199 | # How much time should we give the parser to parse it? 200 | # the idea here is that you increase the timeout as a 201 | # function of the text's length. 
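        # e.g. a 100-character input gets min(40, 3 + 100/20.0) = 8 seconds,
        # while anything longer than 740 characters hits the 40-second cap.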
202 | # anything longer than 5 seconds requires that you also 203 | # increase timeout=5 in jsonrpc.py 204 | max_expected_time = min(40, 3 + len(text) / 20.0) 205 | end_time = time.time() + max_expected_time 206 | 207 | incoming = "" 208 | while True: 209 | # Time left, read more data 210 | try: 211 | incoming += self.corenlp.read_nonblocking(2000, 1) 212 | if "\nNLP>" in incoming: 213 | break 214 | time.sleep(0.0001) 215 | except pexpect.TIMEOUT: 216 | if end_time - time.time() < 0: 217 | logger.error("Error: Timeout with input '%s'" % (incoming)) 218 | return {'error': "timed out after %f seconds" % max_expected_time} 219 | else: 220 | continue 221 | except pexpect.EOF: 222 | break 223 | 224 | if VERBOSE: 225 | logger.debug("%s\n%s" % ('='*40, incoming)) 226 | try: 227 | results = parse_parser_results(incoming) 228 | except Exception, e: 229 | if VERBOSE: 230 | logger.debug(traceback.format_exc()) 231 | raise e 232 | 233 | return results 234 | 235 | def parse(self, text): 236 | """ 237 | This function takes a text string, sends it to the Stanford parser, 238 | reads in the result, parses the results and returns a list 239 | with one dictionary entry for each parsed sentence, in JSON format. 240 | """ 241 | response = self._parse(text) 242 | logger.debug("Response: '%s'" % (response)) 243 | return json.dumps(response) 244 | 245 | 246 | if __name__ == '__main__': 247 | """ 248 | The code below starts an JSONRPC server 249 | """ 250 | parser = optparse.OptionParser(usage="%prog [OPTIONS]") 251 | parser.add_option('-p', '--port', default='8080', 252 | help='Port to serve on (default: 8080)') 253 | parser.add_option('-H', '--host', default='127.0.0.1', 254 | help='Host to serve on (default: 127.0.0.1. Use 0.0.0.0 to make public)') 255 | options, args = parser.parse_args() 256 | server = jsonrpc.Server(jsonrpc.JsonRpc20(), 257 | jsonrpc.TransportTcpIp(addr=(options.host, int(options.port)))) 258 | 259 | nlp = StanfordCoreNLP() 260 | server.register_function(nlp.parse) 261 | 262 | logger.info('Serving on http://%s:%s' % (options.host, options.port)) 263 | server.serve() 264 | -------------------------------------------------------------------------------- /stanford_corenlp_python/default.properties: -------------------------------------------------------------------------------- 1 | annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref 2 | 3 | # A true-casing annotator is also available (see below) 4 | #annotators = tokenize, ssplit, pos, lemma, truecase 5 | 6 | # A simple regex NER annotator is also available 7 | # annotators = tokenize, ssplit, regexner 8 | 9 | #Use these as EOS punctuation and discard them from the actual sentence content 10 | #These are HTML tags that get expanded internally to correct syntax, e.g., from "p" to "
<p>", "</p>" etc. 11 | #Will have no effect if the "cleanxml" annotator is used 12 | #ssplit.htmlBoundariesToDiscard = p,text 13 | 14 | # 15 | # None of these paths are necessary anymore: we load all models from the JAR file 16 | # 17 | 18 | #pos.model = /u/nlp/data/pos-tagger/wsj3t0-18-left3words/left3words-distsim-wsj-0-18.tagger 19 | ## slightly better model but much slower: 20 | ##pos.model = /u/nlp/data/pos-tagger/wsj3t0-18-bidirectional/bidirectional-distsim-wsj-0-18.tagger 21 | 22 | #ner.model.3class = /u/nlp/data/ner/goodClassifiers/all.3class.distsim.crf.ser.gz 23 | #ner.model.7class = /u/nlp/data/ner/goodClassifiers/muc.distsim.crf.ser.gz 24 | #ner.model.MISCclass = /u/nlp/data/ner/goodClassifiers/conll.distsim.crf.ser.gz 25 | 26 | #regexner.mapping = /u/nlp/data/TAC-KBP2010/sentence_extraction/type_map_clean 27 | #regexner.ignorecase = false 28 | 29 | #nfl.gazetteer = /scr/nlp/data/machine-reading/Machine_Reading_P1_Reading_Task_V2.0/data/SportsDomain/NFLScoring_UseCase/NFLgazetteer.txt 30 | #nfl.relation.model = /scr/nlp/data/ldc/LDC2009E112/Machine_Reading_P1_NFL_Scoring_Training_Data_V1.2/models/nfl_relation_model.ser 31 | #nfl.entity.model = /scr/nlp/data/ldc/LDC2009E112/Machine_Reading_P1_NFL_Scoring_Training_Data_V1.2/models/nfl_entity_model.ser 32 | #printable.relation.beam = 20 33 | 34 | #parser.model = /u/nlp/data/lexparser/englishPCFG.ser.gz 35 | 36 | #srl.verb.args=/u/kristina/srl/verbs.core_args 37 | #srl.model.cls=/u/nlp/data/srl/trainedModels/englishPCFG/cls/train.ann 38 | #srl.model.id=/u/nlp/data/srl/trainedModels/englishPCFG/id/train.ann 39 | 40 | #coref.model=/u/nlp/rte/resources/anno/coref/corefClassifierAll.March2009.ser.gz 41 | #coref.name.dir=/u/nlp/data/coref/ 42 | #wordnet.dir=/u/nlp/data/wordnet/wordnet-3.0-prolog 43 | 44 | #dcoref.demonym = /scr/heeyoung/demonyms.txt 45 | #dcoref.animate = /scr/nlp/data/DekangLin-Animacy-Gender/Animacy/animate.unigrams.txt 46 | #dcoref.inanimate = /scr/nlp/data/DekangLin-Animacy-Gender/Animacy/inanimate.unigrams.txt 47 | #dcoref.male = /scr/nlp/data/Bergsma-Gender/male.unigrams.txt 48 | #dcoref.neutral = /scr/nlp/data/Bergsma-Gender/neutral.unigrams.txt 49 | #dcoref.female = /scr/nlp/data/Bergsma-Gender/female.unigrams.txt 50 | #dcoref.plural = /scr/nlp/data/Bergsma-Gender/plural.unigrams.txt 51 | #dcoref.singular = /scr/nlp/data/Bergsma-Gender/singular.unigrams.txt 52 | 53 | 54 | # This is the regular expression that describes which xml tags to keep 55 | # the text from. In order to turn the xml removal on or off, add cleanxml 56 | # to the list of annotators above after "tokenize". 57 | #clean.xmltags = .* 58 | # A set of tags which will force the end of a sentence. HTML example: 59 | # you would not want to end on <link>, but you would want to end on
<p>
. 60 | # Once again, a regular expression. 61 | # (Blank means there are no sentence enders.) 62 | #clean.sentenceendingtags = 63 | # Whether or not to allow malformed xml 64 | # StanfordCoreNLP.properties 65 | #wordnet.dir=models/wordnet-3.0-prolog 66 | -------------------------------------------------------------------------------- /stanford_corenlp_python/progressbar.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | # -*- coding: iso-8859-1 -*- 3 | # 4 | # progressbar - Text progressbar library for python. 5 | # Copyright (c) 2005 Nilton Volpato 6 | # 7 | # This library is free software; you can redistribute it and/or 8 | # modify it under the terms of the GNU Lesser General Public 9 | # License as published by the Free Software Foundation; either 10 | # version 2.1 of the License, or (at your option) any later version. 11 | # 12 | # This library is distributed in the hope that it will be useful, 13 | # but WITHOUT ANY WARRANTY; without even the implied warranty of 14 | # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU 15 | # Lesser General Public License for more details. 16 | # 17 | # You should have received a copy of the GNU Lesser General Public 18 | # License along with this library; if not, write to the Free Software 19 | # Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA 20 | 21 | 22 | """Text progressbar library for python. 23 | 24 | This library provides a text mode progressbar. This is typically used 25 | to display the progress of a long running operation, providing a 26 | visual clue that processing is underway. 27 | 28 | The ProgressBar class manages the progress, and the format of the line 29 | is given by a number of widgets. A widget is an object that may 30 | display diferently depending on the state of the progress. There are 31 | three types of widget: 32 | - a string, which always shows itself; 33 | - a ProgressBarWidget, which may return a diferent value every time 34 | it's update method is called; and 35 | - a ProgressBarWidgetHFill, which is like ProgressBarWidget, except it 36 | expands to fill the remaining width of the line. 37 | 38 | The progressbar module is very easy to use, yet very powerful. And 39 | automatically supports features like auto-resizing when available. 40 | """ 41 | 42 | __author__ = "Nilton Volpato" 43 | __author_email__ = "first-name dot last-name @ gmail.com" 44 | __date__ = "2006-05-07" 45 | __version__ = "2.2" 46 | 47 | # Changelog 48 | # 49 | # 2006-05-07: v2.2 fixed bug in windows 50 | # 2005-12-04: v2.1 autodetect terminal width, added start method 51 | # 2005-12-04: v2.0 everything is now a widget (wow!) 52 | # 2005-12-03: v1.0 rewrite using widgets 53 | # 2005-06-02: v0.5 rewrite 54 | # 2004-??-??: v0.1 first version 55 | 56 | import sys 57 | import time 58 | from array import array 59 | try: 60 | from fcntl import ioctl 61 | import termios 62 | except ImportError: 63 | pass 64 | import signal 65 | 66 | 67 | class ProgressBarWidget(object): 68 | """This is an element of ProgressBar formatting. 69 | 70 | The ProgressBar object will call it's update value when an update 71 | is needed. It's size may change between call, but the results will 72 | not be good if the size changes drastically and repeatedly. 73 | """ 74 | def update(self, pbar): 75 | """Returns the string representing the widget. 
76 | 77 | The parameter pbar is a reference to the calling ProgressBar, 78 | where one can access attributes of the class for knowing how 79 | the update must be made. 80 | 81 | At least this function must be overriden.""" 82 | pass 83 | 84 | 85 | class ProgressBarWidgetHFill(object): 86 | """This is a variable width element of ProgressBar formatting. 87 | 88 | The ProgressBar object will call it's update value, informing the 89 | width this object must the made. This is like TeX \\hfill, it will 90 | expand to fill the line. You can use more than one in the same 91 | line, and they will all have the same width, and together will 92 | fill the line. 93 | """ 94 | def update(self, pbar, width): 95 | """Returns the string representing the widget. 96 | 97 | The parameter pbar is a reference to the calling ProgressBar, 98 | where one can access attributes of the class for knowing how 99 | the update must be made. The parameter width is the total 100 | horizontal width the widget must have. 101 | 102 | At least this function must be overriden.""" 103 | pass 104 | 105 | 106 | class ETA(ProgressBarWidget): 107 | "Widget for the Estimated Time of Arrival" 108 | def format_time(self, seconds): 109 | return time.strftime('%H:%M:%S', time.gmtime(seconds)) 110 | 111 | def update(self, pbar): 112 | if pbar.currval == 0: 113 | return 'ETA: --:--:--' 114 | elif pbar.finished: 115 | return 'Time: %s' % self.format_time(pbar.seconds_elapsed) 116 | else: 117 | elapsed = pbar.seconds_elapsed 118 | eta = elapsed * pbar.maxval / pbar.currval - elapsed 119 | return 'ETA: %s' % self.format_time(eta) 120 | 121 | 122 | class FileTransferSpeed(ProgressBarWidget): 123 | "Widget for showing the transfer speed (useful for file transfers)." 124 | def __init__(self): 125 | self.fmt = '%6.2f %s' 126 | self.units = ['B', 'K', 'M', 'G', 'T', 'P'] 127 | 128 | def update(self, pbar): 129 | if pbar.seconds_elapsed < 2e-6: # == 0: 130 | bps = 0.0 131 | else: 132 | bps = float(pbar.currval) / pbar.seconds_elapsed 133 | spd = bps 134 | for u in self.units: 135 | if spd < 1000: 136 | break 137 | spd /= 1000 138 | return self.fmt % (spd, u + '/s') 139 | 140 | 141 | class RotatingMarker(ProgressBarWidget): 142 | "A rotating marker for filling the bar of progress." 143 | def __init__(self, markers='|/-\\'): 144 | self.markers = markers 145 | self.curmark = -1 146 | 147 | def update(self, pbar): 148 | if pbar.finished: 149 | return self.markers[0] 150 | self.curmark = (self.curmark + 1) % len(self.markers) 151 | return self.markers[self.curmark] 152 | 153 | 154 | class Percentage(ProgressBarWidget): 155 | "Just the percentage done." 156 | def update(self, pbar): 157 | return '%3d%%' % pbar.percentage() 158 | 159 | 160 | class Fraction(ProgressBarWidget): 161 | "Just the fraction done." 162 | def update(self, pbar): 163 | return "%d/%d" % (pbar.currval, pbar.maxval) 164 | 165 | 166 | class Bar(ProgressBarWidgetHFill): 167 | "The bar of progress. It will strech to fill the line." 
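    # As a ProgressBarWidgetHFill subclass, Bar.update() receives the leftover
    # line width (computed in ProgressBar._format_widgets) and pads the marker
    # string out to exactly that width.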
168 | def __init__(self, marker='#', left='|', right='|'): 169 | self.marker = marker 170 | self.left = left 171 | self.right = right 172 | 173 | def _format_marker(self, pbar): 174 | if isinstance(self.marker, (str, unicode)): 175 | return self.marker 176 | else: 177 | return self.marker.update(pbar) 178 | 179 | def update(self, pbar, width): 180 | percent = pbar.percentage() 181 | cwidth = width - len(self.left) - len(self.right) 182 | marked_width = int(percent * cwidth / 100) 183 | m = self._format_marker(pbar) 184 | bar = (self.left + (m * marked_width).ljust(cwidth) + self.right) 185 | return bar 186 | 187 | 188 | class ReverseBar(Bar): 189 | "The reverse bar of progress, or bar of regress. :)" 190 | def update(self, pbar, width): 191 | percent = pbar.percentage() 192 | cwidth = width - len(self.left) - len(self.right) 193 | marked_width = int(percent * cwidth / 100) 194 | m = self._format_marker(pbar) 195 | bar = (self.left + (m * marked_width).rjust(cwidth) + self.right) 196 | return bar 197 | 198 | default_widgets = [Percentage(), ' ', Bar()] 199 | 200 | 201 | class ProgressBar(object): 202 | """This is the ProgressBar class, it updates and prints the bar. 203 | 204 | The term_width parameter may be an integer. Or None, in which case 205 | it will try to guess it, if it fails it will default to 80 columns. 206 | 207 | The simple use is like this: 208 | >>> pbar = ProgressBar().start() 209 | >>> for i in xrange(100): 210 | ... # do something 211 | ... pbar.update(i+1) 212 | ... 213 | >>> pbar.finish() 214 | 215 | But anything you want to do is possible (well, almost anything). 216 | You can supply different widgets of any type in any order. And you 217 | can even write your own widgets! There are many widgets already 218 | shipped and you should experiment with them. 219 | 220 | When implementing a widget update method you may access any 221 | attribute or function of the ProgressBar object calling the 222 | widget's update method. The most important attributes you would 223 | like to access are: 224 | - currval: current value of the progress, 0 <= currval <= maxval 225 | - maxval: maximum (and final) value of the progress 226 | - finished: True if the bar is have finished (reached 100%), False o/w 227 | - start_time: first time update() method of ProgressBar was called 228 | - seconds_elapsed: seconds elapsed since start_time 229 | - percentage(): percentage of the progress (this is a method) 230 | """ 231 | def __init__(self, maxval=100, widgets=default_widgets, term_width=None, 232 | fd=sys.stderr, force_update=False): 233 | assert maxval > 0 234 | self.maxval = maxval 235 | self.widgets = widgets 236 | self.fd = fd 237 | self.signal_set = False 238 | if term_width is None: 239 | try: 240 | self.handle_resize(None, None) 241 | signal.signal(signal.SIGWINCH, self.handle_resize) 242 | self.signal_set = True 243 | except: 244 | self.term_width = 79 245 | else: 246 | self.term_width = term_width 247 | 248 | self.currval = 0 249 | self.finished = False 250 | self.prev_percentage = -1 251 | self.start_time = None 252 | self.seconds_elapsed = 0 253 | self.force_update = force_update 254 | 255 | def handle_resize(self, signum, frame): 256 | h, w = array('h', ioctl(self.fd, termios.TIOCGWINSZ, '\0' * 8))[:2] 257 | self.term_width = w 258 | 259 | def percentage(self): 260 | "Returns the percentage of the progress." 
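        # 100.0 keeps this a float division on Python 2 (e.g. 1 of 3 -> 33.33,
        # not a truncated 33)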
261 | return self.currval * 100.0 / self.maxval 262 | 263 | def _format_widgets(self): 264 | r = [] 265 | hfill_inds = [] 266 | num_hfill = 0 267 | currwidth = 0 268 | for i, w in enumerate(self.widgets): 269 | if isinstance(w, ProgressBarWidgetHFill): 270 | r.append(w) 271 | hfill_inds.append(i) 272 | num_hfill += 1 273 | elif isinstance(w, (str, unicode)): 274 | r.append(w) 275 | currwidth += len(w) 276 | else: 277 | weval = w.update(self) 278 | currwidth += len(weval) 279 | r.append(weval) 280 | for iw in hfill_inds: 281 | r[iw] = r[iw].update(self, 282 | (self.term_width - currwidth) / num_hfill) 283 | return r 284 | 285 | def _format_line(self): 286 | return ''.join(self._format_widgets()).ljust(self.term_width) 287 | 288 | def _need_update(self): 289 | if self.force_update: 290 | return True 291 | return int(self.percentage()) != int(self.prev_percentage) 292 | 293 | def reset(self): 294 | if not self.finished and self.start_time: 295 | self.finish() 296 | self.finished = False 297 | self.currval = 0 298 | self.start_time = None 299 | self.seconds_elapsed = None 300 | self.prev_percentage = None 301 | return self 302 | 303 | def update(self, value): 304 | "Updates the progress bar to a new value." 305 | assert 0 <= value <= self.maxval 306 | self.currval = value 307 | if not self._need_update() or self.finished: 308 | return 309 | if not self.start_time: 310 | self.start_time = time.time() 311 | self.seconds_elapsed = time.time() - self.start_time 312 | self.prev_percentage = self.percentage() 313 | if value != self.maxval: 314 | self.fd.write(self._format_line() + '\r') 315 | else: 316 | self.finished = True 317 | self.fd.write(self._format_line() + '\n') 318 | 319 | def start(self): 320 | """Start measuring time, and prints the bar at 0%. 321 | 322 | It returns self so you can use it like this: 323 | >>> pbar = ProgressBar().start() 324 | >>> for i in xrange(100): 325 | ... # do something 326 | ... pbar.update(i+1) 327 | ... 
328 | >>> pbar.finish() 329 | """ 330 | self.update(0) 331 | return self 332 | 333 | def finish(self): 334 | """Used to tell the progress is finished.""" 335 | self.update(self.maxval) 336 | if self.signal_set: 337 | signal.signal(signal.SIGWINCH, signal.SIG_DFL) 338 | 339 | 340 | def example1(): 341 | widgets = ['Test: ', Percentage(), ' ', Bar(marker=RotatingMarker()), 342 | ' ', ETA(), ' ', FileTransferSpeed()] 343 | pbar = ProgressBar(widgets=widgets, maxval=10000000).start() 344 | for i in range(1000000): 345 | # do something 346 | pbar.update(10 * i + 1) 347 | pbar.finish() 348 | return pbar 349 | 350 | 351 | def example2(): 352 | class CrazyFileTransferSpeed(FileTransferSpeed): 353 | "It's bigger between 45 and 80 percent" 354 | def update(self, pbar): 355 | if 45 < pbar.percentage() < 80: 356 | return 'Bigger Now ' + FileTransferSpeed.update(self, pbar) 357 | else: 358 | return FileTransferSpeed.update(self, pbar) 359 | 360 | widgets = [CrazyFileTransferSpeed(), ' <<<', 361 | Bar(), '>>> ', Percentage(), ' ', ETA()] 362 | pbar = ProgressBar(widgets=widgets, maxval=10000000) 363 | # maybe do something 364 | pbar.start() 365 | for i in range(2000000): 366 | # do something 367 | pbar.update(5 * i + 1) 368 | pbar.finish() 369 | return pbar 370 | 371 | 372 | def example3(): 373 | widgets = [Bar('>'), ' ', ETA(), ' ', ReverseBar('<')] 374 | pbar = ProgressBar(widgets=widgets, maxval=10000000).start() 375 | for i in range(1000000): 376 | # do something 377 | pbar.update(10 * i + 1) 378 | pbar.finish() 379 | return pbar 380 | 381 | 382 | def example4(): 383 | widgets = ['Test: ', Percentage(), ' ', 384 | Bar(marker='0', left='[', right=']'), 385 | ' ', ETA(), ' ', FileTransferSpeed()] 386 | pbar = ProgressBar(widgets=widgets, maxval=500) 387 | pbar.start() 388 | for i in range(100, 500 + 1, 50): 389 | time.sleep(0.2) 390 | pbar.update(i) 391 | pbar.finish() 392 | return pbar 393 | 394 | 395 | def example5(): 396 | widgets = ['Test: ', Fraction(), ' ', Bar(marker=RotatingMarker()), 397 | ' ', ETA(), ' ', FileTransferSpeed()] 398 | pbar = ProgressBar(widgets=widgets, maxval=10, force_update=True).start() 399 | for i in range(1, 11): 400 | # do something 401 | time.sleep(0.5) 402 | pbar.update(i) 403 | pbar.finish() 404 | return pbar 405 | 406 | 407 | def main(): 408 | example1() 409 | print 410 | example2() 411 | print 412 | example3() 413 | print 414 | example4() 415 | print 416 | example5() 417 | print 418 | 419 | if __name__ == '__main__': 420 | main() 421 | --------------------------------------------------------------------------------