├── Extract-Classify-ACOS
│   ├── Readme.md
│   ├── __pycache__
│   │   ├── dataset_utils.cpython-38.pyc
│   │   ├── dataset_utils.cpython-39.pyc
│   │   ├── eval_metrics.cpython-38.pyc
│   │   ├── file_utils.cpython-38.pyc
│   │   ├── manager.cpython-38.pyc
│   │   ├── manager.cpython-39.pyc
│   │   ├── modeling.cpython-38.pyc
│   │   └── run_classifier_dataset_utils.cpython-38.pyc
│   ├── bert_utils
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   │   ├── __init__.cpython-37.pyc
│   │   │   ├── __init__.cpython-38.pyc
│   │   │   ├── file_utils.cpython-37.pyc
│   │   │   ├── file_utils.cpython-38.pyc
│   │   │   ├── optimization.cpython-37.pyc
│   │   │   ├── optimization.cpython-38.pyc
│   │   │   ├── tokenization.cpython-37.pyc
│   │   │   └── tokenization.cpython-38.pyc
│   │   ├── file_utils.py
│   │   ├── optimization.py
│   │   └── tokenization.py
│   ├── dataset_utils.py
│   ├── eval_metrics.py
│   ├── file_utils.py
│   ├── manager.py
│   ├── modeling.py
│   ├── run.sh
│   ├── run_classifier_dataset_utils.py
│   ├── run_step1.py
│   ├── run_step2.py
│   └── tokenized_data
│       ├── get_1st_pairs.py
│       ├── laptop_dev_pair.tsv
│       ├── laptop_dev_quad_bert.tsv
│       ├── laptop_test_pair.tsv
│       ├── laptop_test_pair_1st.tsv
│       ├── laptop_test_quad_bert.tsv
│       ├── laptop_train_pair.tsv
│       ├── laptop_train_quad_bert.tsv
│       ├── rest16_dev_pair.tsv
│       ├── rest16_dev_quad_bert.tsv
│       ├── rest16_test_pair.tsv
│       ├── rest16_test_pair_1st.tsv
│       ├── rest16_test_quad_bert.tsv
│       ├── rest16_train_pair.tsv
│       └── rest16_train_quad_bert.tsv
├── README.md
├── data
│   ├── Laptop-ACOS
│   │   ├── laptop_quad_dev.tsv
│   │   ├── laptop_quad_test.tsv
│   │   └── laptop_quad_train.tsv
│   ├── Readme.md
│   └── Restaurant-ACOS
│       ├── rest16_quad_dev.tsv
│       ├── rest16_quad_test.tsv
│       └── rest16_quad_train.tsv
└── img
    ├── figure1.PNG
    ├── main_results.PNG
    ├── method.PNG
    ├── method.jpg
    ├── separate_results.PNG
    └── stat.PNG

/Extract-Classify-ACOS/Readme.md:
--------------------------------------------------------------------------------
 1 | 
 2 | ## Requirements
 3 | 
 4 | * Python 3.7
 5 | * PyTorch 1.8
 6 | * pytorch-crf 0.7.2
 7 | 
 8 | ## Running
 9 | 
10 | Modify the corresponding BERT_BASE_DIR, BASE_DIR, DATA_DIR and output_dir before running the script.
11 | 
12 | BERT_BASE_DIR: the directory containing the config, pytorch_model and vocab files of BERT (the PyTorch BERT model should be placed here).
13 | 
14 | BASE_DIR: the root directory of the current project.
15 | 
16 | DATA_DIR: the data directory DIR; the data files are stored under 'DIR/tokenized_data/' as DOMAIN_YEAR_train.tsv (e.g., rest16_train_quad_bert.tsv).
17 | 
18 | output_dir: the output directory that will hold the fine-tuned language model.
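19 | 
20 | For example, the four variables might be set as follows (the paths are illustrative placeholders, not required locations):
21 | 
22 | ```
23 | BERT_BASE_DIR=/path/to/bert-base-uncased     # config.json, pytorch_model.bin, vocab.txt
24 | BASE_DIR=/path/to/ACOS/Extract-Classify-ACOS
25 | DATA_DIR=${BASE_DIR}                         # expects ${DATA_DIR}/tokenized_data/*.tsv
26 | output_dir=${BASE_DIR}/output                # fine-tuned model is written here
27 | ```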
28 | 
29 | Then run:
30 | 
31 | ```
32 | sh run.sh
33 | ```
34 | 
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/bert_utils/__init__.py:
--------------------------------------------------------------------------------
1 | #
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/bert_utils/file_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Utilities for working with the local dataset cache.
3 | This file is adapted from the AllenNLP library at https://github.com/allenai/allennlp
4 | Copyright by the AllenNLP authors.
5 | """ 6 | from __future__ import (absolute_import, division, print_function, unicode_literals) 7 | 8 | import sys 9 | import json 10 | import logging 11 | import os 12 | import shutil 13 | import tempfile 14 | import fnmatch 15 | from functools import wraps 16 | from hashlib import sha256 17 | import sys 18 | from io import open 19 | 20 | import boto3 21 | import requests 22 | from botocore.exceptions import ClientError 23 | from tqdm import tqdm 24 | 25 | try: 26 | from torch.hub import _get_torch_home 27 | torch_cache_home = _get_torch_home() 28 | except ImportError: 29 | torch_cache_home = os.path.expanduser( 30 | os.getenv('TORCH_HOME', os.path.join( 31 | os.getenv('XDG_CACHE_HOME', '~/.cache'), 'torch'))) 32 | default_cache_path = os.path.join(torch_cache_home, 'pytorch_pretrained_bert') 33 | 34 | try: 35 | from urllib.parse import urlparse 36 | except ImportError: 37 | from urlparse import urlparse 38 | 39 | try: 40 | from pathlib import Path 41 | PYTORCH_PRETRAINED_BERT_CACHE = Path( 42 | os.getenv('PYTORCH_PRETRAINED_BERT_CACHE', default_cache_path)) 43 | except (AttributeError, ImportError): 44 | PYTORCH_PRETRAINED_BERT_CACHE = os.getenv('PYTORCH_PRETRAINED_BERT_CACHE', 45 | default_cache_path) 46 | 47 | CONFIG_NAME = "config.json" 48 | WEIGHTS_NAME = "pytorch_model.bin" 49 | 50 | logger = logging.getLogger(__name__) # pylint: disable=invalid-name 51 | 52 | 53 | def url_to_filename(url, etag=None): 54 | """ 55 | Convert `url` into a hashed filename in a repeatable way. 56 | If `etag` is specified, append its hash to the url's, delimited 57 | by a period. 58 | """ 59 | url_bytes = url.encode('utf-8') 60 | url_hash = sha256(url_bytes) 61 | filename = url_hash.hexdigest() 62 | 63 | if etag: 64 | etag_bytes = etag.encode('utf-8') 65 | etag_hash = sha256(etag_bytes) 66 | filename += '.' + etag_hash.hexdigest() 67 | 68 | return filename 69 | 70 | 71 | def filename_to_url(filename, cache_dir=None): 72 | """ 73 | Return the url and etag (which may be ``None``) stored for `filename`. 74 | Raise ``EnvironmentError`` if `filename` or its stored metadata do not exist. 75 | """ 76 | if cache_dir is None: 77 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 78 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 79 | cache_dir = str(cache_dir) 80 | 81 | cache_path = os.path.join(cache_dir, filename) 82 | if not os.path.exists(cache_path): 83 | raise EnvironmentError("file {} not found".format(cache_path)) 84 | 85 | meta_path = cache_path + '.json' 86 | if not os.path.exists(meta_path): 87 | raise EnvironmentError("file {} not found".format(meta_path)) 88 | 89 | with open(meta_path, encoding="utf-8") as meta_file: 90 | metadata = json.load(meta_file) 91 | url = metadata['url'] 92 | etag = metadata['etag'] 93 | 94 | return url, etag 95 | 96 | 97 | def cached_path(url_or_filename, cache_dir=None): 98 | """ 99 | Given something that might be a URL (or might be a local path), 100 | determine which. If it's a URL, download the file and cache it, and 101 | return the path to the cached file. If it's already a local path, 102 | make sure the file exists and then return the path. 
103 | """ 104 | if cache_dir is None: 105 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 106 | if sys.version_info[0] == 3 and isinstance(url_or_filename, Path): 107 | url_or_filename = str(url_or_filename) 108 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 109 | cache_dir = str(cache_dir) 110 | 111 | parsed = urlparse(url_or_filename) 112 | 113 | if parsed.scheme in ('http', 'https', 's3'): 114 | # URL, so get it from the cache (downloading if necessary) 115 | return get_from_cache(url_or_filename, cache_dir) 116 | elif os.path.exists(url_or_filename): 117 | # File, and it exists. 118 | return url_or_filename 119 | elif parsed.scheme == '': 120 | # File, but it doesn't exist. 121 | raise EnvironmentError("file {} not found".format(url_or_filename)) 122 | else: 123 | # Something unknown 124 | raise ValueError("unable to parse {} as a URL or as a local path".format(url_or_filename)) 125 | 126 | 127 | def split_s3_path(url): 128 | """Split a full s3 path into the bucket name and path.""" 129 | parsed = urlparse(url) 130 | if not parsed.netloc or not parsed.path: 131 | raise ValueError("bad s3 path {}".format(url)) 132 | bucket_name = parsed.netloc 133 | s3_path = parsed.path 134 | # Remove '/' at beginning of path. 135 | if s3_path.startswith("/"): 136 | s3_path = s3_path[1:] 137 | return bucket_name, s3_path 138 | 139 | 140 | def s3_request(func): 141 | """ 142 | Wrapper function for s3 requests in order to create more helpful error 143 | messages. 144 | """ 145 | 146 | @wraps(func) 147 | def wrapper(url, *args, **kwargs): 148 | try: 149 | return func(url, *args, **kwargs) 150 | except ClientError as exc: 151 | if int(exc.response["Error"]["Code"]) == 404: 152 | raise EnvironmentError("file {} not found".format(url)) 153 | else: 154 | raise 155 | 156 | return wrapper 157 | 158 | 159 | @s3_request 160 | def s3_etag(url): 161 | """Check ETag on S3 object.""" 162 | s3_resource = boto3.resource("s3") 163 | bucket_name, s3_path = split_s3_path(url) 164 | s3_object = s3_resource.Object(bucket_name, s3_path) 165 | return s3_object.e_tag 166 | 167 | 168 | @s3_request 169 | def s3_get(url, temp_file): 170 | """Pull a file directly from S3.""" 171 | s3_resource = boto3.resource("s3") 172 | bucket_name, s3_path = split_s3_path(url) 173 | s3_resource.Bucket(bucket_name).download_fileobj(s3_path, temp_file) 174 | 175 | 176 | def http_get(url, temp_file): 177 | req = requests.get(url, stream=True) 178 | content_length = req.headers.get('Content-Length') 179 | total = int(content_length) if content_length is not None else None 180 | progress = tqdm(unit="B", total=total) 181 | for chunk in req.iter_content(chunk_size=1024): 182 | if chunk: # filter out keep-alive new chunks 183 | progress.update(len(chunk)) 184 | temp_file.write(chunk) 185 | progress.close() 186 | 187 | 188 | def get_from_cache(url, cache_dir=None): 189 | """ 190 | Given a URL, look for the corresponding dataset in the local cache. 191 | If it's not there, download it. Then return the path to the cached file. 192 | """ 193 | if cache_dir is None: 194 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 195 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 196 | cache_dir = str(cache_dir) 197 | 198 | if not os.path.exists(cache_dir): 199 | os.makedirs(cache_dir) 200 | 201 | # Get eTag to add to filename, if it exists. 
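    # (when available, the ETag is hashed into the cache filename by
    # url_to_filename above, so a file that changes on the server gets a
    # fresh cache entry instead of silently reusing a stale one)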
202 | if url.startswith("s3://"): 203 | etag = s3_etag(url) 204 | else: 205 | try: 206 | response = requests.head(url, allow_redirects=True) 207 | if response.status_code != 200: 208 | etag = None 209 | else: 210 | etag = response.headers.get("ETag") 211 | except EnvironmentError: 212 | etag = None 213 | 214 | if sys.version_info[0] == 2 and etag is not None: 215 | etag = etag.decode('utf-8') 216 | filename = url_to_filename(url, etag) 217 | 218 | # get cache path to put the file 219 | cache_path = os.path.join(cache_dir, filename) 220 | 221 | # If we don't have a connection (etag is None) and can't identify the file 222 | # try to get the last downloaded one 223 | if not os.path.exists(cache_path) and etag is None: 224 | matching_files = fnmatch.filter(os.listdir(cache_dir), filename + '.*') 225 | matching_files = list(filter(lambda s: not s.endswith('.json'), matching_files)) 226 | if matching_files: 227 | cache_path = os.path.join(cache_dir, matching_files[-1]) 228 | 229 | if not os.path.exists(cache_path): 230 | # Download to temporary file, then copy to cache dir once finished. 231 | # Otherwise you get corrupt cache entries if the download gets interrupted. 232 | with tempfile.NamedTemporaryFile() as temp_file: 233 | logger.info("%s not found in cache, downloading to %s", url, temp_file.name) 234 | 235 | # GET file object 236 | if url.startswith("s3://"): 237 | s3_get(url, temp_file) 238 | else: 239 | http_get(url, temp_file) 240 | 241 | # we are copying the file before closing it, so flush to avoid truncation 242 | temp_file.flush() 243 | # shutil.copyfileobj() starts at the current position, so go to the start 244 | temp_file.seek(0) 245 | 246 | logger.info("copying %s to cache at %s", temp_file.name, cache_path) 247 | with open(cache_path, 'wb') as cache_file: 248 | shutil.copyfileobj(temp_file, cache_file) 249 | 250 | logger.info("creating metadata file for %s", cache_path) 251 | meta = {'url': url, 'etag': etag} 252 | meta_path = cache_path + '.json' 253 | with open(meta_path, 'w') as meta_file: 254 | output_string = json.dumps(meta) 255 | if sys.version_info[0] == 2 and isinstance(output_string, str): 256 | output_string = unicode(output_string, 'utf-8') # The beauty of python 2 257 | meta_file.write(output_string) 258 | 259 | logger.info("removing temp file %s", temp_file.name) 260 | 261 | return cache_path 262 | 263 | 264 | def read_set_from_file(filename): 265 | ''' 266 | Extract a de-duped collection (set) of text from a file. 267 | Expected file format is one item per line. 268 | ''' 269 | collection = set() 270 | with open(filename, 'r', encoding='utf-8') as file_: 271 | for line in file_: 272 | collection.add(line.rstrip()) 273 | return collection 274 | 275 | 276 | def get_file_extension(path, dot=True, lower=True): 277 | ext = os.path.splitext(path)[1] 278 | ext = ext if dot else ext[1:] 279 | return ext.lower() if lower else ext 280 | -------------------------------------------------------------------------------- /Extract-Classify-ACOS/bert_utils/optimization.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 
6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | """PyTorch optimization for BERT model.""" 16 | 17 | import math 18 | import torch 19 | from torch.optim import Optimizer 20 | from torch.optim.optimizer import required 21 | from torch.nn.utils import clip_grad_norm_ 22 | import logging 23 | import abc 24 | import sys 25 | 26 | logger = logging.getLogger(__name__) 27 | 28 | 29 | if sys.version_info >= (3, 4): 30 | ABC = abc.ABC 31 | else: 32 | ABC = abc.ABCMeta('ABC', (), {}) 33 | 34 | 35 | class _LRSchedule(ABC): 36 | """ Parent of all LRSchedules here. """ 37 | warn_t_total = False # is set to True for schedules where progressing beyond t_total steps doesn't make sense 38 | def __init__(self, warmup=0.002, t_total=-1, **kw): 39 | """ 40 | :param warmup: what fraction of t_total steps will be used for linear warmup 41 | :param t_total: how many training steps (updates) are planned 42 | :param kw: 43 | """ 44 | super(_LRSchedule, self).__init__(**kw) 45 | if t_total < 0: 46 | logger.warning("t_total value of {} results in schedule not being applied".format(t_total)) 47 | if not 0.0 <= warmup < 1.0 and not warmup == -1: 48 | raise ValueError("Invalid warmup: {} - should be in [0.0, 1.0[ or -1".format(warmup)) 49 | warmup = max(warmup, 0.) 50 | self.warmup, self.t_total = float(warmup), float(t_total) 51 | self.warned_for_t_total_at_progress = -1 52 | 53 | def get_lr(self, step, nowarn=False): 54 | """ 55 | :param step: which of t_total steps we're on 56 | :param nowarn: set to True to suppress warning regarding training beyond specified 't_total' steps 57 | :return: learning rate multiplier for current update 58 | """ 59 | if self.t_total < 0: 60 | return 1. 61 | progress = float(step) / self.t_total 62 | ret = self.get_lr_(progress) 63 | # warning for exceeding t_total (only active with warmup_linear 64 | if not nowarn and self.warn_t_total and progress > 1. and progress > self.warned_for_t_total_at_progress: 65 | logger.warning( 66 | "Training beyond specified 't_total'. Learning rate multiplier set to {}. Please set 't_total' of {} correctly." 67 | .format(ret, self.__class__.__name__)) 68 | self.warned_for_t_total_at_progress = progress 69 | # end warning 70 | return ret 71 | 72 | @abc.abstractmethod 73 | def get_lr_(self, progress): 74 | """ 75 | :param progress: value between 0 and 1 (unless going beyond t_total steps) specifying training progress 76 | :return: learning rate multiplier for current update 77 | """ 78 | return 1. 79 | 80 | 81 | class ConstantLR(_LRSchedule): 82 | def get_lr_(self, progress): 83 | return 1. 84 | 85 | 86 | class WarmupCosineSchedule(_LRSchedule): 87 | """ 88 | Linearly increases learning rate from 0 to 1 over `warmup` fraction of training steps. 89 | Decreases learning rate from 1. to 0. over remaining `1 - warmup` steps following a cosine curve. 90 | If `cycles` (default=0.5) is different from default, learning rate follows cosine function after warmup. 
91 | """ 92 | warn_t_total = True 93 | def __init__(self, warmup=0.002, t_total=-1, cycles=.5, **kw): 94 | """ 95 | :param warmup: see LRSchedule 96 | :param t_total: see LRSchedule 97 | :param cycles: number of cycles. Default: 0.5, corresponding to cosine decay from 1. at progress==warmup and 0 at progress==1. 98 | :param kw: 99 | """ 100 | super(WarmupCosineSchedule, self).__init__(warmup=warmup, t_total=t_total, **kw) 101 | self.cycles = cycles 102 | 103 | def get_lr_(self, progress): 104 | if progress < self.warmup: 105 | return progress / self.warmup 106 | else: 107 | progress = (progress - self.warmup) / (1 - self.warmup) # progress after warmup 108 | return 0.5 * (1. + math.cos(math.pi * self.cycles * 2 * progress)) 109 | 110 | 111 | class WarmupCosineWithHardRestartsSchedule(WarmupCosineSchedule): 112 | """ 113 | Linearly increases learning rate from 0 to 1 over `warmup` fraction of training steps. 114 | If `cycles` (default=1.) is different from default, learning rate follows `cycles` times a cosine decaying 115 | learning rate (with hard restarts). 116 | """ 117 | def __init__(self, warmup=0.002, t_total=-1, cycles=1., **kw): 118 | super(WarmupCosineWithHardRestartsSchedule, self).__init__(warmup=warmup, t_total=t_total, cycles=cycles, **kw) 119 | assert(cycles >= 1.) 120 | 121 | def get_lr_(self, progress): 122 | if progress < self.warmup: 123 | return progress / self.warmup 124 | else: 125 | progress = (progress - self.warmup) / (1 - self.warmup) # progress after warmup 126 | ret = 0.5 * (1. + math.cos(math.pi * ((self.cycles * progress) % 1))) 127 | return ret 128 | 129 | 130 | class WarmupCosineWithWarmupRestartsSchedule(WarmupCosineWithHardRestartsSchedule): 131 | """ 132 | All training progress is divided in `cycles` (default=1.) parts of equal length. 133 | Every part follows a schedule with the first `warmup` fraction of the training steps linearly increasing from 0. to 1., 134 | followed by a learning rate decreasing from 1. to 0. following a cosine curve. 135 | """ 136 | def __init__(self, warmup=0.002, t_total=-1, cycles=1., **kw): 137 | assert(warmup * cycles < 1.) 138 | warmup = warmup * cycles if warmup >= 0 else warmup 139 | super(WarmupCosineWithWarmupRestartsSchedule, self).__init__(warmup=warmup, t_total=t_total, cycles=cycles, **kw) 140 | 141 | def get_lr_(self, progress): 142 | progress = progress * self.cycles % 1. 143 | if progress < self.warmup: 144 | return progress / self.warmup 145 | else: 146 | progress = (progress - self.warmup) / (1 - self.warmup) # progress after warmup 147 | ret = 0.5 * (1. + math.cos(math.pi * progress)) 148 | return ret 149 | 150 | 151 | class WarmupConstantSchedule(_LRSchedule): 152 | """ 153 | Linearly increases learning rate from 0 to 1 over `warmup` fraction of training steps. 154 | Keeps learning rate equal to 1. after warmup. 155 | """ 156 | def get_lr_(self, progress): 157 | if progress < self.warmup: 158 | return progress / self.warmup 159 | return 1. 160 | 161 | 162 | class WarmupLinearSchedule(_LRSchedule): 163 | """ 164 | Linearly increases learning rate from 0 to 1 over `warmup` fraction of training steps. 165 | Linearly decreases learning rate from 1. to 0. over remaining `1 - warmup` steps. 166 | """ 167 | warn_t_total = True 168 | def get_lr_(self, progress): 169 | if progress < self.warmup: 170 | return progress / self.warmup 171 | return max((progress - 1.) / (self.warmup - 1.), 0.) 
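# For example, a WarmupLinearSchedule with warmup=0.1 yields the multipliers:
#   get_lr_(0.05) -> 0.5   (halfway through warmup)
#   get_lr_(0.10) -> 1.0   (warmup just finished)
#   get_lr_(0.55) -> 0.5   (halfway through the linear decay)
#   get_lr_(1.00) -> 0.0   (end of training)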
172 | 173 | 174 | SCHEDULES = { 175 | None: ConstantLR, 176 | "none": ConstantLR, 177 | "warmup_cosine": WarmupCosineSchedule, 178 | "warmup_constant": WarmupConstantSchedule, 179 | "warmup_linear": WarmupLinearSchedule 180 | } 181 | 182 | 183 | class BertAdam(Optimizer): 184 | """Implements BERT version of Adam algorithm with weight decay fix. 185 | Params: 186 | lr: learning rate 187 | warmup: portion of t_total for the warmup, -1 means no warmup. Default: -1 188 | t_total: total number of training steps for the learning 189 | rate schedule, -1 means constant learning rate of 1. (no warmup regardless of warmup setting). Default: -1 190 | schedule: schedule to use for the warmup (see above). 191 | Can be `'warmup_linear'`, `'warmup_constant'`, `'warmup_cosine'`, `'none'`, `None` or a `_LRSchedule` object (see below). 192 | If `None` or `'none'`, learning rate is always kept constant. 193 | Default : `'warmup_linear'` 194 | b1: Adams b1. Default: 0.9 195 | b2: Adams b2. Default: 0.999 196 | e: Adams epsilon. Default: 1e-6 197 | weight_decay: Weight decay. Default: 0.01 198 | max_grad_norm: Maximum norm for the gradients (-1 means no clipping). Default: 1.0 199 | """ 200 | def __init__(self, params, lr=required, warmup=-1, t_total=-1, schedule='warmup_linear', 201 | b1=0.9, b2=0.999, e=1e-6, weight_decay=0.01, max_grad_norm=1.0, **kwargs): 202 | if lr is not required and lr < 0.0: 203 | raise ValueError("Invalid learning rate: {} - should be >= 0.0".format(lr)) 204 | if not isinstance(schedule, _LRSchedule) and schedule not in SCHEDULES: 205 | raise ValueError("Invalid schedule parameter: {}".format(schedule)) 206 | if not 0.0 <= b1 < 1.0: 207 | raise ValueError("Invalid b1 parameter: {} - should be in [0.0, 1.0[".format(b1)) 208 | if not 0.0 <= b2 < 1.0: 209 | raise ValueError("Invalid b2 parameter: {} - should be in [0.0, 1.0[".format(b2)) 210 | if not e >= 0.0: 211 | raise ValueError("Invalid epsilon value: {} - should be >= 0.0".format(e)) 212 | # initialize schedule object 213 | if not isinstance(schedule, _LRSchedule): 214 | schedule_type = SCHEDULES[schedule] 215 | schedule = schedule_type(warmup=warmup, t_total=t_total) 216 | else: 217 | if warmup != -1 or t_total != -1: 218 | logger.warning("warmup and t_total on the optimizer are ineffective when _LRSchedule object is provided as schedule. " 219 | "Please specify custom warmup and t_total in _LRSchedule object.") 220 | defaults = dict(lr=lr, schedule=schedule, 221 | b1=b1, b2=b2, e=e, weight_decay=weight_decay, 222 | max_grad_norm=max_grad_norm) 223 | super(BertAdam, self).__init__(params, defaults) 224 | 225 | def get_lr(self): 226 | lr = [] 227 | for group in self.param_groups: 228 | for p in group['params']: 229 | state = self.state[p] 230 | if len(state) == 0: 231 | return [0] 232 | lr_scheduled = group['lr'] 233 | lr_scheduled *= group['schedule'].get_lr(state['step']) 234 | lr.append(lr_scheduled) 235 | return lr 236 | 237 | def step(self, closure=None): 238 | """Performs a single optimization step. 239 | 240 | Arguments: 241 | closure (callable, optional): A closure that reevaluates the model 242 | and returns the loss. 
243 | """ 244 | loss = None 245 | if closure is not None: 246 | loss = closure() 247 | 248 | for group in self.param_groups: 249 | for p in group['params']: 250 | if p.grad is None: 251 | continue 252 | grad = p.grad.data 253 | if grad.is_sparse: 254 | raise RuntimeError('Adam does not support sparse gradients, please consider SparseAdam instead') 255 | 256 | state = self.state[p] 257 | 258 | # State initialization 259 | if len(state) == 0: 260 | state['step'] = 0 261 | # Exponential moving average of gradient values 262 | state['next_m'] = torch.zeros_like(p.data) 263 | # Exponential moving average of squared gradient values 264 | state['next_v'] = torch.zeros_like(p.data) 265 | 266 | next_m, next_v = state['next_m'], state['next_v'] 267 | beta1, beta2 = group['b1'], group['b2'] 268 | 269 | # Add grad clipping 270 | if group['max_grad_norm'] > 0: 271 | clip_grad_norm_(p, group['max_grad_norm']) 272 | 273 | # Decay the first and second moment running average coefficient 274 | # In-place operations to update the averages at the same time 275 | next_m.mul_(beta1).add_(1 - beta1, grad) 276 | next_v.mul_(beta2).addcmul_(1 - beta2, grad, grad) 277 | update = next_m / (next_v.sqrt() + group['e']) 278 | 279 | # Just adding the square of the weights to the loss function is *not* 280 | # the correct way of using L2 regularization/weight decay with Adam, 281 | # since that will interact with the m and v parameters in strange ways. 282 | # 283 | # Instead we want to decay the weights in a manner that doesn't interact 284 | # with the m/v parameters. This is equivalent to adding the square 285 | # of the weights to the loss with plain (non-momentum) SGD. 286 | if group['weight_decay'] > 0.0: 287 | update += group['weight_decay'] * p.data 288 | 289 | lr_scheduled = group['lr'] 290 | lr_scheduled *= group['schedule'].get_lr(state['step']) 291 | 292 | update_with_lr = lr_scheduled * update 293 | p.data.add_(-update_with_lr) 294 | 295 | state['step'] += 1 296 | 297 | # step_size = lr_scheduled * math.sqrt(bias_correction2) / bias_correction1 298 | # No bias correction 299 | # bias_correction1 = 1 - beta1 ** state['step'] 300 | # bias_correction2 = 1 - beta2 ** state['step'] 301 | 302 | return loss 303 | -------------------------------------------------------------------------------- /Extract-Classify-ACOS/bert_utils/tokenization.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | """Tokenization classes.""" 16 | 17 | from __future__ import absolute_import, division, print_function, unicode_literals 18 | 19 | import collections 20 | import logging 21 | import os 22 | import unicodedata 23 | from io import open 24 | 25 | from .file_utils import cached_path 26 | 27 | logger = logging.getLogger(__name__) 28 | 29 | PRETRAINED_VOCAB_ARCHIVE_MAP = { 30 | 'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt", 31 | 'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt", 32 | 'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt", 33 | 'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt", 34 | 'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-vocab.txt", 35 | 'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt", 36 | 'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt", 37 | 'bert-base-german-cased': "https://int-deepset-models-bert.s3.eu-central-1.amazonaws.com/pytorch/bert-base-german-cased-vocab.txt", 38 | 'bert-large-uncased-whole-word-masking': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-whole-word-masking-vocab.txt", 39 | 'bert-large-cased-whole-word-masking': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-whole-word-masking-vocab.txt", 40 | 'bert-large-uncased-whole-word-masking-finetuned-squad': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-whole-word-masking-finetuned-squad-vocab.txt", 41 | 'bert-large-cased-whole-word-masking-finetuned-squad': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-whole-word-masking-finetuned-squad-vocab.txt", 42 | 'bert-base-cased-finetuned-mrpc': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-mrpc-vocab.txt", 43 | } 44 | PRETRAINED_VOCAB_POSITIONAL_EMBEDDINGS_SIZE_MAP = { 45 | 'bert-base-uncased': 512, 46 | 'bert-large-uncased': 512, 47 | 'bert-base-cased': 512, 48 | 'bert-large-cased': 512, 49 | 'bert-base-multilingual-uncased': 512, 50 | 'bert-base-multilingual-cased': 512, 51 | 'bert-base-chinese': 512, 52 | 'bert-base-german-cased': 512, 53 | 'bert-large-uncased-whole-word-masking': 512, 54 | 'bert-large-cased-whole-word-masking': 512, 55 | 'bert-large-uncased-whole-word-masking-finetuned-squad': 512, 56 | 'bert-large-cased-whole-word-masking-finetuned-squad': 512, 57 | 'bert-base-cased-finetuned-mrpc': 512, 58 | } 59 | VOCAB_NAME = 'vocab.txt' 60 | 61 | 62 | def load_vocab(vocab_file): 63 | """Loads a vocabulary file into a dictionary.""" 64 | vocab = collections.OrderedDict() 65 | index = 0 66 | with open(vocab_file, "r", encoding="utf-8") as reader: 67 | while True: 68 | token = reader.readline() 69 | if not token: 70 | break 71 | token = token.strip() 72 | vocab[token] = index 73 | index += 1 74 | return vocab 75 | 76 | 77 | def whitespace_tokenize(text): 78 | """Runs basic whitespace cleaning and splitting on a piece of text.""" 79 | text = text.strip() 80 | if not text: 81 | return [] 82 | tokens = text.split() 83 | return tokens 84 | 85 | 86 | class BertTokenizer(object): 87 | """Runs end-to-end tokenization: punctuation splitting + wordpiece""" 88 | 89 | def __init__(self, vocab_file, do_lower_case=True, max_len=None, 
do_basic_tokenize=True, 90 | never_split=("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]")): 91 | """Constructs a BertTokenizer. 92 | 93 | Args: 94 | vocab_file: Path to a one-wordpiece-per-line vocabulary file 95 | do_lower_case: Whether to lower case the input 96 | Only has an effect when do_wordpiece_only=False 97 | do_basic_tokenize: Whether to do basic tokenization before wordpiece. 98 | max_len: An artificial maximum length to truncate tokenized sequences to; 99 | Effective maximum length is always the minimum of this 100 | value (if specified) and the underlying BERT model's 101 | sequence length. 102 | never_split: List of tokens which will never be split during tokenization. 103 | Only has an effect when do_wordpiece_only=False 104 | """ 105 | if not os.path.isfile(vocab_file): 106 | raise ValueError( 107 | "Can't find a vocabulary file at path '{}'. To load the vocabulary from a Google pretrained " 108 | "model use `tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)`".format(vocab_file)) 109 | self.vocab = load_vocab(vocab_file) 110 | self.ids_to_tokens = collections.OrderedDict( 111 | [(ids, tok) for tok, ids in self.vocab.items()]) 112 | self.do_basic_tokenize = do_basic_tokenize 113 | if do_basic_tokenize: 114 | self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case, 115 | never_split=never_split) 116 | self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab) 117 | self.max_len = max_len if max_len is not None else int(1e12) 118 | 119 | def tokenize(self, text): 120 | split_tokens = [] 121 | if self.do_basic_tokenize: 122 | for token in self.basic_tokenizer.tokenize(text): 123 | for sub_token in self.wordpiece_tokenizer.tokenize(token): 124 | split_tokens.append(sub_token) 125 | else: 126 | split_tokens = self.wordpiece_tokenizer.tokenize(text) 127 | return split_tokens 128 | 129 | def convert_tokens_to_ids(self, tokens): 130 | """Converts a sequence of tokens into ids using the vocab.""" 131 | ids = [] 132 | for token in tokens: 133 | ids.append(self.vocab[token]) 134 | if len(ids) > self.max_len: 135 | logger.warning( 136 | "Token indices sequence length is longer than the specified maximum " 137 | " sequence length for this BERT model ({} > {}). Running this" 138 | " sequence through BERT will result in indexing errors".format(len(ids), self.max_len) 139 | ) 140 | return ids 141 | 142 | def convert_ids_to_tokens(self, ids): 143 | """Converts a sequence of ids in wordpiece tokens using the vocab.""" 144 | tokens = [] 145 | for i in ids: 146 | tokens.append(self.ids_to_tokens[i]) 147 | return tokens 148 | 149 | def save_vocabulary(self, vocab_path): 150 | """Save the tokenizer vocabulary to a directory or file.""" 151 | index = 0 152 | if os.path.isdir(vocab_path): 153 | vocab_file = os.path.join(vocab_path, VOCAB_NAME) 154 | with open(vocab_file, "w", encoding="utf-8") as writer: 155 | for token, token_index in sorted(self.vocab.items(), key=lambda kv: kv[1]): 156 | if index != token_index: 157 | logger.warning("Saving vocabulary to {}: vocabulary indices are not consecutive." 158 | " Please check that the vocabulary is not corrupted!".format(vocab_file)) 159 | index = token_index 160 | writer.write(token + u'\n') 161 | index += 1 162 | return vocab_file 163 | 164 | @classmethod 165 | def from_pretrained(cls, pretrained_model_name_or_path, cache_dir=None, *inputs, **kwargs): 166 | """ 167 | Instantiate a PreTrainedBertModel from a pre-trained model file. 168 | Download and cache the pre-trained model file if needed. 
169 | """ 170 | if pretrained_model_name_or_path in PRETRAINED_VOCAB_ARCHIVE_MAP: 171 | vocab_file = PRETRAINED_VOCAB_ARCHIVE_MAP[pretrained_model_name_or_path] 172 | if '-cased' in pretrained_model_name_or_path and kwargs.get('do_lower_case', True): 173 | logger.warning("The pre-trained model you are loading is a cased model but you have not set " 174 | "`do_lower_case` to False. We are setting `do_lower_case=False` for you but " 175 | "you may want to check this behavior.") 176 | kwargs['do_lower_case'] = False 177 | elif '-cased' not in pretrained_model_name_or_path and not kwargs.get('do_lower_case', True): 178 | logger.warning("The pre-trained model you are loading is an uncased model but you have set " 179 | "`do_lower_case` to False. We are setting `do_lower_case=True` for you " 180 | "but you may want to check this behavior.") 181 | kwargs['do_lower_case'] = True 182 | else: 183 | vocab_file = pretrained_model_name_or_path 184 | if os.path.isdir(vocab_file): 185 | vocab_file = os.path.join(vocab_file, VOCAB_NAME) 186 | # redirect to the cache, if necessary 187 | try: 188 | resolved_vocab_file = cached_path(vocab_file, cache_dir=cache_dir) 189 | except EnvironmentError: 190 | if pretrained_model_name_or_path in PRETRAINED_VOCAB_ARCHIVE_MAP: 191 | logger.error( 192 | "Couldn't reach server at '{}' to download vocabulary.".format( 193 | vocab_file)) 194 | else: 195 | logger.error( 196 | "Model name '{}' was not found in model name list ({}). " 197 | "We assumed '{}' was a path or url but couldn't find any file " 198 | "associated to this path or url.".format( 199 | pretrained_model_name_or_path, 200 | ', '.join(PRETRAINED_VOCAB_ARCHIVE_MAP.keys()), 201 | vocab_file)) 202 | return None 203 | if resolved_vocab_file == vocab_file: 204 | logger.info("loading vocabulary file {}".format(vocab_file)) 205 | else: 206 | logger.info("loading vocabulary file {} from cache at {}".format( 207 | vocab_file, resolved_vocab_file)) 208 | if pretrained_model_name_or_path in PRETRAINED_VOCAB_POSITIONAL_EMBEDDINGS_SIZE_MAP: 209 | # if we're using a pretrained model, ensure the tokenizer wont index sequences longer 210 | # than the number of positional embeddings 211 | max_len = PRETRAINED_VOCAB_POSITIONAL_EMBEDDINGS_SIZE_MAP[pretrained_model_name_or_path] 212 | kwargs['max_len'] = min(kwargs.get('max_len', int(1e12)), max_len) 213 | # Instantiate tokenizer. 214 | tokenizer = cls(resolved_vocab_file, *inputs, **kwargs) 215 | return tokenizer 216 | 217 | 218 | class BasicTokenizer(object): 219 | """Runs basic tokenization (punctuation splitting, lower casing, etc.).""" 220 | 221 | def __init__(self, 222 | do_lower_case=True, 223 | never_split=("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]")): 224 | """Constructs a BasicTokenizer. 225 | 226 | Args: 227 | do_lower_case: Whether to lower case the input. 228 | """ 229 | self.do_lower_case = do_lower_case 230 | self.never_split = never_split 231 | 232 | def tokenize(self, text): 233 | """Tokenizes a piece of text.""" 234 | text = self._clean_text(text) 235 | # This was added on November 1st, 2018 for the multilingual and Chinese 236 | # models. This is also applied to the English models now, but it doesn't 237 | # matter since the English models were not trained on any Chinese data 238 | # and generally don't have any Chinese data in them (there are Chinese 239 | # characters in the vocabulary because Wikipedia does have some Chinese 240 | # words in the English Wikipedia.). 
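        # e.g. "ab你好cd" gets spaces added around each CJK character, so the
        # whitespace tokenization below yields ["ab", "你", "好", "cd"]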
241 | text = self._tokenize_chinese_chars(text) 242 | orig_tokens = whitespace_tokenize(text) 243 | split_tokens = [] 244 | for token in orig_tokens: 245 | if self.do_lower_case and token not in self.never_split: 246 | token = token.lower() 247 | token = self._run_strip_accents(token) 248 | split_tokens.extend(self._run_split_on_punc(token)) 249 | 250 | output_tokens = whitespace_tokenize(" ".join(split_tokens)) 251 | return output_tokens 252 | 253 | def _run_strip_accents(self, text): 254 | """Strips accents from a piece of text.""" 255 | text = unicodedata.normalize("NFD", text) 256 | output = [] 257 | for char in text: 258 | cat = unicodedata.category(char) 259 | if cat == "Mn": 260 | continue 261 | output.append(char) 262 | return "".join(output) 263 | 264 | def _run_split_on_punc(self, text): 265 | """Splits punctuation on a piece of text.""" 266 | if text in self.never_split: 267 | return [text] 268 | chars = list(text) 269 | i = 0 270 | start_new_word = True 271 | output = [] 272 | while i < len(chars): 273 | char = chars[i] 274 | if _is_punctuation(char): 275 | output.append([char]) 276 | start_new_word = True 277 | else: 278 | if start_new_word: 279 | output.append([]) 280 | start_new_word = False 281 | output[-1].append(char) 282 | i += 1 283 | 284 | return ["".join(x) for x in output] 285 | 286 | def _tokenize_chinese_chars(self, text): 287 | """Adds whitespace around any CJK character.""" 288 | output = [] 289 | for char in text: 290 | cp = ord(char) 291 | if self._is_chinese_char(cp): 292 | output.append(" ") 293 | output.append(char) 294 | output.append(" ") 295 | else: 296 | output.append(char) 297 | return "".join(output) 298 | 299 | def _is_chinese_char(self, cp): 300 | """Checks whether CP is the codepoint of a CJK character.""" 301 | # This defines a "chinese character" as anything in the CJK Unicode block: 302 | # https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block) 303 | # 304 | # Note that the CJK Unicode block is NOT all Japanese and Korean characters, 305 | # despite its name. The modern Korean Hangul alphabet is a different block, 306 | # as is Japanese Hiragana and Katakana. Those alphabets are used to write 307 | # space-separated words, so they are not treated specially and handled 308 | # like the all of the other languages. 309 | if ((cp >= 0x4E00 and cp <= 0x9FFF) or # 310 | (cp >= 0x3400 and cp <= 0x4DBF) or # 311 | (cp >= 0x20000 and cp <= 0x2A6DF) or # 312 | (cp >= 0x2A700 and cp <= 0x2B73F) or # 313 | (cp >= 0x2B740 and cp <= 0x2B81F) or # 314 | (cp >= 0x2B820 and cp <= 0x2CEAF) or 315 | (cp >= 0xF900 and cp <= 0xFAFF) or # 316 | (cp >= 0x2F800 and cp <= 0x2FA1F)): # 317 | return True 318 | 319 | return False 320 | 321 | def _clean_text(self, text): 322 | """Performs invalid character removal and whitespace cleanup on text.""" 323 | output = [] 324 | for char in text: 325 | cp = ord(char) 326 | if cp == 0 or cp == 0xfffd or _is_control(char): 327 | continue 328 | if _is_whitespace(char): 329 | output.append(" ") 330 | else: 331 | output.append(char) 332 | return "".join(output) 333 | 334 | 335 | class WordpieceTokenizer(object): 336 | """Runs WordPiece tokenization.""" 337 | 338 | def __init__(self, vocab, unk_token="[UNK]", max_input_chars_per_word=100): 339 | self.vocab = vocab 340 | self.unk_token = unk_token 341 | self.max_input_chars_per_word = max_input_chars_per_word 342 | 343 | def tokenize(self, text): 344 | """Tokenizes a piece of text into its word pieces. 
345 | 346 | This uses a greedy longest-match-first algorithm to perform tokenization 347 | using the given vocabulary. 348 | 349 | For example: 350 | input = "unaffable" 351 | output = ["un", "##aff", "##able"] 352 | 353 | Args: 354 | text: A single token or whitespace separated tokens. This should have 355 | already been passed through `BasicTokenizer`. 356 | 357 | Returns: 358 | A list of wordpiece tokens. 359 | """ 360 | 361 | output_tokens = [] 362 | for token in whitespace_tokenize(text): 363 | chars = list(token) 364 | if len(chars) > self.max_input_chars_per_word: 365 | output_tokens.append(self.unk_token) 366 | continue 367 | 368 | is_bad = False 369 | start = 0 370 | sub_tokens = [] 371 | while start < len(chars): 372 | end = len(chars) 373 | cur_substr = None 374 | while start < end: 375 | substr = "".join(chars[start:end]) 376 | if start > 0: 377 | substr = "##" + substr 378 | if substr in self.vocab: 379 | cur_substr = substr 380 | break 381 | end -= 1 382 | if cur_substr is None: 383 | is_bad = True 384 | break 385 | sub_tokens.append(cur_substr) 386 | start = end 387 | 388 | if is_bad: 389 | output_tokens.append(self.unk_token) 390 | else: 391 | output_tokens.extend(sub_tokens) 392 | return output_tokens 393 | 394 | 395 | def _is_whitespace(char): 396 | """Checks whether `chars` is a whitespace character.""" 397 | # \t, \n, and \r are technically contorl characters but we treat them 398 | # as whitespace since they are generally considered as such. 399 | if char == " " or char == "\t" or char == "\n" or char == "\r": 400 | return True 401 | cat = unicodedata.category(char) 402 | if cat == "Zs": 403 | return True 404 | return False 405 | 406 | 407 | def _is_control(char): 408 | """Checks whether `chars` is a control character.""" 409 | # These are technically control characters but we count them as whitespace 410 | # characters. 411 | if char == "\t" or char == "\n" or char == "\r": 412 | return False 413 | cat = unicodedata.category(char) 414 | if cat.startswith("C"): 415 | return True 416 | return False 417 | 418 | 419 | def _is_punctuation(char): 420 | """Checks whether `chars` is a punctuation character.""" 421 | cp = ord(char) 422 | # We treat all non-letter/number ASCII as punctuation. 423 | # Characters such as "^", "$", and "`" are not in the Unicode 424 | # Punctuation class but we treat them as punctuation anyways, for 425 | # consistency. 
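    # (e.g. "$" and "^" fall inside the ASCII ranges below, so they count as
    # punctuation here even though Unicode classifies them as symbols)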
426 |     if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
427 |             (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):
428 |         return True
429 |     cat = unicodedata.category(char)
430 |     if cat.startswith("P"):
431 |         return True
432 |     return False
433 | 
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/dataset_utils.py:
--------------------------------------------------------------------------------
 1 | #coding=utf-8
 2 | 
 3 | import codecs as cs
 4 | import os
 5 | import sys
 6 | import pdb
 7 | 
 8 | # use the local copy of the BERT tokenizer shipped in bert_utils rather than
 9 | # a hard-coded, machine-specific pytorch_pretrained_BERT checkout
10 | from bert_utils.tokenization import BertTokenizer
11 | 
12 | def read_pair_gold(f, args):
13 |     # key: text + aspect span + opinion span; value: corresponding category-sentiment type number
14 | 
15 |     tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)
16 |     quad_text = []
17 |     quad_gold = []
18 |     for line in f:
19 |         cur_quad_gold = [[]]
20 |         line = line.strip().split('\t')
21 |         text = line[0].split('####')[0]
22 |         text = text.split(' ')
23 |         cur_text = tokenizer.convert_tokens_to_ids(text)
24 |         # while len(cur_text) < args.max_seq_length:
25 |         #     cur_text.append(0)
26 |         quad_text.append(cur_text)
27 |         cur_quad_gold[0].append(line[0].split('####')[1])
28 |         for ele in line[1:]:
29 |             if ele not in cur_quad_gold[0]:
30 |                 cur_quad_gold[0].append(ele)
31 |         quad_gold += cur_quad_gold
32 |     return quad_text, quad_gold
33 | 
34 | 
35 | def read_triplet_gold(f, args):
36 |     # key: text + aspect span + opinion span + sentiment type; value: corresponding category type number
37 |     tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)
38 |     quad_text = []
39 |     quad_gold = []
40 |     for line in f:
41 |         cur_quad_gold = [[]]
42 |         line = line.strip().split('\t')
43 |         text = line[0].split('####')[0]
44 |         text = text.split(' ')
45 |         cur_text = tokenizer.convert_tokens_to_ids(text)
46 |         # while len(cur_text) < args.max_seq_length:
47 |         #     cur_text.append(0)
48 |         quad_text.append(cur_text)
49 |         cur_quad_gold[0].append(line[0].split('####')[1])
50 |         for ele in line[1:]:
51 |             if ele not in cur_quad_gold[0]:
52 |                 cur_quad_gold[0].append(ele)
53 |         quad_gold += cur_quad_gold
54 | 
55 |     return quad_text, quad_gold
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/eval_metrics.py:
--------------------------------------------------------------------------------
  1 | #coding=utf-8
  2 | 
  3 | from __future__ import absolute_import, division, print_function
  4 | 
  5 | import argparse
  6 | import logging
  7 | import os
  8 | import sys
  9 | import random
 10 | from tqdm import tqdm, trange
 11 | import pdb
 12 | import warnings
 13 | import codecs as cs
 14 | import copy
 15 | import re
 16 | # warnings.filterwarnings('ignore')
 17 | 
 18 | import numpy as np
 19 | 
 20 | import torch
 21 | from torch.nn import CrossEntropyLoss, MSELoss, MultiLabelSoftMarginLoss, BCEWithLogitsLoss
 22 | 
 23 | from run_classifier_dataset_utils import compute_metrics
 24 | 
 25 | def measureQuad(pred, gold):
 26 |     tp = .0
 27 |     fp = .0
 28 |     fn = .0
 29 |     for text in pred:
 30 |         cnt = 0
 31 |         if text in gold:
 32 |             for pair in pred[text]:
 33 |                 if pair in gold[text]:
 34 |                     cnt += 1
 35 |         tp += cnt
 36 |         fp += len(pred[text])-cnt
 37 |         if text in gold:
 38 |             fn += len(gold[text])-cnt
 39 |     for text in gold:
 40 |         if text not in pred:
 41 |             fn += len(gold[text])
 42 | 
 43 |     print("tp: {}. fp: {}. fn: {}.".format(tp, fp, fn))
 44 |     p = 0 if tp + fp == 0 else 1.*tp / (tp + fp)
 45 |     r = 0 if tp + fn == 0 else 1.*tp / (tp + fn)
 46 |     f = 0 if p + r == 0 else 2 * p * r / (p + r)
 47 |     return {'precision':p, 'recall':r, 'micro-F1':f}
 48 | 
 49 | def pred_eval(_e, args, logger, tokenizer, model, dataloader, eval_gold, label_list, device, task_name, eval_type='valid'):
 50 | 
 51 |     preds = {}
 52 |     golds = {}
 53 |     ids_to_token = {}
 54 |     pred_aspect_tag = []
 55 |     pred_imp_aspect = []
 56 |     pred_imp_opinion = []
 57 |     input_text, pairgold = eval_gold
 58 |     _all_tokens_len = []
 59 |     input_length_map = {}
 60 |     entity_label = r'32*'
 61 |     opinion_entity_label = r'54*'
 62 |     label_map_seq = {label : i for i, label in enumerate(label_list[1])}
 63 | 
 64 |     for index in range(0, len(pairgold), 3):
 65 |         cur_quad = pairgold[index]
 66 |         gold_imp_aspect = pairgold[index+1]
 67 |         gold_imp_opinion = pairgold[index+2]
 68 |         gold_tag = []
 69 |         cur_aspect_tag = ''.join(str(ele) for ele in cur_quad)
 70 |         max_len = len(cur_aspect_tag)
 71 |         for ele in re.finditer(entity_label, cur_aspect_tag):
 72 |             gold_tag.append('a-' + str(ele.start()) + ',' + str(ele.end()))
 73 |         if gold_imp_aspect == 1:
 74 |             gold_tag.append('a--1,-1')
 75 | 
 76 |         for ele in re.finditer(opinion_entity_label, cur_aspect_tag):
 77 |             gold_tag.append('o-' + str(ele.start()) + ',' + str(ele.end()))
 78 |         if gold_imp_opinion == 1:
 79 |             gold_tag.append('o--1,-1')
 80 | 
 81 |         cur_input = ' '.join(str(ele) for ele in input_text[index//3])
 82 |         golds[cur_input] = gold_tag
 83 |         ids_to_token[cur_input] = ' '.join(ele for ele in tokenizer.convert_ids_to_tokens(input_text[index//3]))
 84 | 
 85 |     for step, batch in enumerate(dataloader):
 86 | 
 87 |         if step % 500 == 0 and step>0:
 88 |             print(step)
 89 | 
 90 |         _all_tokens_len += batch[0].numpy().tolist()
 91 |         batch = tuple(t.to(device) for t in batch)
 92 |         _tokens_len, _aspect_input_ids, _aspect_input_mask, _aspect_ids, _aspect_segment_ids, \
 93 |             _exist_imp_aspect, _exist_imp_opinion = batch
 94 | 
 95 |         with torch.no_grad():
 96 |             _, logits = model(aspect_input_ids=_aspect_input_ids, aspect_labels=_aspect_ids,
 97 |                               aspect_token_type_ids=_aspect_segment_ids, aspect_attention_mask=_aspect_input_mask,
 98 |                               exist_imp_aspect=_exist_imp_aspect, exist_imp_opinion=_exist_imp_opinion)
 99 | 
100 |         # the input is '[CLS] text [SEP] category/sentiment [SEP]'; only '[CLS] text' is kept, and the first '[SEP]'
101 |         # is used to predict the existence of an implicit aspect or opinion.
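        # The numeric tag strings handled below follow a BIO-like scheme: '3'
        # appears to open an aspect span with '2' continuing it (regex '32*'),
        # and '5' opens an opinion span with '4' continuing it (regex '54*');
        # e.g. the tag string '0032200540' yields the spans 'a-2,5' and 'o-7,9'.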
102 | 103 | logits_imp_aspect = np.argmax(logits[1].detach().cpu().numpy(), axis=-1).tolist() 104 | logits_imp_opinion = np.argmax(logits[2].detach().cpu().numpy(), axis=-1).tolist() 105 | for i, ele in enumerate(logits[0]): 106 | pred_aspect_tag.append(ele) 107 | for i, ele in enumerate(logits_imp_aspect): 108 | pred_imp_aspect.append(ele) 109 | for i, ele in enumerate(logits_imp_opinion): 110 | pred_imp_opinion.append(ele) 111 | 112 | for i in range(len(pred_aspect_tag)): 113 | cur_aspect_tag = ''.join(str(ele) for ele in pred_aspect_tag[i]) 114 | pred_tag = [] 115 | for ele in re.finditer(entity_label, cur_aspect_tag): 116 | pred_tag.append('a-'+str(ele.start()-1) + ',' + str(ele.end()-1)) 117 | if pred_imp_aspect[i] == 1: 118 | pred_tag.append('a--1,-1') 119 | 120 | for ele in re.finditer(opinion_entity_label, cur_aspect_tag): 121 | pred_tag.append('o-'+str(ele.start()-1) + ',' + str(ele.end()-1)) 122 | if pred_imp_opinion[i] == 1: 123 | pred_tag.append('o--1,-1') 124 | 125 | cur_input = ' '.join(str(ele) for ele in input_text[i]) 126 | preds[cur_input] = pred_tag 127 | input_length_map[cur_input] = _all_tokens_len[i] 128 | 129 | res = measureQuad(preds, golds) 130 | if eval_type == 'valid': 131 | pipeline_file = cs.open(args.output_dir+os.sep+'valid.txt', 'w') 132 | else: 133 | pipeline_file = cs.open(args.output_dir+os.sep+'pred4pipeline.txt', 'w') 134 | for text in preds: 135 | length = input_length_map[text]-1 136 | cur_text = ids_to_token[text] 137 | cur_text = cur_text.split(' ')[1:length] 138 | if len(preds[text]) > 0: 139 | pipeline_file.write(' '.join(ele for ele in cur_text)+'\t'+'\t'.join(ele for ele in preds[text])+'\n') 140 | 141 | if eval_type == 'valid': 142 | logger.info("***** Eval results *****") 143 | for key in sorted(res.keys()): 144 | logger.info(" %s = %s", key, str(res[key])) 145 | return res 146 | 147 | elif eval_type == 'test': 148 | logger.info("***** Test results *****") 149 | for key in sorted(res.keys()): 150 | logger.info(" %s = %s", key, str(res[key])) 151 | return res 152 | 153 | 154 | def getTextType(gold): 155 | text_type = {} 156 | for text in gold: 157 | if text not in text_type: 158 | text_type[text] = [] 159 | 160 | for ele in gold[text]: 161 | if 4 not in text_type[text]: 162 | text_type[text].append(4) 163 | if '-1' not in ele[2] and '-1' not in ele[3]: 164 | if 0 not in text_type[text]: 165 | text_type[text].append(0) 166 | elif '-1' in ele[2] and '-1' not in ele[3]: 167 | if 1 not in text_type[text]: 168 | text_type[text].append(1) 169 | elif '-1' not in ele[2] and '-1' in ele[3]: 170 | if 2 not in text_type[text]: 171 | text_type[text].append(2) 172 | elif '-1' in ele[2] and '-1' in ele[3]: 173 | if 3 not in text_type[text]: 174 | text_type[text].append(3) 175 | 176 | return text_type 177 | 178 | def measureQuad_imp(pred, gold, text_type): 179 | tp = [.0, .0, .0, .0, .0] 180 | fp = [.0, .0, .0, .0, .0] 181 | fn = [.0, .0, .0, .0, .0] 182 | 183 | # text_set = set() 184 | # for text in gold: 185 | # text_set.add(text) 186 | # for text in text_set: 187 | # for dt in text_type[text]: 188 | # cnt = 0 189 | # for ele in pred[text]: 190 | # if ele in gold[text]: 191 | # cnt += 1 192 | # tp[dt] += cnt 193 | # fp[dt] += len(pred[text])-cnt 194 | 195 | # for ele in gold[text]: 196 | # if ele not in pred[text]: 197 | # fn[dt] += 1 198 | 199 | for text in pred: 200 | for dt in text_type[text]: 201 | cnt = 0 202 | if text in gold: 203 | for pair in pred[text]: 204 | if pair in gold[text]: 205 | cnt += 1 206 | tp[dt] += cnt 207 | fp[dt] += 
len(pred[text])-cnt
208 |         if text in gold:
209 |             fn[dt] += len(gold[text])-cnt
210 |     for text in gold:
211 |         for dt in text_type[text]:
212 |             if text not in pred:
213 |                 fn[dt] += len(gold[text])
214 | 
215 |     for i in range(5):
216 |         print("tp: {}. fp: {}. fn: {}.".format(tp[i], fp[i], fn[i]))
217 |         p = 0 if tp[i] + fp[i] == 0 else 1.*tp[i] / (tp[i] + fp[i])
218 |         r = 0 if tp[i] + fn[i] == 0 else 1.*tp[i] / (tp[i] + fn[i])
219 |         f = 0 if p + r == 0 else 2 * p * r / (p + r)
220 |         print(i, ': ', {'precision':p, 'recall':r, 'micro-F1':f})
221 |     return {'precision':p, 'recall':r, 'micro-F1':f} # overall (type-4) result from the last loop iteration
222 | 
223 | def pair_eval(_e, args, logger, tokenizer, model, dataloader, gold, label_list, device, task_name, eval_type='valid'):
224 |     preds = {}
225 |     golds = {}
226 |     quad_preds = {}
227 |     quad_golds = {}
228 |     ids_to_token = {}
229 |     catesenti_dict = {i: label for i, label in enumerate(label_list[0])}
230 |     input_text, quadgold = gold
231 |     for index, cur_quad in enumerate(quadgold):
232 |         cur_input = ' '.join(str(ele) for ele in input_text[index])
233 |         cur_input = cur_input+' '+cur_quad[0]
234 |         golds[cur_input] = cur_quad[1:]
235 |         ori_text = ' '.join(ele for ele in tokenizer.convert_ids_to_tokens(input_text[index]))
236 |         ids_to_token[cur_input] = ori_text+' '+cur_quad[0]
237 | 
238 |         quad_pairs = []
239 |         for ele in cur_quad[1:]:
240 |             ele = ele.split('#')
241 |             cate = '#'.join(item for item in ele[:-1]); senti = ele[-1]
242 |             asp = cur_quad[0].split(' ')[0]; opi = cur_quad[0].split(' ')[1]
243 |             tmp_quad = [cate, senti, asp, opi]
244 |             if tmp_quad not in quad_pairs:
245 |                 quad_pairs.append(tmp_quad)
246 |         if ori_text in quad_golds:
247 |             quad_golds[ori_text] += quad_pairs
248 |         else:
249 |             quad_golds[ori_text] = quad_pairs
250 |     tmp_cnt = 0
251 |     for step, batch in enumerate(dataloader):
252 | 
253 |         batch = tuple(t.to(device) for t in batch)
254 |         _tokens_len, _aspect_input_ids, _aspect_input_mask, _aspect_segment_ids, _candidate_aspect, \
255 |             _candidate_opinion, _label_id = batch
256 | 
257 |         # define a new function to compute loss values for both output_modes
258 |         with torch.no_grad():
259 |             loss, logits = model(tokenizer, _e, aspect_input_ids=_aspect_input_ids,
260 |                 aspect_token_type_ids=_aspect_segment_ids, aspect_attention_mask=_aspect_input_mask,
261 |                 candidate_aspect=_candidate_aspect, candidate_opinion=_candidate_opinion, label_id=_label_id)
262 | 
263 |         logits = logits[0].detach().cpu().numpy()
264 |         # pair_matrix = logits[0].view(len(_tokens_len), logits[1].item(), logits[1].item(), 3).detach().cpu().numpy()
265 | 
266 |         for i in range(len(_tokens_len)):
267 |             # use the input text as the key and the corresponding category predictions as the value
268 |             aspect_len = _aspect_input_mask[i].detach().cpu().numpy().sum()
269 |             aspect_tags = _candidate_aspect[i].detach().cpu().numpy()
270 |             opinion_tags = _candidate_opinion[i].detach().cpu().numpy()
271 |             entity_label = r'11*'
272 | 
273 |             aspect_labels = ''.join(str(ele) for ele in aspect_tags)
274 |             cur_aspect = []
275 |             for ele in re.finditer(entity_label, aspect_labels):
276 |                 # if (ele.end()-ele.start())<_tokens_len[i]-2:
277 |                 if ele.start() == 0 and '-1,-1' not in cur_aspect:
278 |                     cur_aspect.append('-1,-1')
279 |                 elif (ele.start() > 0 and ele.end() < aspect_len):
280 |                     cur_aspect.append(str(ele.start()-1)+','+str(ele.end()-1))
281 | 
282 |             opinion_labels = ''.join(str(ele) for ele in opinion_tags)
283 |             cur_opinion = []
284 |             for ele in re.finditer(entity_label, opinion_labels):
285 |                 if ele.end() == aspect_len and '-1,-1' not in cur_opinion:
286 |                     cur_opinion.append('-1,-1')
287 |                 elif (ele.start() > 0 and ele.end() < aspect_len):
288 |                     cur_opinion.append(str(ele.start()-1)+','+str(ele.end()-1))
289 | 
290 |             pred_res = []
291 |             cur_ao = cur_aspect[0]+' '+cur_opinion[0]
292 | 
293 |             ind = np.where(logits[i]>0)
294 |             for ele in ind[0]:
295 |                 pred_res.append(catesenti_dict[int(ele)])
296 |             ttt = (_aspect_input_ids[i].detach().cpu().numpy().tolist())[1:(_tokens_len[i]-1)]
297 |             cur_input = ' '.join(str(ele) for ele in ttt)+' '+cur_ao
298 |             ids_to_token[cur_input] = ' '.join(ele for ele in tokenizer.convert_ids_to_tokens(ttt))+' '+cur_ao
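            # Note: cur_input keys each prediction by the token-id string plus the
            # 'a_st,a_ed o_st,o_ed' span pair -- the same scheme used for the gold
            # keys built from input_text and cur_quad[0] above -- so measureQuad
            # can align the preds and golds dicts directly.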
299 | 
300 |             preds[cur_input] = pred_res
301 | 
302 |             quad_pairs = []
303 |             for ele in pred_res:
304 |                 ele = ele.split('#')
305 |                 cate = '#'.join(item for item in ele[:-1]); senti = ele[-1]
306 |                 tmp_quad = [cate, senti, cur_aspect[0], cur_opinion[0]]
307 |                 if tmp_quad not in quad_pairs:
308 |                     quad_pairs.append(tmp_quad)
309 |             tmp_cnt += len(quad_pairs)
310 |             ori_text = ' '.join(ele for ele in tokenizer.convert_ids_to_tokens(ttt))
311 |             if ori_text in quad_preds:
312 |                 quad_preds[ori_text] += quad_pairs
313 |             else:
314 |                 quad_preds[ori_text] = quad_pairs
315 |     print("Quad num: {}".format(tmp_cnt))
316 |     # pdb.set_trace()
317 |     res = measureQuad(preds, golds)
318 |     text_type = getTextType(quad_golds)
319 |     # tmp = measureQuad_imp(quad_preds, quad_golds)
320 | 
321 |     if eval_type == 'valid':
322 |         logger.info("***** Eval results *****")
323 |         for key in sorted(res.keys()):
324 |             logger.info("  %s = %s", key, str(res[key]))
325 |         return res
326 | 
327 |     elif eval_type == 'test':
328 | 
329 |         # Evaluation for all sub-tasks; since we extract quads, the element number is 4.
330 |         ele_num = 4
331 |         index_to_name = {0:'category', 1:'sentiment', 2:'aspect', 3:'opinion'}
332 |         for comb_choice in range(1, (1<<ele_num)):
333 |             # each bit of comb_choice selects one of the four quad elements
334 |             exist_index = []
335 |             cur_choice = comb_choice
336 |             for index in range(ele_num):
337 |                 if cur_choice & 1 == 1:
338 |                     exist_index.append(index)
339 |                 cur_choice >>= 1
340 |             sub_preds = {}
341 |             sub_golds = {}
342 |             for cur_key in quad_preds:
343 |                 cur_subs = []
344 |                 for quad in quad_preds[cur_key]:
345 |                     cur_sub = [quad[index] for index in exist_index]
346 |                     if cur_sub not in cur_subs:
347 |                         cur_subs.append(cur_sub)
348 |                 sub_preds[cur_key] = cur_subs
349 |             for cur_key in quad_golds:
350 |                 cur_subs = []
351 |                 for quad in quad_golds[cur_key]:
352 |                     cur_sub = [quad[index] for index in exist_index]
353 |                     if cur_sub not in cur_subs:
354 |                         cur_subs.append(cur_sub)
355 |                 sub_golds[cur_key] = cur_subs
356 |             sub_res = measureQuad_imp(sub_preds, sub_golds, text_type)
357 |             subtask_name = ' '.join(index_to_name[ele] for ele in exist_index)
358 |             # if subtask_name == 'aspect':
359 |             #     pdb.set_trace()
360 |             logger.info("***** %s results *****", subtask_name)
361 |             for key in sorted(sub_res.keys()):
362 |                 logger.info("  {} = {:.2%}".format(key, sub_res[key]))
363 |             logger.info("-----------------------------------")
364 | 
365 |         pipeline_res = cs.open(args.output_dir+os.sep+'result.txt', 'w')
366 |         for key in golds:
367 |             pipeline_res.write(ids_to_token[key]+'\n')
368 |             for cur_pair in golds[key]:
369 |                 pipeline_res.write(cur_pair+'\t')
370 |             pipeline_res.write('\n')
371 |             if key in preds:
372 |                 for cur_pair in preds[key]:
373 |                     pipeline_res.write(cur_pair+'\t')
374 |             pipeline_res.write('\n\n')
375 |         for key in preds:
376 |             if key not in golds:
377 |                 pipeline_res.write(ids_to_token[key]+'\n')
378 |                 pipeline_res.write('\n')
379 |                 for cur_pair in preds[key]:
380 |                     pipeline_res.write(cur_pair+'\t')
381 |                 pipeline_res.write('\n\n')
382 | 
383 |         logger.info("***** Test results *****")
384 |         for key in sorted(res.keys()):
385 |             logger.info("  %s = %s", key, str(res[key]))
386 |         return res
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/file_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Utilities for working with the local dataset cache.
3 | This file is adapted from the AllenNLP library at https://github.com/allenai/allennlp
4 | Copyright by the AllenNLP authors.
5 | """ 6 | from __future__ import (absolute_import, division, print_function, unicode_literals) 7 | 8 | import sys 9 | import json 10 | import logging 11 | import os 12 | import shutil 13 | import tempfile 14 | import fnmatch 15 | from functools import wraps 16 | from hashlib import sha256 17 | import sys 18 | from io import open 19 | 20 | import boto3 21 | import requests 22 | from botocore.exceptions import ClientError 23 | from tqdm import tqdm 24 | 25 | try: 26 | from torch.hub import _get_torch_home 27 | torch_cache_home = _get_torch_home() 28 | except ImportError: 29 | torch_cache_home = os.path.expanduser( 30 | os.getenv('TORCH_HOME', os.path.join( 31 | os.getenv('XDG_CACHE_HOME', '~/.cache'), 'torch'))) 32 | default_cache_path = os.path.join(torch_cache_home, 'pytorch_pretrained_bert') 33 | 34 | try: 35 | from urllib.parse import urlparse 36 | except ImportError: 37 | from urlparse import urlparse 38 | 39 | try: 40 | from pathlib import Path 41 | PYTORCH_PRETRAINED_BERT_CACHE = Path( 42 | os.getenv('PYTORCH_PRETRAINED_BERT_CACHE', default_cache_path)) 43 | except (AttributeError, ImportError): 44 | PYTORCH_PRETRAINED_BERT_CACHE = os.getenv('PYTORCH_PRETRAINED_BERT_CACHE', 45 | default_cache_path) 46 | 47 | CONFIG_NAME = "config.json" 48 | WEIGHTS_NAME = "pytorch_model.bin" 49 | 50 | logger = logging.getLogger(__name__) # pylint: disable=invalid-name 51 | 52 | 53 | def url_to_filename(url, etag=None): 54 | """ 55 | Convert `url` into a hashed filename in a repeatable way. 56 | If `etag` is specified, append its hash to the url's, delimited 57 | by a period. 58 | """ 59 | url_bytes = url.encode('utf-8') 60 | url_hash = sha256(url_bytes) 61 | filename = url_hash.hexdigest() 62 | 63 | if etag: 64 | etag_bytes = etag.encode('utf-8') 65 | etag_hash = sha256(etag_bytes) 66 | filename += '.' + etag_hash.hexdigest() 67 | 68 | return filename 69 | 70 | 71 | def filename_to_url(filename, cache_dir=None): 72 | """ 73 | Return the url and etag (which may be ``None``) stored for `filename`. 74 | Raise ``EnvironmentError`` if `filename` or its stored metadata do not exist. 75 | """ 76 | if cache_dir is None: 77 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 78 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 79 | cache_dir = str(cache_dir) 80 | 81 | cache_path = os.path.join(cache_dir, filename) 82 | if not os.path.exists(cache_path): 83 | raise EnvironmentError("file {} not found".format(cache_path)) 84 | 85 | meta_path = cache_path + '.json' 86 | if not os.path.exists(meta_path): 87 | raise EnvironmentError("file {} not found".format(meta_path)) 88 | 89 | with open(meta_path, encoding="utf-8") as meta_file: 90 | metadata = json.load(meta_file) 91 | url = metadata['url'] 92 | etag = metadata['etag'] 93 | 94 | return url, etag 95 | 96 | 97 | def cached_path(url_or_filename, cache_dir=None): 98 | """ 99 | Given something that might be a URL (or might be a local path), 100 | determine which. If it's a URL, download the file and cache it, and 101 | return the path to the cached file. If it's already a local path, 102 | make sure the file exists and then return the path. 
103 | """ 104 | if cache_dir is None: 105 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 106 | if sys.version_info[0] == 3 and isinstance(url_or_filename, Path): 107 | url_or_filename = str(url_or_filename) 108 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 109 | cache_dir = str(cache_dir) 110 | 111 | parsed = urlparse(url_or_filename) 112 | 113 | if parsed.scheme in ('http', 'https', 's3'): 114 | # URL, so get it from the cache (downloading if necessary) 115 | return get_from_cache(url_or_filename, cache_dir) 116 | elif os.path.exists(url_or_filename): 117 | # File, and it exists. 118 | return url_or_filename 119 | elif parsed.scheme == '': 120 | # File, but it doesn't exist. 121 | raise EnvironmentError("file {} not found".format(url_or_filename)) 122 | else: 123 | # Something unknown 124 | raise ValueError("unable to parse {} as a URL or as a local path".format(url_or_filename)) 125 | 126 | 127 | def split_s3_path(url): 128 | """Split a full s3 path into the bucket name and path.""" 129 | parsed = urlparse(url) 130 | if not parsed.netloc or not parsed.path: 131 | raise ValueError("bad s3 path {}".format(url)) 132 | bucket_name = parsed.netloc 133 | s3_path = parsed.path 134 | # Remove '/' at beginning of path. 135 | if s3_path.startswith("/"): 136 | s3_path = s3_path[1:] 137 | return bucket_name, s3_path 138 | 139 | 140 | def s3_request(func): 141 | """ 142 | Wrapper function for s3 requests in order to create more helpful error 143 | messages. 144 | """ 145 | 146 | @wraps(func) 147 | def wrapper(url, *args, **kwargs): 148 | try: 149 | return func(url, *args, **kwargs) 150 | except ClientError as exc: 151 | if int(exc.response["Error"]["Code"]) == 404: 152 | raise EnvironmentError("file {} not found".format(url)) 153 | else: 154 | raise 155 | 156 | return wrapper 157 | 158 | 159 | @s3_request 160 | def s3_etag(url): 161 | """Check ETag on S3 object.""" 162 | s3_resource = boto3.resource("s3") 163 | bucket_name, s3_path = split_s3_path(url) 164 | s3_object = s3_resource.Object(bucket_name, s3_path) 165 | return s3_object.e_tag 166 | 167 | 168 | @s3_request 169 | def s3_get(url, temp_file): 170 | """Pull a file directly from S3.""" 171 | s3_resource = boto3.resource("s3") 172 | bucket_name, s3_path = split_s3_path(url) 173 | s3_resource.Bucket(bucket_name).download_fileobj(s3_path, temp_file) 174 | 175 | 176 | def http_get(url, temp_file): 177 | req = requests.get(url, stream=True) 178 | content_length = req.headers.get('Content-Length') 179 | total = int(content_length) if content_length is not None else None 180 | progress = tqdm(unit="B", total=total) 181 | for chunk in req.iter_content(chunk_size=1024): 182 | if chunk: # filter out keep-alive new chunks 183 | progress.update(len(chunk)) 184 | temp_file.write(chunk) 185 | progress.close() 186 | 187 | 188 | def get_from_cache(url, cache_dir=None): 189 | """ 190 | Given a URL, look for the corresponding dataset in the local cache. 191 | If it's not there, download it. Then return the path to the cached file. 192 | """ 193 | if cache_dir is None: 194 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 195 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 196 | cache_dir = str(cache_dir) 197 | 198 | if not os.path.exists(cache_dir): 199 | os.makedirs(cache_dir) 200 | 201 | # Get eTag to add to filename, if it exists. 
202 | if url.startswith("s3://"): 203 | etag = s3_etag(url) 204 | else: 205 | try: 206 | response = requests.head(url, allow_redirects=True) 207 | if response.status_code != 200: 208 | etag = None 209 | else: 210 | etag = response.headers.get("ETag") 211 | except EnvironmentError: 212 | etag = None 213 | 214 | if sys.version_info[0] == 2 and etag is not None: 215 | etag = etag.decode('utf-8') 216 | filename = url_to_filename(url, etag) 217 | 218 | # get cache path to put the file 219 | cache_path = os.path.join(cache_dir, filename) 220 | 221 | # If we don't have a connection (etag is None) and can't identify the file 222 | # try to get the last downloaded one 223 | if not os.path.exists(cache_path) and etag is None: 224 | matching_files = fnmatch.filter(os.listdir(cache_dir), filename + '.*') 225 | matching_files = list(filter(lambda s: not s.endswith('.json'), matching_files)) 226 | if matching_files: 227 | cache_path = os.path.join(cache_dir, matching_files[-1]) 228 | 229 | if not os.path.exists(cache_path): 230 | # Download to temporary file, then copy to cache dir once finished. 231 | # Otherwise you get corrupt cache entries if the download gets interrupted. 232 | with tempfile.NamedTemporaryFile() as temp_file: 233 | logger.info("%s not found in cache, downloading to %s", url, temp_file.name) 234 | 235 | # GET file object 236 | if url.startswith("s3://"): 237 | s3_get(url, temp_file) 238 | else: 239 | http_get(url, temp_file) 240 | 241 | # we are copying the file before closing it, so flush to avoid truncation 242 | temp_file.flush() 243 | # shutil.copyfileobj() starts at the current position, so go to the start 244 | temp_file.seek(0) 245 | 246 | logger.info("copying %s to cache at %s", temp_file.name, cache_path) 247 | with open(cache_path, 'wb') as cache_file: 248 | shutil.copyfileobj(temp_file, cache_file) 249 | 250 | logger.info("creating metadata file for %s", cache_path) 251 | meta = {'url': url, 'etag': etag} 252 | meta_path = cache_path + '.json' 253 | with open(meta_path, 'w') as meta_file: 254 | output_string = json.dumps(meta) 255 | if sys.version_info[0] == 2 and isinstance(output_string, str): 256 | output_string = unicode(output_string, 'utf-8') # The beauty of python 2 257 | meta_file.write(output_string) 258 | 259 | logger.info("removing temp file %s", temp_file.name) 260 | 261 | return cache_path 262 | 263 | 264 | def read_set_from_file(filename): 265 | ''' 266 | Extract a de-duped collection (set) of text from a file. 267 | Expected file format is one item per line. 
268 |     '''
269 |     collection = set()
270 |     with open(filename, 'r', encoding='utf-8') as file_:
271 |         for line in file_:
272 |             collection.add(line.rstrip())
273 |     return collection
274 | 
275 | 
276 | def get_file_extension(path, dot=True, lower=True):
277 |     ext = os.path.splitext(path)[1]
278 |     ext = ext if dot else ext[1:]
279 |     return ext.lower() if lower else ext
280 | 
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/manager.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | Created on Mon Aug 7 19:38:30 2017
4 | 
5 | @author: Quantum Liu
6 | """
7 | 
8 | '''
9 | Example:
10 | gm = GPUManager()
11 | os.environ["CUDA_VISIBLE_DEVICES"] = str(gm.auto_choice())
12 | # auto_choice() returns the index of the chosen GPU, not a context manager
13 | '''
14 | 
15 | import os
16 | import pdb
17 | import sched, time
18 | import datetime
19 | #from tensorflow.python.client import device_lib
20 | 
21 | def check_gpus():
22 |     '''
23 |     GPU availability check
24 |     reference : http://feisky.xyz/machine-learning/tensorflow/gpu_list.html
25 |     '''
26 | # =============================================================================
27 | #     all_gpus = [x.name for x in device_lib.list_local_devices() if x.device_type == 'GPU']
28 | # =============================================================================
29 |     first_gpus = os.popen('nvidia-smi --query-gpu=index --format=csv,noheader').readlines()[0].strip()
30 |     if not first_gpus=='0':
31 |         print('This script can only manage NVIDIA GPUs, but no GPU was found on this device')
32 |         return False
33 |     elif not 'NVIDIA System Management' in os.popen('nvidia-smi -h').read():
34 |         print("'nvidia-smi' tool not found.")
35 |         return False
36 |     return True
37 | 
38 | def parse(line,qargs):
39 |     '''
40 |     line:
41 |         a line of text
42 |     qargs:
43 |         query arguments
44 |     return:
45 |         a dict of gpu infos
46 |     Parsing a line of csv format text returned by nvidia-smi
47 |     (parses one CSV line from nvidia-smi into a dict)
48 |     '''
49 |     numberic_args = ['memory.free', 'memory.total', 'power.draw', 'power.limit'] # fields parsed as numbers
50 |     power_manage_enable=lambda v:(not 'Not Support' in v) # whether the GPU supports power management (laptop GPUs may not)
51 |     to_numberic=lambda v:float(v.upper().strip().replace('MIB','').replace('W','')) # strip the unit (MiB / W) from a value string
52 |     process = lambda k,v:((int(to_numberic(v)) if power_manage_enable(v) else 1) if k in numberic_args else v.strip())
53 |     return {k:process(k,v) for k,v in zip(qargs,line.strip().split(','))}
54 | 
55 | def query_gpu(qargs=[]):
56 |     '''
57 |     qargs:
58 |         query arguments
59 |     return:
60 |         a list of dict
61 |     Querying GPU infos
62 |     (queries nvidia-smi for the state of every GPU)
63 |     '''
64 |     qargs =['index','gpu_name', 'memory.free', 'memory.total', 'power.draw', 'power.limit', 'utilization.gpu']+ qargs
65 |     cmd = 'nvidia-smi --query-gpu={} --format=csv,noheader'.format(','.join(qargs))
66 |     results = os.popen(cmd).readlines()
67 |     return [parse(line,qargs) for line in results]
68 | 
69 | def by_power(d):
70 |     '''
71 |     helper function for sorting gpus by power
72 |     '''
73 |     power_infos=(d['power.draw'],d['power.limit'])
74 |     if any(v==1 for v in power_infos):
75 |         print('Power management unavailable for GPU {}'.format(d['index']))
76 |         return 1
77 |     return float(d['power.draw'])/d['power.limit']
78 | 
79 | class GPUManager():
80 |     '''
81 |     qargs:
82 |         query arguments
83 |     A manager that lists all available GPU devices, sorts them, and
84 |     chooses the most idle one. Unspecified (not-yet-assigned) GPUs
85 |     are preferred.
86 |     (GPU device manager: it lists all available GPU devices, sorts them, and
87 |     automatically picks the most idle one. A GPUManager records whether each
88 |     GPU has already been assigned and prefers unassigned GPUs.)
89 |     '''
90 |     def __init__(self,qargs=[]):
91 |         '''
92 |         '''
93 |         self.qargs=qargs
94 |         self.gpus=query_gpu(qargs)
95 |         for gpu in self.gpus:
96 |             gpu['specified']=False
97 |         self.gpu_num=len(self.gpus)
98 | 
99 |     def _sort_by_memory(self,gpus,by_size=False):
100 |         if by_size:
101 |             print('Sorted by free memory size')
102 |             return sorted(gpus,key=lambda d:d['memory.free'],reverse=True)
103 |         else:
104 |             print('Sorted by free memory rate')
105 |             return sorted(gpus,key=lambda d:float(d['memory.free'])/ d['memory.total'],reverse=True)
106 | 
107 |     def _sort_by_power(self,gpus):
108 |         return sorted(gpus,key=by_power)
109 | 
110 |     def _sort_by_custom(self,gpus,key,reverse=False,qargs=[]):
111 |         if isinstance(key,str) and (key in qargs):
112 |             return sorted(gpus,key=lambda d:d[key],reverse=reverse)
113 |         if isinstance(key,type(lambda a:a)):
114 |             return sorted(gpus,key=key,reverse=reverse)
115 |         raise ValueError("The argument 'key' must be a function or a key in query args; please read the documentation of nvidia-smi")
116 | 
117 |     def auto_choice(self,mode=0):
118 |         '''
119 |         mode:
120 |             0:(default) sorted by free memory size
121 |         return:
122 |             the index of the chosen GPU
123 |         '''
124 |         def check_free_gpu(unspecified_gpus):
125 |             FLAG = False
126 |             for gpu_dict in unspecified_gpus:
127 |                 if gpu_dict['memory.free'] >= 18: # free-memory threshold (in the units parsed by parse(), i.e. MiB)
128 |                     FLAG = True
129 |                     break
130 |             return FLAG
131 | 
132 |         st_time = time.time()
133 |         def wait():
134 |             print("waiting for free gpu ...")
135 |             seconds = int(time.time() - st_time)
136 |             print("Have waited for {}".format(str(datetime.timedelta(seconds=seconds))))
137 | 
138 |         for old_infos,new_infos in zip(self.gpus,query_gpu(self.qargs)):
139 |             old_infos.update(new_infos)
140 |         unspecified_gpus=[gpu for gpu in self.gpus if not gpu['specified']] or self.gpus
141 | 
142 |         if mode==0:
143 |             scheduler = sched.scheduler(time.time, time.sleep)
144 |             while(True):
145 |                 if check_free_gpu(unspecified_gpus):
146 |                     break
147 |                 scheduler.enter(10, 1, wait)
148 |                 scheduler.run()
149 |             print('Choosing the GPU device with the largest free memory...')
150 |             tmp = self._sort_by_memory(unspecified_gpus,True)
151 | #             for ele in tmp:
152 | #                 print('ele is : {}'.format(ele))
153 |             chosen_gpu=self._sort_by_memory(unspecified_gpus,True)[0]
154 |         elif mode==1:
155 |             print('Choosing the GPU device with the highest free memory rate...')
156 |             chosen_gpu=self._sort_by_memory(unspecified_gpus)[0]
157 |         elif mode==2:
158 |             print('Choosing the GPU device by power...')
159 |             chosen_gpu=self._sort_by_power(unspecified_gpus)[0]
160 |         else:
161 |             print('Given an unavailable mode; choosing by memory')
162 |             chosen_gpu=self._sort_by_memory(unspecified_gpus)[0]
163 |         chosen_gpu['specified']=True
164 |         index=chosen_gpu['index']
165 |         print('Using GPU {i}:\n{info}'.format(i=index,info='\n'.join([str(k)+':'+str(v) for k,v in chosen_gpu.items()])))
166 |         return index
167 | #        else:
168 | #            raise ImportError('GPU available check failed')
169 | 
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/run.sh:
--------------------------------------------------------------------------------
1 | BERT_BASE_DIR=/mnt/nfs-storage-titan/BERT/uncased_L-12_H-768_A-12
2 | BASE_DIR=/mnt/nfs-storage-titan/BERT/pytorch_pretrained_BERT
3 | DATA_DIR=$BASE_DIR/ACOS-main/Extract-Classify-ACOS
4 | TASK_NAME=quad
5 | MODEL=quad
6 | DOMAIN=rest16
7 | 
8 | echo 'DOMAIN is chosen from [rest16, laptop]'
9 | python
run_step1.py \
10 |     --task_name $TASK_NAME \
11 |     --do_train \
12 |     --do_eval \
13 |     --domain_type $DOMAIN \
14 |     --model_type $MODEL \
15 |     --do_lower_case \
16 |     --data_dir $DATA_DIR \
17 |     --bert_model $BERT_BASE_DIR \
18 |     --max_seq_length 128 \
19 |     --train_batch_size 24 \
20 |     --learning_rate 2e-5 \
21 |     --num_train_epochs 30 \
22 |     --output_dir $BASE_DIR/output/Extract-Classify-QUAD/${DOMAIN}_1st/
23 | 
24 | 
25 | python tokenized_data/get_1st_pairs.py $BASE_DIR $DOMAIN
26 | 
27 | TASK_NAME=categorysenti
28 | MODEL=categorysenti
29 | 
30 | python run_step2.py \
31 |     --task_name $TASK_NAME \
32 |     --do_train \
33 |     --do_eval \
34 |     --domain_type $DOMAIN \
35 |     --model_type $MODEL \
36 |     --do_lower_case \
37 |     --data_dir $DATA_DIR \
38 |     --bert_model $BERT_BASE_DIR \
39 |     --max_seq_length 128 \
40 |     --train_batch_size 16 \
41 |     --learning_rate 5e-5 \
42 |     --num_train_epochs 30 \
43 |     --output_dir $BASE_DIR/output/Extract-Classify-QUAD/${DOMAIN}_2nd
44 | 
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/run_classifier_dataset_utils.py:
--------------------------------------------------------------------------------
1 | # coding=utf-8
2 | # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
3 | # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4 | #
5 | # Licensed under the Apache License, Version 2.0 (the "License");
6 | # you may not use this file except in compliance with the License.
7 | # You may obtain a copy of the License at
8 | #
9 | #     http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
16 | """ BERT classification fine-tuning: utilities to work with the ACOS quad tasks (adapted from the GLUE utilities) """
17 | 
18 | from __future__ import absolute_import, division, print_function
19 | 
20 | import csv
21 | import logging
22 | import numpy as np
23 | import matplotlib.pyplot as plt
24 | import seaborn as sns
25 | import os
26 | import copy
27 | import pdb
28 | import sys
29 | 
30 | from scipy.stats import pearsonr, spearmanr
31 | from sklearn.metrics import matthews_corrcoef, f1_score, hamming_loss, precision_score, recall_score
32 | 
33 | logger = logging.getLogger(__name__)
34 | 
35 | 
36 | class InputExample(object):
37 |     """A single training/test example for simple sequence classification."""
38 | 
39 |     def __init__(self, guid, text_a, text_cate=None, text_senti=None, label=None):
40 |         """Constructs an InputExample.
41 | 
42 |         Args:
43 |             guid: Unique id for the example.
44 |             text_a: string. The untokenized text of the first sequence. For single
45 |                 sequence tasks, only this sequence must be specified.
46 |             text_cate, text_senti: (Optional) strings. Category / sentiment
47 |                 annotations; accepted for interface compatibility but not stored.
48 |             label: (Optional) string. The label of the example. This should be
49 |                 specified for train and dev examples, but not for test examples.
50 | """ 51 | self.guid = guid 52 | self.text_a = text_a 53 | self.label = label 54 | 55 | class InputFeatures(object): 56 | """A single set of features of data.""" 57 | 58 | def __init__(self, tokens_len, aspect_input_ids, aspect_input_mask, aspect_ids, aspect_segment_ids, aspect_labels, 59 | exist_imp_aspect, exist_imp_opinion): 60 | self.tokens_len = tokens_len 61 | self.aspect_input_ids=aspect_input_ids 62 | self.aspect_input_mask=aspect_input_mask 63 | self.aspect_ids=aspect_ids 64 | self.aspect_segment_ids=aspect_segment_ids 65 | self.aspect_labels=aspect_labels 66 | self.exist_imp_aspect=exist_imp_aspect 67 | self.exist_imp_opinion=exist_imp_opinion 68 | 69 | class InputExample2nd(object): 70 | """A single training/test example for simple sequence classification.""" 71 | 72 | def __init__(self, guid, text_a, text_b=None, label=None): 73 | """Constructs a InputExample. 74 | 75 | Args: 76 | guid: Unique id for the example. 77 | text_a: string. The untokenized text of the first sequence. For single 78 | sequence tasks, only this sequence must be specified. 79 | text_b: (Optional) string. The untokenized text of the second sequence. 80 | Only must be specified for sequence pair tasks. 81 | label: (Optional) string. The label of the example. This should be 82 | specified for train and dev examples, but not for test examples. 83 | """ 84 | self.guid = guid 85 | self.text_a = text_a 86 | self.text_b = text_b 87 | self.label = label 88 | 89 | 90 | class InputFeatures2nd(object): 91 | """A single set of features of data.""" 92 | 93 | def __init__(self, tokens_len, aspect_tokens, aspect_input_ids, aspect_input_mask, aspect_segment_ids, 94 | candidate_aspect, candidate_opinion, label_id): 95 | 96 | self.tokens_len=tokens_len 97 | self.aspect_tokens=aspect_tokens 98 | self.aspect_input_ids=aspect_input_ids 99 | self.aspect_input_mask=aspect_input_mask 100 | self.aspect_segment_ids=aspect_segment_ids 101 | 102 | self.candidate_aspect=candidate_aspect 103 | self.candidate_opinion=candidate_opinion 104 | self.label_id=label_id 105 | 106 | class DataProcessor(object): 107 | """Base class for data converters for sequence classification data sets.""" 108 | 109 | def get_train_examples(self, data_dir): 110 | """Gets a collection of `InputExample`s for the train set.""" 111 | raise NotImplementedError() 112 | 113 | def get_dev_examples(self, data_dir): 114 | """Gets a collection of `InputExample`s for the dev set.""" 115 | raise NotImplementedError() 116 | 117 | def get_labels(self): 118 | """Gets the list of labels for this data set.""" 119 | raise NotImplementedError() 120 | 121 | @classmethod 122 | def _read_tsv(cls, input_file, quotechar=None): 123 | """Reads a tab separated value file.""" 124 | with open(input_file, "r", encoding="utf-8") as f: 125 | reader = csv.reader(f, delimiter="\t", quotechar=quotechar) 126 | lines = [] 127 | for line in reader: 128 | if sys.version_info[0] == 2: 129 | line = list(unicode(cell, 'utf-8') for cell in line) 130 | lines.append(line) 131 | return lines 132 | 133 | 134 | class QuadProcessor(DataProcessor): 135 | """Processor for the MRPC data set (GLUE version).""" 136 | 137 | def get_train_examples(self, data_dir, domain_type): 138 | """See base class.""" 139 | string = domain_type 140 | logger.info("LOOKING AT {}".format(os.path.join(data_dir, "tokenized_data/"+string+"_train_quad_bert.tsv"))) 141 | return self._create_examples( 142 | self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_train_quad_bert.tsv")), "train") 143 | 144 | def 
get_valid_examples(self, data_dir, domain_type): 145 | """See base class.""" 146 | string = domain_type 147 | logger.info("LOOKING AT {}".format(os.path.join(data_dir, "tokenized_data/"+string+"_dev_quad_bert.tsv"))) 148 | return self._create_examples( 149 | self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_dev_quad_bert.tsv")), "valid") 150 | 151 | def get_dev_examples(self, data_dir, domain_type): 152 | """See base class.""" 153 | string = domain_type 154 | logger.info("LOOKING AT {}".format(os.path.join(data_dir, "tokenized_data/"+string+"_test_quad_bert.tsv"))) 155 | return self._create_examples( 156 | self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_test_quad_bert.tsv")), "test") 157 | 158 | def get_labels(self, domain_type): 159 | """See base class.""" 160 | 161 | sentiment = ['negative', 'neutral', 'positive'] 162 | # seqlabs = ['O', 'I'] 163 | # 'P' means PAD, 'M' means IMP. 164 | seqlabs = ['[CLS]', 'O', 'I-A', 'B-A', 'I-O', 'B-O'] 165 | # seqlabs = ['O', 'I-A', 'B-A', 'M-A', 'I-O', 'B-O', 'M-O'] 166 | label_list = [] 167 | 168 | label_list.append(sentiment) 169 | label_list.append(seqlabs) 170 | return label_list 171 | 172 | def _create_examples(self, lines, set_type): 173 | """Creates examples for the training and dev sets.""" 174 | examples = [] 175 | for (i, line) in enumerate(lines): 176 | guid = "%s-%s" % (set_type, i) 177 | try: 178 | text_a = line[0] 179 | except: 180 | pdb.set_trace() 181 | labels = line[1:] 182 | examples.append( 183 | InputExample(guid=guid, text_a=text_a, label=labels)) 184 | return examples 185 | 186 | 187 | class CategorySentiProcessor(DataProcessor): 188 | """Processor for the MRPC data set (GLUE version).""" 189 | 190 | def get_train_examples(self, data_dir, domain_type): 191 | """See base class.""" 192 | string = domain_type 193 | logger.info("LOOKING AT {}".format(os.path.join(data_dir, "tokenized_data/"+string+"_train_pair.tsv"))) 194 | return self._create_examples( 195 | self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_train_pair.tsv")), "train") 196 | 197 | def get_valid_examples(self, data_dir, domain_type): 198 | """See base class.""" 199 | string = domain_type 200 | logger.info("LOOKING AT {}".format(os.path.join(data_dir, "tokenized_data/"+string+"_dev_pair.tsv"))) 201 | return self._create_examples( 202 | self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_dev_pair.tsv")), "valid") 203 | 204 | def get_dev_examples(self, data_dir, domain_type): 205 | """See base class.""" 206 | string = domain_type 207 | return self._create_examples( 208 | self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_test_pair_1st.tsv")), "test") 209 | 210 | def get_labels(self, domain_type): 211 | """See base class.""" 212 | l = None 213 | sentiment = None 214 | if domain_type.startswith('rest'): 215 | l = ['RESTAURANT#GENERAL', 'SERVICE#GENERAL', 'FOOD#GENERAL', 'FOOD#QUALITY', 'FOOD#STYLE_OPTIONS', 'DRINKS#STYLE_OPTIONS', 'DRINKS#PRICES', 216 | 'AMBIENCE#GENERAL', 'RESTAURANT#PRICES', 'FOOD#PRICES', 'RESTAURANT#MISCELLANEOUS', 'DRINKS#QUALITY', 'LOCATION#GENERAL'] 217 | elif domain_type == 'laptop': 218 | l = ['MULTIMEDIA_DEVICES#PRICE', 'OS#QUALITY', 'SHIPPING#QUALITY', 'GRAPHICS#OPERATION_PERFORMANCE', 'CPU#OPERATION_PERFORMANCE', 219 | 'COMPANY#DESIGN_FEATURES', 'MEMORY#OPERATION_PERFORMANCE', 'SHIPPING#PRICE', 'POWER_SUPPLY#CONNECTIVITY', 'SOFTWARE#USABILITY', 220 | 'FANS&COOLING#GENERAL', 'GRAPHICS#DESIGN_FEATURES', 'BATTERY#GENERAL', 'HARD_DISC#USABILITY', 
'FANS&COOLING#DESIGN_FEATURES', 221 | 'MEMORY#DESIGN_FEATURES', 'MOUSE#USABILITY', 'CPU#GENERAL', 'LAPTOP#QUALITY', 'POWER_SUPPLY#GENERAL', 'PORTS#QUALITY', 222 | 'KEYBOARD#PORTABILITY', 'SUPPORT#DESIGN_FEATURES', 'MULTIMEDIA_DEVICES#USABILITY', 'MOUSE#GENERAL', 'KEYBOARD#MISCELLANEOUS', 223 | 'MULTIMEDIA_DEVICES#DESIGN_FEATURES', 'OS#MISCELLANEOUS', 'LAPTOP#MISCELLANEOUS', 'SOFTWARE#PRICE', 'FANS&COOLING#OPERATION_PERFORMANCE', 224 | 'MEMORY#QUALITY', 'OPTICAL_DRIVES#OPERATION_PERFORMANCE', 'HARD_DISC#GENERAL', 'MEMORY#GENERAL', 'DISPLAY#OPERATION_PERFORMANCE', 225 | 'MULTIMEDIA_DEVICES#GENERAL', 'LAPTOP#GENERAL', 'MOTHERBOARD#QUALITY', 'LAPTOP#PORTABILITY', 'KEYBOARD#PRICE', 'SUPPORT#OPERATION_PERFORMANCE', 226 | 'GRAPHICS#GENERAL', 'MOTHERBOARD#OPERATION_PERFORMANCE', 'DISPLAY#GENERAL', 'BATTERY#QUALITY', 'LAPTOP#USABILITY', 'LAPTOP#DESIGN_FEATURES', 227 | 'PORTS#CONNECTIVITY', 'HARDWARE#QUALITY', 'SUPPORT#GENERAL', 'MOTHERBOARD#GENERAL', 'PORTS#USABILITY', 'KEYBOARD#QUALITY', 'GRAPHICS#USABILITY', 228 | 'HARD_DISC#PRICE', 'OPTICAL_DRIVES#USABILITY', 'MULTIMEDIA_DEVICES#CONNECTIVITY', 'HARDWARE#DESIGN_FEATURES', 'MEMORY#USABILITY', 229 | 'SHIPPING#GENERAL', 'CPU#PRICE', 'Out_Of_Scope#DESIGN_FEATURES', 'MULTIMEDIA_DEVICES#QUALITY', 'OS#PRICE', 'SUPPORT#QUALITY', 230 | 'OPTICAL_DRIVES#GENERAL', 'HARDWARE#USABILITY', 'DISPLAY#DESIGN_FEATURES', 'PORTS#GENERAL', 'COMPANY#OPERATION_PERFORMANCE', 231 | 'COMPANY#GENERAL', 'Out_Of_Scope#GENERAL', 'KEYBOARD#DESIGN_FEATURES', 'Out_Of_Scope#OPERATION_PERFORMANCE', 232 | 'OPTICAL_DRIVES#DESIGN_FEATURES', 'LAPTOP#OPERATION_PERFORMANCE', 'KEYBOARD#USABILITY', 'DISPLAY#USABILITY', 'POWER_SUPPLY#QUALITY', 233 | 'HARD_DISC#DESIGN_FEATURES', 'DISPLAY#QUALITY', 'MOUSE#DESIGN_FEATURES', 'COMPANY#QUALITY', 'HARDWARE#GENERAL', 'COMPANY#PRICE', 234 | 'MULTIMEDIA_DEVICES#OPERATION_PERFORMANCE', 'KEYBOARD#OPERATION_PERFORMANCE', 'SOFTWARE#PORTABILITY', 'HARD_DISC#OPERATION_PERFORMANCE', 235 | 'BATTERY#DESIGN_FEATURES', 'CPU#QUALITY', 'WARRANTY#GENERAL', 'OS#DESIGN_FEATURES', 'OS#OPERATION_PERFORMANCE', 'OS#USABILITY', 236 | 'SOFTWARE#GENERAL', 'SUPPORT#PRICE', 'SHIPPING#OPERATION_PERFORMANCE', 'DISPLAY#PRICE', 'LAPTOP#PRICE', 'OS#GENERAL', 'HARDWARE#PRICE', 237 | 'SOFTWARE#DESIGN_FEATURES', 'HARD_DISC#MISCELLANEOUS', 'PORTS#PORTABILITY', 'FANS&COOLING#QUALITY', 'BATTERY#OPERATION_PERFORMANCE', 238 | 'CPU#DESIGN_FEATURES', 'PORTS#OPERATION_PERFORMANCE', 'SOFTWARE#OPERATION_PERFORMANCE', 'KEYBOARD#GENERAL', 'SOFTWARE#QUALITY', 239 | 'LAPTOP#CONNECTIVITY', 'POWER_SUPPLY#DESIGN_FEATURES', 'HARDWARE#OPERATION_PERFORMANCE', 'WARRANTY#QUALITY', 'HARD_DISC#QUALITY', 240 | 'POWER_SUPPLY#OPERATION_PERFORMANCE', 'PORTS#DESIGN_FEATURES', 'Out_Of_Scope#USABILITY'] 241 | sentiment = ['0', '1', '2'] 242 | label_list = [] 243 | # label_list.append(l) 244 | # label_list.append(sentiment) 245 | cate_senti = [] 246 | for cate in l: 247 | for senti in sentiment: 248 | cate_senti.append(cate+'#'+senti) 249 | label_list.append(cate_senti) 250 | return label_list 251 | 252 | def _create_examples(self, lines, set_type): 253 | """Creates examples for the training and dev sets.""" 254 | examples = [] 255 | for (i, line) in enumerate(lines): 256 | guid = "%s-%s" % (set_type, i) 257 | text_a = line[0] 258 | labels = line[1:] 259 | examples.append( 260 | InputExample2nd(guid=guid, text_a=text_a, text_b=None, label=labels)) 261 | return examples 262 | 263 | 264 | def convert_examples_to_features(examples, label_list, max_seq_length, 265 | tokenizer, output_mode, task_name): 266 | """Loads a 
data file into a list of `InputBatch`s.""" 267 | 268 | label_map_senti = {label : i for i, label in enumerate(label_list[0])} 269 | label_map_seq = {label : i for i, label in enumerate(label_list[1])} 270 | 271 | features = [] 272 | 273 | for (ex_index, example) in enumerate(examples): 274 | # pdb.set_trace() 275 | if ex_index % 10000 == 0: 276 | logger.info("Writing example %d of %d" % (ex_index, len(examples))) 277 | 278 | orig_tokens = example.text_a.strip().split() 279 | labels = example.label 280 | 281 | exist_imp_aspect = 0 282 | exist_imp_opinion = 0 283 | 284 | bert_tokens_a = orig_tokens 285 | 286 | aspect_labels = ['O' for ele in range(len(orig_tokens))] 287 | for quad in labels: 288 | cur_aspect = quad.split(' ')[0]; cur_opinion = quad.split(' ')[-1] 289 | a_st = int(cur_aspect.split(',')[0]); a_ed = int(cur_aspect.split(',')[1]) 290 | if a_ed != -1: 291 | aspect_labels[a_st] = 'B-A' 292 | for i in range(a_st+1, a_ed): 293 | aspect_labels[i] = 'I-A' 294 | else: 295 | exist_imp_aspect = 1 296 | o_st = int(cur_opinion.split(',')[0]); o_ed = int(cur_opinion.split(',')[1]) 297 | if o_ed != -1: 298 | aspect_labels[o_st] = 'B-O' 299 | for i in range(o_st+1, o_ed): 300 | aspect_labels[i] = 'I-O' 301 | else: 302 | exist_imp_opinion = 1 303 | 304 | _truncate_seq_pair(bert_tokens_a, aspect_labels, max_seq_length - 2) 305 | 306 | aspect_ids = [] 307 | 308 | aspect_tokens = [] 309 | aspect_segment_ids = [] 310 | 311 | aspect_tokens.append("[CLS]") 312 | aspect_ids.append(label_map_seq['[CLS]']) 313 | aspect_segment_ids.append(0) 314 | 315 | for i, token in enumerate(bert_tokens_a): 316 | aspect_tokens.append(token) 317 | aspect_ids.append(label_map_seq[aspect_labels[i]]) 318 | aspect_segment_ids.append(0) 319 | 320 | aspect_tokens.append("[CLS]") 321 | tokens_len = len(aspect_tokens) 322 | 323 | aspect_ids.append(label_map_seq['[CLS]']) 324 | aspect_segment_ids.append(0) 325 | 326 | aspect_input_ids = tokenizer.convert_tokens_to_ids(aspect_tokens) 327 | # The mask has 1 for real tokens and 0 for padding tokens. Only real 328 | # tokens are attended to. 329 | aspect_input_mask = [1] * len(aspect_input_ids) 330 | # if example.text_a.startswith('it has all the features that we'): 331 | # pdb.set_trace() 332 | 333 | # Zero-pad up to the sequence length. 
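        # Illustrative sketch (hypothetical sentence, not from the data): with
        # max_seq_length=8 and orig_tokens ['the', 'pizza', 'was', 'great'] whose
        # quad marks 'pizza' as the aspect and 'great' as the opinion, the fields
        # built above are
        #   aspect_tokens     = ['[CLS]', 'the', 'pizza', 'was', 'great', '[CLS]']
        #   aspect_ids        = [0, 1, 3, 1, 5, 0]   # ['[CLS]','O','B-A','O','B-O','[CLS]']
        #   aspect_input_mask = [1, 1, 1, 1, 1, 1]
        # and the loop below right-pads the input ids and mask with 0 and the label
        # ids with label_map_seq["O"], so e.g. aspect_input_mask becomes
        # [1, 1, 1, 1, 1, 1, 0, 0].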
334 |         while len(aspect_input_ids) < max_seq_length:
335 |             aspect_input_ids.append(0)
336 |             aspect_input_mask.append(0)
337 |             aspect_ids.append(label_map_seq["O"])
338 |             aspect_segment_ids.append(0)
339 | 
340 |         assert len(aspect_input_ids) == max_seq_length
341 |         assert len(aspect_input_mask) == max_seq_length
342 |         assert len(aspect_ids) == max_seq_length
343 |         assert len(aspect_segment_ids) == max_seq_length
344 | 
345 |         if ex_index < 5:
346 |             logger.info("*** Example ***")
347 |             logger.info("guid: %s" % (example.guid))
348 |             logger.info("tokens_len: %s" % (tokens_len))
349 |             logger.info("exist_imp_aspect: %s" % (exist_imp_aspect))
350 |             logger.info("exist_imp_opinion: %s" % (exist_imp_opinion))
351 | 
352 |             logger.info("aspect tokens: %s" % " ".join(
353 |                     [str(x) for x in aspect_tokens]))
354 |             logger.info("aspect_input_ids: %s" % " ".join([str(x) for x in aspect_input_ids]))
355 |             logger.info("aspect_input_mask: %s" % " ".join([str(x) for x in aspect_input_mask]))
356 |             logger.info("aspect_ids: %s" % " ".join([str(x) for x in aspect_ids]))
357 |             logger.info(
358 |                 "aspect_segment_ids: %s" % " ".join([str(x) for x in aspect_segment_ids]))
359 | 
360 |         features.append(
361 |                 InputFeatures(tokens_len,
362 |                               aspect_input_ids=aspect_input_ids,
363 |                               aspect_input_mask=aspect_input_mask,
364 |                               aspect_ids=aspect_ids,
365 |                               aspect_segment_ids=aspect_segment_ids,
366 |                               aspect_labels=aspect_labels,
367 |                               exist_imp_aspect=exist_imp_aspect,
368 |                               exist_imp_opinion=exist_imp_opinion))
369 |     return features
370 | 
371 | 
372 | def _truncate_seq_pair(bert_tokens_a, aspect_labels, max_length):
373 |     """Truncates the token sequence and its label sequence in place to the maximum length."""
374 | 
375 |     # Truncate from the end, one token at a time, so that the token sequence
376 |     # and its per-token label sequence stay aligned.
377 | 
378 |     while True:
379 |         total_length = len(bert_tokens_a)
380 |         if total_length <= max_length:
381 |             break
382 |         bert_tokens_a.pop()
383 |         aspect_labels.pop()
384 | 
385 | 
386 | 
387 | def convert_examples_to_features2nd(examples, label_list, max_seq_length,
388 |                                     tokenizer, output_mode):
389 |     """Loads a data file into a list of `InputBatch`s."""
390 | 
391 |     category_senti_map = {label : i for i, label in enumerate(label_list[0])}
392 | 
393 |     features = []
394 | 
395 |     for (ex_index, example) in enumerate(examples):
396 |         # pdb.set_trace()
397 |         if ex_index % 10000 == 0:
398 |             logger.info("Writing example %d of %d" % (ex_index, len(examples)))
399 | 
400 |         orig_tokens, ao_tags = example.text_a.strip().split('####')
401 |         # label for examples with negative samples
402 |         # labels = example.label[:-1]
403 |         orig_tokens = orig_tokens.split()
404 |         labels = example.label
405 | 
406 |         bert_tokens_a = orig_tokens
407 |         bert_tokens_b = None
408 | 
409 |         _truncate_seq_pair2nd(bert_tokens_a, max_seq_length - 2)
410 | 
411 |         aspect_tokens = []
412 |         aspect_segment_ids = []
413 | 
414 |         aspect_tokens.append("[CLS]")
415 |         aspect_segment_ids.append(0)
416 | 
417 |         for i, token in enumerate(bert_tokens_a):
418 |             aspect_tokens.append(token)
419 |             aspect_segment_ids.append(0)
420 |         aspect_tokens.append("[CLS]")
421 |         tokens_len = len(aspect_tokens)
422 |         aspect_segment_ids.append(0)
423 | 
424 |         aspect_input_ids = tokenizer.convert_tokens_to_ids(aspect_tokens)
425 |         # The mask has 1 for real tokens and 0 for padding tokens. Only real
426 |         # tokens are attended to.
427 | aspect_input_mask = [1] * len(aspect_input_ids) 428 | imp_opinion_pos = len(aspect_input_ids) 429 | # if example.text_a.startswith('it has all the features that we'): 430 | # pdb.set_trace() 431 | 432 | # Zero-pad up to the sequence length. 433 | while len(aspect_input_ids) < max_seq_length: 434 | aspect_input_ids.append(0) 435 | aspect_input_mask.append(0) 436 | aspect_segment_ids.append(0) 437 | 438 | assert len(aspect_input_ids) == max_seq_length 439 | assert len(aspect_input_mask) == max_seq_length 440 | assert len(aspect_segment_ids) == max_seq_length 441 | 442 | # get candidate aspect and opinion 443 | label_id = [0] * len(label_list[0]) 444 | candidate_aspect = [0 for i in range(max_seq_length)] 445 | candidate_opinion = [0 for i in range(max_seq_length)] 446 | cur_aspect = ao_tags.split()[0]; cur_opinion = ao_tags.split()[1] 447 | a_st = int(cur_aspect.split(',')[0]); a_ed = int(cur_aspect.split(',')[1]) 448 | o_st = int(cur_opinion.split(',')[0]); o_ed = int(cur_opinion.split(',')[1]) 449 | if a_st == -1: 450 | a_ed = 0 451 | if o_st == -1: 452 | o_st = imp_opinion_pos - 2; o_ed = imp_opinion_pos - 1 453 | for i in range(a_st+1, a_ed+1): 454 | candidate_aspect[i] = 1 455 | for i in range(o_st+1, o_ed+1): 456 | candidate_opinion[i] = 1 457 | if len(labels) > 0: 458 | for ele in labels[0].split(): 459 | label_id[category_senti_map[ele]] = 1 460 | 461 | if ex_index < 5: 462 | logger.info("*** Example ***") 463 | logger.info("guid: %s" % (example.guid)) 464 | logger.info("tokens_len: %s" % (tokens_len)) 465 | 466 | logger.info("aspect tokens: %s" % " ".join( 467 | [str(x) for x in aspect_tokens])) 468 | logger.info("aspect_input_ids: %s" % " ".join([str(x) for x in aspect_input_ids])) 469 | logger.info("aspect_input_mask: %s" % " ".join([str(x) for x in aspect_input_mask])) 470 | logger.info( 471 | "aspect_segment_ids: %s" % " ".join([str(x) for x in aspect_segment_ids])) 472 | logger.info( 473 | "candidate_aspect: %s" % " ".join([str(x) for x in candidate_aspect])) 474 | logger.info( 475 | "candidate_opinion: %s" % " ".join([str(x) for x in candidate_opinion])) 476 | logger.info( 477 | "label_id: %s" % " ".join([str(x) for x in label_id])) 478 | 479 | features.append( 480 | InputFeatures2nd( 481 | tokens_len=tokens_len, 482 | aspect_tokens=aspect_tokens, 483 | aspect_input_ids=aspect_input_ids, 484 | aspect_input_mask=aspect_input_mask, 485 | aspect_segment_ids=aspect_segment_ids, 486 | 487 | candidate_aspect=candidate_aspect, 488 | candidate_opinion=candidate_opinion, 489 | label_id=label_id, 490 | )) 491 | return features 492 | 493 | 494 | def _truncate_seq_pair2nd(tokens_a, max_length): 495 | """Truncates a sequence pair in place to the maximum length.""" 496 | 497 | # This is a simple heuristic which will always truncate the longer sequence 498 | # one token at a time. This makes more sense than truncating an equal percent 499 | # of tokens from each, since if one sequence is very short then each token 500 | # that's truncated likely contains more information than a longer sequence. 
501 | while True: 502 | total_length = len(tokens_a) 503 | if total_length <= max_length: 504 | break 505 | tokens_a.pop() 506 | 507 | 508 | def simple_accuracy(preds, labels): 509 | return (preds == labels).mean() 510 | 511 | 512 | def acc_and_f1(preds, labels): 513 | acc = simple_accuracy(preds, labels) 514 | precision = precision_score(labels, preds, average='micro') 515 | recall = recall_score(labels, preds, average='micro') 516 | f1 = f1_score(y_true=labels, y_pred=preds, average='micro') 517 | macro = f1_score(y_true=labels, y_pred=preds, average='macro') 518 | hamming = hamming_loss(y_true=labels, y_pred=preds) 519 | return { 520 | "acc": acc, 521 | "precision": precision, 522 | "recall": recall, 523 | "micro-f1": f1, 524 | "macro-f1": macro, 525 | "hamming_loss":hamming, 526 | "acc_and_f1": (acc + f1) / 2, 527 | } 528 | 529 | 530 | def pearson_and_spearman(preds, labels): 531 | pearson_corr = pearsonr(preds, labels)[0] 532 | spearman_corr = spearmanr(preds, labels)[0] 533 | return { 534 | "pearson": pearson_corr, 535 | "spearmanr": spearman_corr, 536 | "corr": (pearson_corr + spearman_corr) / 2, 537 | } 538 | 539 | 540 | def compute_metrics(task_name, preds, labels): 541 | assert len(preds) == len(labels) 542 | return acc_and_f1(preds, labels) 543 | 544 | processors = { 545 | "quad": QuadProcessor, 546 | "categorysenti": CategorySentiProcessor, 547 | } 548 | 549 | output_modes = { 550 | "quad": "classification", 551 | "categorysenti": "classification", 552 | } 553 | -------------------------------------------------------------------------------- /Extract-Classify-ACOS/run_step1.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team. 3 | # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 
16 | """BERT finetuning runner.""" 17 | 18 | from __future__ import absolute_import, division, print_function 19 | 20 | import argparse 21 | import logging 22 | import os 23 | import sys 24 | import random 25 | from tqdm import tqdm, trange 26 | import pdb 27 | from collections import defaultdict, namedtuple 28 | from manager import * 29 | import math 30 | import codecs as cs 31 | from sklearn.model_selection import KFold 32 | 33 | gm = GPUManager() 34 | device = gm.auto_choice(mode=0) 35 | os.environ["CUDA_VISIBLE_DEVICES"] = str(device) 36 | 37 | import numpy as np 38 | 39 | import torch 40 | from torch.utils.data import (DataLoader, RandomSampler, SequentialSampler, 41 | TensorDataset) 42 | from torch.utils.data.distributed import DistributedSampler 43 | from torch.nn import CrossEntropyLoss, MSELoss, MultiLabelSoftMarginLoss, BCEWithLogitsLoss 44 | 45 | from modeling import BertForQuadABSA 46 | from bert_utils.tokenization import BertTokenizer 47 | from bert_utils.optimization import BertAdam, WarmupLinearSchedule 48 | 49 | from run_classifier_dataset_utils import * 50 | from eval_metrics import * 51 | import gc 52 | 53 | if sys.version_info[0] == 2: 54 | import cPickle as pickle 55 | else: 56 | import pickle 57 | 58 | CONFIG_NAME = "config.json" 59 | WEIGHTS_NAME = "pytorch_model.bin" 60 | 61 | logger = logging.getLogger(__name__) 62 | 63 | def main(): 64 | parser = argparse.ArgumentParser() 65 | 66 | ## Required parameters 67 | parser.add_argument("--data_dir", 68 | default=None, 69 | type=str, 70 | required=True, 71 | help="The input source data dir. Should contain the .tsv files (or other data files) for the task.") 72 | parser.add_argument("--bert_model", default=None, type=str, required=True, 73 | help="Bert pre-trained model selected in the list: bert-base-uncased, " 74 | "bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, " 75 | "bert-base-multilingual-cased, bert-base-chinese.") 76 | parser.add_argument("--task_name", 77 | default=None, 78 | type=str, 79 | required=True, 80 | help="The name of the task to train.") 81 | parser.add_argument("--output_dir", 82 | default=None, 83 | type=str, 84 | required=True, 85 | help="The output directory where the model predictions and checkpoints will be written.") 86 | 87 | parser.add_argument("--domain_type", 88 | default=None, 89 | type=str, 90 | required=True, 91 | help="domain to choose.") 92 | 93 | parser.add_argument("--model_type", 94 | default=None, 95 | type=str, 96 | required=True, 97 | help="model to choose.") 98 | 99 | ## Other parameters 100 | parser.add_argument("--cache_dir", 101 | default="", 102 | type=str, 103 | help="Where do you want to store the pre-trained models downloaded from s3") 104 | parser.add_argument("--max_seq_length", 105 | default=128, 106 | type=int, 107 | help="The maximum total input sequence length after WordPiece tokenization. 
\n" 108 | "Sequences longer than this will be truncated, and sequences shorter \n" 109 | "than this will be padded.") 110 | parser.add_argument("--do_train", 111 | action='store_true', 112 | help="Whether to run training.") 113 | parser.add_argument("--do_eval", 114 | action='store_true', 115 | help="Whether to run eval on the dev set.") 116 | parser.add_argument("--do_lower_case", 117 | action='store_true', 118 | help="Set this flag if you are using an uncased model.") 119 | parser.add_argument("--train_batch_size", 120 | default=32, 121 | type=int, 122 | help="Total batch size for training.") 123 | parser.add_argument("--eval_batch_size", 124 | default=8, 125 | type=int, 126 | help="Total batch size for eval.") 127 | parser.add_argument("--learning_rate", 128 | default=5e-5, 129 | type=float, 130 | help="The initial learning rate for Adam.") 131 | parser.add_argument("--num_train_epochs", 132 | default=3.0, 133 | type=float, 134 | help="Total number of training epochs to perform.") 135 | parser.add_argument("--warmup_proportion", 136 | default=0.1, 137 | type=float, 138 | help="Proportion of training to perform linear learning rate warmup for. " 139 | "E.g., 0.1 = 10%% of training.") 140 | parser.add_argument("--no_cuda", 141 | action='store_true', 142 | help="Whether not to use CUDA when available") 143 | parser.add_argument('--overwrite_output_dir', 144 | action='store_true', 145 | help="Overwrite the content of the output directory") 146 | parser.add_argument("--local_rank", 147 | type=int, 148 | default=-1, 149 | help="local_rank for distributed training on gpus") 150 | parser.add_argument('--seed', 151 | type=int, 152 | default=42, 153 | help="random seed for initialization") 154 | parser.add_argument('--gradient_accumulation_steps', 155 | type=int, 156 | default=1, 157 | help="Number of updates steps to accumulate before performing a backward/update pass.") 158 | parser.add_argument('--fp16', 159 | action='store_true', 160 | help="Whether to use 16-bit float precision instead of 32-bit") 161 | parser.add_argument('--loss_scale', 162 | type=float, default=0, 163 | help="Loss scaling to improve fp16 numeric stability. 
Only used when fp16 set to True.\n" 164 | "0 (default value): dynamic loss scaling.\n" 165 | "Positive power of 2: static loss scaling value.\n") 166 | args = parser.parse_args() 167 | 168 | if args.local_rank == -1 or args.no_cuda: 169 | device = torch.device("cuda" if torch.cuda.is_available() and not args.no_cuda else "cpu") 170 | n_gpu = torch.cuda.device_count() 171 | else: 172 | torch.cuda.set_device(args.local_rank) 173 | device = torch.device("cuda", args.local_rank) 174 | n_gpu = 1 175 | # Initializes the distributed backend which will take care of sychronizing nodes/GPUs 176 | torch.distributed.init_process_group(backend='nccl') 177 | args.device = device 178 | 179 | logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s', 180 | datefmt = '%m/%d/%Y %H:%M:%S', 181 | level = logging.INFO if args.local_rank in [-1, 0] else logging.WARN) 182 | 183 | logger.info("device: {} n_gpu: {}, distributed training: {}, 16-bits training: {}".format( 184 | device, n_gpu, bool(args.local_rank != -1), args.fp16)) 185 | 186 | if args.gradient_accumulation_steps < 1: 187 | raise ValueError("Invalid gradient_accumulation_steps parameter: {}, should be >= 1".format( 188 | args.gradient_accumulation_steps)) 189 | 190 | args.train_batch_size = args.train_batch_size // args.gradient_accumulation_steps 191 | 192 | random.seed(args.seed) 193 | np.random.seed(args.seed) 194 | torch.manual_seed(args.seed) 195 | if n_gpu > 0: 196 | torch.cuda.manual_seed_all(args.seed) 197 | 198 | if not args.do_train and not args.do_eval: 199 | raise ValueError("At least one of `do_train` or `do_eval` must be True.") 200 | 201 | # if os.path.exists(args.output_dir) and os.listdir(args.output_dir) and args.do_train and not args.overwrite_output_dir: 202 | # raise ValueError("Output directory ({}) already exists and is not empty.".format(args.output_dir)) 203 | if not os.path.exists(args.output_dir): 204 | os.makedirs(args.output_dir) 205 | # print(args.output_dir) 206 | # pdb.set_trace() 207 | 208 | task_name = args.task_name.lower() 209 | 210 | if task_name not in processors: 211 | raise ValueError("Task not found: %s" % (task_name)) 212 | 213 | processor = processors[task_name]() 214 | output_mode = output_modes[task_name] 215 | 216 | label_list = processor.get_labels(args.domain_type) 217 | num_labels = len(label_list[1]) 218 | 219 | if args.local_rank not in [-1, 0]: 220 | torch.distributed.barrier() # Make sure only the first process in distributed training will download model & vocab 221 | tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case) 222 | model_dict = { 223 | 'quad': BertForQuadABSA, 224 | } 225 | 226 | label_map_senti = {label : i for i, label in enumerate(label_list[0])} 227 | label_map_seq = {label : i for i, label in enumerate(label_list[1])} 228 | 229 | global_step = 0 230 | nb_tr_steps = 0 231 | eval_gold = [] 232 | valid_gold = [] 233 | if args.do_eval: 234 | eval_examples = processor.get_dev_examples(args.data_dir, args.domain_type) 235 | f = cs.open(args.data_dir+'/tokenized_data/'+args.domain_type +'_test_quad_bert.tsv', 'r').readlines() 236 | for line in f: 237 | cur_exist_imp_aspect = 0 238 | cur_exist_imp_opinion = 0 239 | line = line.strip().split('\t') 240 | cur_text = line[0] 241 | aspect_labels = [label_map_seq['O'] for ele in range(args.max_seq_length)] 242 | 243 | for quad in line[1:]: 244 | cur_aspect = quad.split(' ')[0]; cur_opinion = quad.split(' ')[-1] 245 | a_st = int(cur_aspect.split(',')[0]); a_ed = 
int(cur_aspect.split(',')[1]) 246 | 247 | if a_ed != -1: 248 | aspect_labels[a_st] = label_map_seq['B-A'] 249 | for i in range(a_st+1, a_ed): 250 | aspect_labels[i] = label_map_seq['I-A'] 251 | else: 252 | cur_exist_imp_aspect = 1 253 | 254 | o_st = int(cur_opinion.split(',')[0]); o_ed = int(cur_opinion.split(',')[1]) 255 | 256 | if o_ed != -1: 257 | aspect_labels[o_st] = label_map_seq['B-O'] 258 | for i in range(o_st+1, o_ed): 259 | aspect_labels[i] = label_map_seq['I-O'] 260 | else: 261 | cur_exist_imp_opinion = 1 262 | 263 | eval_gold += [aspect_labels, cur_exist_imp_aspect, cur_exist_imp_opinion] 264 | 265 | eval_features = convert_examples_to_features( 266 | eval_examples, label_list, args.max_seq_length, tokenizer, output_mode, task_name) 267 | 268 | eval_tokens_len = torch.tensor([f.tokens_len for f in eval_features], dtype=torch.long) 269 | eval_aspect_input_ids = torch.tensor([f.aspect_input_ids for f in eval_features], dtype=torch.long) 270 | eval_aspect_input_mask = torch.tensor([f.aspect_input_mask for f in eval_features], dtype=torch.long) 271 | eval_aspect_ids = torch.tensor([f.aspect_ids for f in eval_features], dtype=torch.long) 272 | eval_aspect_segment_ids = torch.tensor([f.aspect_segment_ids for f in eval_features], dtype=torch.long) 273 | eval_exist_imp_aspect = torch.tensor([f.exist_imp_aspect for f in eval_features], dtype=torch.long) 274 | eval_exist_imp_opinion = torch.tensor([f.exist_imp_opinion for f in eval_features], dtype=torch.long) 275 | 276 | eval_gold = [eval_aspect_input_ids.numpy().tolist(), eval_gold] 277 | 278 | eval_data = TensorDataset(eval_tokens_len, eval_aspect_input_ids, eval_aspect_input_mask, eval_aspect_ids, 279 | eval_aspect_segment_ids, eval_exist_imp_aspect, eval_exist_imp_opinion) 280 | # Run prediction for full data 281 | if args.local_rank == -1: 282 | eval_sampler = SequentialSampler(eval_data) 283 | else: 284 | eval_sampler = DistributedSampler(eval_data) # Note that this sampler samples randomly 285 | eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=args.eval_batch_size) 286 | 287 | if args.do_train: 288 | 289 | # Prepare data loader 290 | train_examples = processor.get_train_examples(args.data_dir, args.domain_type) 291 | 292 | train_features = convert_examples_to_features( 293 | train_examples, label_list, args.max_seq_length, tokenizer, output_mode, task_name) 294 | 295 | tokens_len = torch.tensor([f.tokens_len for f in train_features], dtype=torch.long) 296 | aspect_input_ids = torch.tensor([f.aspect_input_ids for f in train_features], dtype=torch.long) 297 | aspect_input_mask = torch.tensor([f.aspect_input_mask for f in train_features], dtype=torch.long) 298 | aspect_ids = torch.tensor([f.aspect_ids for f in train_features], dtype=torch.long) 299 | aspect_segment_ids = torch.tensor([f.aspect_segment_ids for f in train_features], dtype=torch.long) 300 | exist_imp_aspect = torch.tensor([f.exist_imp_aspect for f in train_features], dtype=torch.long) 301 | exist_imp_opinion = torch.tensor([f.exist_imp_opinion for f in train_features], dtype=torch.long) 302 | 303 | valid_examples = processor.get_valid_examples(args.data_dir, args.domain_type) 304 | f = cs.open(args.data_dir+'/tokenized_data/'+args.domain_type +'_dev_quad_bert.tsv', 'r').readlines() 305 | for line in f: 306 | cur_exist_imp_aspect = 0 307 | cur_exist_imp_opinion = 0 308 | line = line.strip().split('\t') 309 | cur_text = line[0] 310 | aspect_labels = [label_map_seq['O'] for ele in range(args.max_seq_length)] 311 | 312 | for quad in line[1:]: 313 | 
cur_aspect = quad.split(' ')[0]; cur_opinion = quad.split(' ')[-1] 314 | a_st = int(cur_aspect.split(',')[0]); a_ed = int(cur_aspect.split(',')[1]) 315 | 316 | if a_ed != -1: 317 | aspect_labels[a_st] = label_map_seq['B-A'] 318 | for i in range(a_st+1, a_ed): 319 | aspect_labels[i] = label_map_seq['I-A'] 320 | else: 321 | cur_exist_imp_aspect = 1 322 | 323 | o_st = int(cur_opinion.split(',')[0]); o_ed = int(cur_opinion.split(',')[1]) 324 | 325 | if o_ed != -1: 326 | aspect_labels[o_st] = label_map_seq['B-O'] 327 | for i in range(o_st+1, o_ed): 328 | aspect_labels[i] = label_map_seq['I-O'] 329 | else: 330 | cur_exist_imp_opinion = 1 331 | 332 | valid_gold += [aspect_labels, cur_exist_imp_aspect, cur_exist_imp_opinion] 333 | 334 | valid_features = convert_examples_to_features( 335 | valid_examples, label_list, args.max_seq_length, tokenizer, output_mode, task_name) 336 | 337 | valid_tokens_len = torch.tensor([f.tokens_len for f in valid_features], dtype=torch.long) 338 | valid_aspect_input_ids = torch.tensor([f.aspect_input_ids for f in valid_features], dtype=torch.long) 339 | valid_aspect_input_mask = torch.tensor([f.aspect_input_mask for f in valid_features], dtype=torch.long) 340 | valid_aspect_ids = torch.tensor([f.aspect_ids for f in valid_features], dtype=torch.long) 341 | valid_aspect_segment_ids = torch.tensor([f.aspect_segment_ids for f in valid_features], dtype=torch.long) 342 | valid_exist_imp_aspect = torch.tensor([f.exist_imp_aspect for f in valid_features], dtype=torch.long) 343 | valid_exist_imp_opinion = torch.tensor([f.exist_imp_opinion for f in valid_features], dtype=torch.long) 344 | 345 | # Prepare optimizer 346 | 347 | logger.info("***** Running training *****") 348 | logger.info(" Num examples = %d", len(train_examples)) 349 | logger.info(" Batch size = %d", args.train_batch_size) 350 | 351 | all_results = [] 352 | 353 | model = model_dict[args.model_type].from_pretrained(args.bert_model, num_labels=num_labels) 354 | if args.local_rank == 0: 355 | torch.distributed.barrier() 356 | 357 | if args.fp16: 358 | model.half() 359 | 360 | model.to(device) 361 | 362 | train_data = TensorDataset(tokens_len, aspect_input_ids, aspect_input_mask, 363 | aspect_ids, aspect_segment_ids, exist_imp_aspect, exist_imp_opinion) 364 | 365 | if args.local_rank == -1: 366 | train_sampler = RandomSampler(train_data) 367 | else: 368 | train_sampler = DistributedSampler(train_data) 369 | train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=args.train_batch_size) 370 | 371 | valid_gold = [valid_aspect_input_ids.numpy().tolist(), valid_gold] 372 | 373 | valid_data = TensorDataset(valid_tokens_len, valid_aspect_input_ids, valid_aspect_input_mask, 374 | valid_aspect_ids, valid_aspect_segment_ids, valid_exist_imp_aspect, valid_exist_imp_opinion) 375 | if args.local_rank == -1: 376 | valid_sampler = SequentialSampler(valid_data) 377 | else: 378 | valid_sampler = DistributedSampler(valid_data) 379 | valid_dataloader = DataLoader(valid_data, sampler=valid_sampler, batch_size=args.eval_batch_size) 380 | 381 | num_train_optimization_steps = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps) * args.num_train_epochs 382 | 383 | param_optimizer = list(model.named_parameters()) 384 | no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight'] 385 | optimizer_grouped_parameters = [ 386 | {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01}, 387 | {'params': [p for n, p in param_optimizer if any(nd in n for nd in 
no_decay)], 'weight_decay': 0.0} 388 | ] 389 | 390 | optimizer = BertAdam(optimizer_grouped_parameters, 391 | lr=args.learning_rate, 392 | warmup=args.warmup_proportion, 393 | t_total=num_train_optimization_steps) 394 | 395 | best_micro_F1 = -1.0 396 | 397 | for _e in trange(int(args.num_train_epochs), desc="Epoch"): 398 | model.train() 399 | nb_tr_examples, nb_tr_steps = 0, 0 400 | 401 | for step, batch in enumerate(train_dataloader): 402 | batch = tuple(t.to(device) for t in batch) 403 | _tokens_len, _aspect_input_ids, _aspect_input_mask, _aspect_ids, _aspect_segment_ids, \ 404 | _exist_imp_aspect, _exist_imp_opinion = batch 405 | 406 | # forward pass: the model returns the training losses and the 407 | # sequence-labeling logits for aspect/opinion span extraction 408 | losses, logits = model(aspect_input_ids=_aspect_input_ids, aspect_labels=_aspect_ids, 409 | aspect_token_type_ids=_aspect_segment_ids, aspect_attention_mask=_aspect_input_mask, 410 | exist_imp_aspect=_exist_imp_aspect, exist_imp_opinion=_exist_imp_opinion) 411 | 412 | if step % 30 == 0: 413 | logger.info('Total Loss is {} .'.format(losses[0])) 414 | 415 | loss = losses[0] 416 | if n_gpu > 1: 417 | loss = loss.mean() # mean() to average on multi-gpu. 418 | 419 | 420 | if args.gradient_accumulation_steps > 1: 421 | loss = loss / args.gradient_accumulation_steps 422 | 423 | 424 | # BertAdam has no fp16-specific backward helper, so backpropagate 425 | # the (possibly scaled) loss directly in both fp16 and fp32 modes. 426 | loss.backward() 427 | 428 | 429 | 430 | nb_tr_examples += _aspect_input_ids.size(0) 431 | nb_tr_steps += 1 432 | if (step + 1) % args.gradient_accumulation_steps == 0: 433 | # BertAdam applies BERT's linear warmup/decay schedule internally 434 | # (warmup and t_total were passed at construction), so the learning 435 | # rate needs no manual adjustment here, with or without fp16. 436 | 437 | 438 | 439 | optimizer.step() 440 | optimizer.zero_grad() 441 | global_step += 1 442 | 443 | model.eval() 444 | result = pred_eval(_e, args, logger, tokenizer, model, valid_dataloader, valid_gold, label_list, device, task_name, eval_type='valid') 445 | 446 | if best_micro_F1 < result['micro-F1']: 447 | model_to_save = model.module if hasattr(model, 'module') else model # Only save the model itself 448 | dirs_name = args.output_dir 449 | if not os.path.exists(dirs_name): 450 | os.mkdir(dirs_name) 451 | output_model_file = os.path.join(dirs_name, WEIGHTS_NAME) 452 | output_config_file = os.path.join(dirs_name, CONFIG_NAME) 453 | 454 | torch.save(model_to_save.state_dict(), output_model_file) 455 | model_to_save.config.to_json_file(output_config_file) 456 | tokenizer.save_vocabulary(dirs_name) 457 | 458 | final_result = pred_eval(_e, args, logger, tokenizer, model, eval_dataloader, eval_gold, label_list, device, task_name, eval_type='test') 459 | best_micro_F1 = result['micro-F1'] 460 | 461 | else: 462 | model = model_dict[args.model_type].from_pretrained(args.bert_model, num_labels=num_labels) 463 | if args.local_rank == 0: 464 | torch.distributed.barrier() 465 | 466 | if args.fp16: 467 | model.half() 468 | 469 | model.to(device) 470 | model.eval() 471 | final_result = pred_eval('load fine-tuned', args, logger, tokenizer, model, eval_dataloader, eval_gold, label_list, device, task_name, eval_type='test') 472 | 473 | output_eval_file = os.path.join(args.output_dir, "Test_results.txt") 474 | with open(output_eval_file, "w") as writer: 475 | 
logger.info("***** Test results *****") 476 | for key in sorted(final_result.keys()): 477 | logger.info(" %s = %s", key, str(final_result[key])) 478 | writer.write("%s = %s\n" % (key, str(final_result[key]))) 479 | 480 | if __name__ == "__main__": 481 | main() 482 | -------------------------------------------------------------------------------- /Extract-Classify-ACOS/run_step2.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import, division, print_function 2 | 3 | import argparse 4 | import logging 5 | import os 6 | import sys 7 | import random 8 | from tqdm import tqdm, trange 9 | import pdb 10 | from collections import defaultdict, namedtuple 11 | from manager import * 12 | import math 13 | import codecs as cs 14 | from sklearn.model_selection import KFold 15 | from dataset_utils import * 16 | 17 | gm = GPUManager() 18 | device = gm.auto_choice(mode=0) 19 | os.environ["CUDA_VISIBLE_DEVICES"] = str(device) 20 | 21 | import numpy as np 22 | 23 | import torch 24 | from torch.utils.data import (DataLoader, RandomSampler, SequentialSampler, 25 | TensorDataset) 26 | from torch.nn import CrossEntropyLoss, MSELoss, MultiLabelSoftMarginLoss, BCEWithLogitsLoss 27 | 28 | from modeling import CategorySentiClassification 29 | 30 | # sys.path.insert(0, '/home/hjcai/8RTX/BERT/pytorch_pretrained_BERT') 31 | # from modeling_for_share import BertForQuadABSAPairCSAO 32 | from bert_utils.tokenization import BertTokenizer 33 | from bert_utils.optimization import BertAdam, WarmupLinearSchedule 34 | 35 | from run_classifier_dataset_utils import * 36 | from eval_metrics import * 37 | import gc 38 | 39 | if sys.version_info[0] == 2: 40 | import cPickle as pickle 41 | else: 42 | import pickle 43 | import warnings 44 | 45 | warnings.filterwarnings('ignore') 46 | 47 | CONFIG_NAME = "config.json" 48 | WEIGHTS_NAME = "pytorch_model.bin" 49 | 50 | logger = logging.getLogger(__name__) 51 | 52 | def main(): 53 | parser = argparse.ArgumentParser() 54 | ## Required parameters 55 | parser.add_argument("--data_dir", 56 | default=None, 57 | type=str, 58 | required=True, 59 | help="The input source data dir. Should contain the .tsv files (or other data files) for the task.") 60 | parser.add_argument("--bert_model", default=None, type=str, required=True) 61 | parser.add_argument("--output_dir", 62 | default=None, 63 | type=str, 64 | required=True, 65 | help="The output directory where the model predictions and checkpoints will be written.") 66 | parser.add_argument("--task_name", 67 | default=None, 68 | type=str, 69 | required=True, 70 | help="The name of the task to train.") 71 | parser.add_argument("--domain_type", 72 | default=None, 73 | type=str, 74 | required=True, 75 | help="domain to choose.") 76 | 77 | parser.add_argument("--model_type", 78 | default=None, 79 | type=str, 80 | required=True, 81 | help="model to choose.") 82 | 83 | ## Other parameters 84 | parser.add_argument("--max_seq_length", 85 | default=128, 86 | type=int, 87 | help="The maximum total input sequence length after WordPiece tokenization. 
\n" 88 | "Sequences longer than this will be truncated, and sequences shorter \n" 89 | "than this will be padded.") 90 | parser.add_argument("--do_train", 91 | action='store_true', 92 | help="Whether to run training.") 93 | parser.add_argument("--do_eval", 94 | action='store_true', 95 | help="Whether to run eval on the dev set.") 96 | parser.add_argument("--do_lower_case", 97 | action='store_true', 98 | help="Set this flag if you are using an uncased model.") 99 | parser.add_argument("--train_batch_size", 100 | default=32, 101 | type=int, 102 | help="Total batch size for training.") 103 | parser.add_argument("--eval_batch_size", 104 | default=8, 105 | type=int, 106 | help="Total batch size for eval.") 107 | parser.add_argument("--learning_rate", 108 | default=5e-5, 109 | type=float, 110 | help="The initial learning rate for Adam.") 111 | parser.add_argument("--num_train_epochs", 112 | default=3.0, 113 | type=float, 114 | help="Total number of training epochs to perform.") 115 | parser.add_argument("--warmup_proportion", 116 | default=0.1, 117 | type=float, 118 | help="Proportion of training to perform linear learning rate warmup for. " 119 | "E.g., 0.1 = 10%% of training.") 120 | parser.add_argument("--local_rank", 121 | type=int, 122 | default=-1, 123 | help="local_rank for distributed training on gpus") 124 | parser.add_argument('--seed', 125 | type=int, 126 | # default=42, 127 | default=13, 128 | help="random seed for initialization") 129 | parser.add_argument('--gradient_accumulation_steps', 130 | type=int, 131 | default=1, 132 | help="Number of updates steps to accumulate before performing a backward/update pass.") 133 | 134 | args = parser.parse_args() 135 | device = torch.device("cuda") 136 | n_gpu = torch.cuda.device_count() 137 | 138 | logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s', 139 | datefmt = '%m/%d/%Y %H:%M:%S', 140 | level = logging.INFO) 141 | 142 | args.train_batch_size = args.train_batch_size // args.gradient_accumulation_steps 143 | random.seed(args.seed) 144 | np.random.seed(args.seed) 145 | torch.manual_seed(args.seed) 146 | torch.cuda.manual_seed_all(args.seed) 147 | 148 | if not os.path.exists(args.output_dir): 149 | os.makedirs(args.output_dir) 150 | 151 | task_name = args.task_name.lower() 152 | processor = processors[task_name]() 153 | label_list = processor.get_labels(args.domain_type) 154 | num_labels = len(label_list[0]) 155 | 156 | tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case) 157 | model_dict = { 158 | 'categorysenti': CategorySentiClassification, 159 | } 160 | cate_dict = {label : i for i, label in enumerate(label_list[0])} 161 | 162 | global_step = 0 163 | nb_tr_steps = 0 164 | eval_quad_gold = [] 165 | train_quad_gold = [] 166 | eval_quad_text = [] 167 | train_quad_text = [] 168 | #for entity#attribute 169 | if args.do_eval: 170 | eval_examples = processor.get_dev_examples(args.data_dir, args.domain_type) 171 | f = cs.open(args.data_dir+'/tokenized_data/'+args.domain_type+'_test_pair.tsv', 'r').readlines() 172 | eval_quad_text, eval_quad_gold = read_pair_gold(f, args) 173 | 174 | eval_features = convert_examples_to_features2nd( 175 | eval_examples, label_list, args.max_seq_length, tokenizer, task_name) 176 | 177 | eval_tokens_len = torch.tensor([f.tokens_len for f in eval_features], dtype=torch.long) 178 | eval_aspect_input_ids = torch.tensor([f.aspect_input_ids for f in eval_features], dtype=torch.long) 179 | eval_aspect_input_mask = torch.tensor([f.aspect_input_mask for 
f in eval_features], dtype=torch.long) 180 | eval_aspect_segment_ids = torch.tensor([f.aspect_segment_ids for f in eval_features], dtype=torch.long) 181 | eval_candidate_aspect = torch.tensor([f.candidate_aspect for f in eval_features], dtype=torch.long) 182 | eval_candidate_opinion = torch.tensor([f.candidate_opinion for f in eval_features], dtype=torch.long) 183 | eval_label_id = torch.tensor([f.label_id for f in eval_features], dtype=torch.long) 184 | 185 | # eval_gold = [eval_aspect_input_ids.numpy().tolist(), eval_quad_gold] 186 | eval_gold = [eval_quad_text, eval_quad_gold] 187 | 188 | eval_data = TensorDataset(eval_tokens_len, eval_aspect_input_ids, eval_aspect_input_mask, 189 | eval_aspect_segment_ids, eval_candidate_aspect, eval_candidate_opinion, 190 | eval_label_id) 191 | # Run prediction for full data 192 | if args.local_rank == -1: 193 | eval_sampler = SequentialSampler(eval_data) 194 | else: 195 | eval_sampler = DistributedSampler(eval_data) # Note that this sampler samples randomly 196 | eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=args.eval_batch_size) 197 | 198 | 199 | # Prepare data loader 200 | train_examples = processor.get_train_examples(args.data_dir, args.domain_type) 201 | train_features = convert_examples_to_features2nd( 202 | train_examples, label_list, args.max_seq_length, tokenizer, task_name) 203 | 204 | tokens_len = torch.tensor([f.tokens_len for f in train_features], dtype=torch.long) 205 | aspect_input_ids = torch.tensor([f.aspect_input_ids for f in train_features], dtype=torch.long) 206 | aspect_input_mask = torch.tensor([f.aspect_input_mask for f in train_features], dtype=torch.long) 207 | aspect_segment_ids = torch.tensor([f.aspect_segment_ids for f in train_features], dtype=torch.long) 208 | candidate_aspect = torch.tensor([f.candidate_aspect for f in train_features], dtype=torch.long) 209 | candidate_opinion = torch.tensor([f.candidate_opinion for f in train_features], dtype=torch.long) 210 | label_id = torch.tensor([f.label_id for f in train_features], dtype=torch.long) 211 | 212 | valid_examples = processor.get_valid_examples(args.data_dir, args.domain_type) 213 | valid_features = convert_examples_to_features2nd( 214 | valid_examples, label_list, args.max_seq_length, tokenizer, task_name) 215 | f = cs.open(args.data_dir+'/tokenized_data/'+args.domain_type+'_dev_pair.tsv', 'r').readlines() 216 | valid_quad_text, valid_quad_gold = read_pair_gold(f, args) 217 | 218 | valid_tokens_len = torch.tensor([f.tokens_len for f in valid_features], dtype=torch.long) 219 | valid_aspect_input_ids = torch.tensor([f.aspect_input_ids for f in valid_features], dtype=torch.long) 220 | valid_aspect_input_mask = torch.tensor([f.aspect_input_mask for f in valid_features], dtype=torch.long) 221 | valid_aspect_segment_ids = torch.tensor([f.aspect_segment_ids for f in valid_features], dtype=torch.long) 222 | valid_candidate_aspect = torch.tensor([f.candidate_aspect for f in valid_features], dtype=torch.long) 223 | valid_candidate_opinion = torch.tensor([f.candidate_opinion for f in valid_features], dtype=torch.long) 224 | valid_label_id = torch.tensor([f.label_id for f in valid_features], dtype=torch.long) 225 | 226 | all_results = [] 227 | 228 | 229 | valid_gold = [valid_quad_text, valid_quad_gold] 230 | 231 | train_data = TensorDataset(tokens_len, aspect_input_ids, aspect_input_mask, 232 | aspect_segment_ids, candidate_aspect, candidate_opinion, 233 | label_id) 234 | 235 | if args.local_rank == -1: 236 | train_sampler = RandomSampler(train_data) 
237 | else: 238 | train_sampler = DistributedSampler(train_data) 239 | train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=args.train_batch_size) 240 | 241 | valid_data = TensorDataset(valid_tokens_len, valid_aspect_input_ids, valid_aspect_input_mask, 242 | valid_aspect_segment_ids, valid_candidate_aspect, valid_candidate_opinion, 243 | valid_label_id) 244 | 245 | if args.local_rank == -1: 246 | valid_sampler = SequentialSampler(valid_data) 247 | else: 248 | valid_sampler = DistributedSampler(valid_data) 249 | valid_dataloader = DataLoader(valid_data, sampler=valid_sampler, batch_size=args.eval_batch_size) 250 | 251 | if args.do_train: 252 | logger.info("***** Running training *****") 253 | 254 | num_train_optimization_steps = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps) * args.num_train_epochs 255 | 256 | model = model_dict[args.model_type].from_pretrained(args.bert_model, num_labels=num_labels) 257 | param_optimizer = list(model.named_parameters()) 258 | no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight'] 259 | optimizer_grouped_parameters = [ 260 | {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01}, 261 | {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0} 262 | ] 263 | 264 | optimizer = BertAdam(optimizer_grouped_parameters, 265 | lr=args.learning_rate, 266 | warmup=args.warmup_proportion, 267 | t_total=num_train_optimization_steps) 268 | 269 | if args.local_rank == 0: 270 | torch.distributed.barrier() 271 | 272 | model.to(device) 273 | best_micro_F1 = -1.0 274 | 275 | for _e in trange(int(args.num_train_epochs), desc="Epoch"): 276 | model.train() 277 | nb_tr_examples, nb_tr_steps = 0, 0 278 | 279 | for step, batch in enumerate(train_dataloader): 280 | batch = tuple(t.to(device) for t in batch) 281 | _tokens_len, _aspect_input_ids, _aspect_input_mask, _aspect_segment_ids, _candidate_aspect, \ 282 | _candidate_opinion, _label_id = batch 283 | 284 | # forward pass: the model returns the training losses and the 285 | # category-sentiment logits for each candidate aspect-opinion pair 286 | losses, logits = model(tokenizer, _e, aspect_input_ids=_aspect_input_ids, 287 | aspect_token_type_ids=_aspect_segment_ids, aspect_attention_mask=_aspect_input_mask, 288 | candidate_aspect=_candidate_aspect, candidate_opinion=_candidate_opinion, label_id=_label_id) 289 | 290 | if step % 10 == 0: 291 | logger.info('Total Loss is {} .'.format(losses[0])) 292 | 293 | loss = losses[0] 294 | if n_gpu > 1: 295 | loss = loss.mean() # mean() to average on multi-gpu. 
296 | 297 | 298 | if args.gradient_accumulation_steps > 1: 299 | loss = loss / args.gradient_accumulation_steps 300 | 301 | loss.backward() 302 | 303 | nb_tr_examples += _aspect_input_ids.size(0) 304 | nb_tr_steps += 1 305 | if (step + 1) % args.gradient_accumulation_steps == 0: 306 | optimizer.step() 307 | optimizer.zero_grad() 308 | global_step += 1 309 | 310 | model.eval() 311 | result = pair_eval(_e, args, logger, tokenizer, model, valid_dataloader, valid_gold, 312 | label_list, device, task_name, eval_type='valid') 313 | 314 | if best_micro_F1 < result['micro-F1']: 315 | model_to_save = model.module if hasattr(model, 'module') else model # Only save the model itself 316 | dirs_name = args.output_dir 317 | if not os.path.exists(dirs_name): 318 | os.mkdir(dirs_name) 319 | output_model_file = os.path.join(dirs_name, WEIGHTS_NAME) 320 | output_config_file = os.path.join(dirs_name, CONFIG_NAME) 321 | 322 | torch.save(model_to_save.state_dict(), output_model_file) 323 | model_to_save.config.to_json_file(output_config_file) 324 | tokenizer.save_vocabulary(dirs_name) 325 | 326 | final_result = pair_eval(_e, args, logger, tokenizer, model, eval_dataloader, eval_gold, 327 | label_list, device, task_name, eval_type='test') 328 | best_micro_F1 = result['micro-F1'] 329 | else: 330 | model = model_dict[args.model_type].from_pretrained(args.bert_model, num_labels=num_labels) 331 | if args.local_rank == 0: 332 | torch.distributed.barrier() 333 | 334 | model.to(device) 335 | model.eval() 336 | final_result = pair_eval('load fine-tuned', args, logger, tokenizer, model, eval_dataloader, eval_gold, 337 | label_list, device, task_name, eval_type='test') 338 | 339 | output_eval_file = os.path.join(args.output_dir, "Test_results.txt") 340 | with open(output_eval_file, "w") as writer: 341 | logger.info("***** Test results *****") 342 | for key in sorted(final_result.keys()): 343 | logger.info(" %s = %s", key, str(final_result[key])) 344 | writer.write("%s = %s\n" % (key, str(final_result[key]))) 345 | 346 | if __name__ == "__main__": 347 | main() 348 | -------------------------------------------------------------------------------- /Extract-Classify-ACOS/tokenized_data/get_1st_pairs.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | 3 | import codecs as cs 4 | import os 5 | import sys 6 | 7 | base_dir = sys.argv[1] # project base directory 8 | domain_type = sys.argv[2] # e.g. 'rest16' or 'laptop' 9 | 10 | cur_dir = base_dir+'/output/Extract-Classify-QUAD/'+domain_type 11 | 12 | if not os.path.exists(cur_dir+'_1st'): 13 | os.makedirs(cur_dir+'_1st') 14 | 15 | f = cs.open(cur_dir+'_1st'+'/pred4pipeline.txt', 'r').readlines() # step-1 span predictions 16 | wf = cs.open(base_dir+'/ACOS-main/Extract-Classify-ACOS/tokenized_data/'+domain_type+'_test_pair_1st.tsv', 'w') 17 | 18 | for line in f: 19 | asp = []; opi = [] 20 | line = line.strip().split('\t') 21 | if len(line) <= 1: 22 | continue 23 | text = line[0] 24 | af = 0 25 | of = 0 26 | for ele in line[1:]: 27 | if ele.startswith('a'): 28 | asp.append(ele[2:]) 29 | af = 1 30 | else: 31 | opi.append(ele[2:]) 32 | of = 1 33 | if af == 0: 34 | asp.append('-1,-1') # no explicit span predicted: fall back to the implicit placeholder 35 | if of == 0: 36 | opi.append('-1,-1') 37 | if len(asp)>0 and len(opi)>0: 38 | pred = [] 39 | 40 | # pair every predicted aspect span with every predicted opinion span 41 | for pa in asp: 42 | for po in opi: 43 | pred.append([pa, po]) 44 | 45 | for ele in pred: 46 | wf.write(text+'####'+ele[0]+' '+ele[1]+'\n') 47 | 
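48 | # Usage sketch (assumed invocation, matching the sys.argv reads above): 49 | #     python get_1st_pairs.py /path/to/BASE_DIR rest16 50 | # Each output line has the form "text####a_span o_span", where a span is 51 | # a "start,end" token-offset pair into the wordpiece-tokenized sentence 52 | # and "-1,-1" marks an implicit (absent) aspect or opinion, e.g.: 53 | #     the food is great .####1,2 3,4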
-------------------------------------------------------------------------------- /Extract-Classify-ACOS/tokenized_data/rest16_dev_pair.tsv: -------------------------------------------------------------------------------- 1 | ca n ' t wait wait for my next visit .####-1,-1 -1,-1 RESTAURANT#GENERAL#2 2 | their sake list was extensive , but we were looking for purple haze , which was n ' t listed but made for us upon request !####1,3 4,5 DRINKS#STYLE_OPTIONS#2 3 | their sake list was extensive , but we were looking for purple haze , which was n ' t listed but made for us upon request !####-1,-1 -1,-1 SERVICE#GENERAL#2 4 | the spicy tuna roll was unusually good and the rock shrimp te ##mp ##ura was awesome , great app ##eti ##zer to share !####1,4 6,7 FOOD#QUALITY#2 5 | the spicy tuna roll was unusually good and the rock shrimp te ##mp ##ura was awesome , great app ##eti ##zer to share !####9,14 15,16 FOOD#QUALITY#2 6 | we love th pink pony .####3,5 1,2 RESTAURANT#GENERAL#2 7 | this place has got to be the best japanese restaurant in the new york area .####1,2 7,8 RESTAURANT#GENERAL#2 8 | i tend to judge a su ##shi restaurant by its sea ur ##chin , which was heavenly at su ##shi rose .####10,13 16,17 FOOD#QUALITY#2 9 | the prix fix ##e menu is worth every penny and you get more than enough ( both in quantity and quality ) .####1,5 6,7 FOOD#QUALITY#2 FOOD#STYLE_OPTIONS#2 FOOD#PRICES#2 10 | the food here is rather good , but only if you like to wait for it .####1,2 5,6 FOOD#QUALITY#2 11 | the food here is rather good , but only if you like to wait for it .####-1,-1 -1,-1 SERVICE#GENERAL#0 12 | also , specify if you like your food spicy - its rather bland if you do n ' t .####7,8 12,13 FOOD#QUALITY#0 13 | the am ##bie ##nce is pretty and nice for conversation , so a casual lunch here would probably be best .####1,4 5,6 AMBIENCE#GENERAL#2 14 | the am ##bie ##nce is pretty and nice for conversation , so a casual lunch here would probably be best .####1,4 7,8 AMBIENCE#GENERAL#2 15 | the am ##bie ##nce is pretty and nice for conversation , so a casual lunch here would probably be best .####-1,-1 19,20 RESTAURANT#MISCELLANEOUS#2 16 | it was horrible .####-1,-1 2,3 RESTAURANT#GENERAL#0 17 | have been dozens of times and never failed to enjoy the experience .####-1,-1 9,10 RESTAURANT#GENERAL#2 18 | make sure you try this place as often as you can .####5,6 7,8 RESTAURANT#GENERAL#2 19 | i had a huge group for my birthday and we were well taken care of .####-1,-1 11,12 SERVICE#GENERAL#2 20 | get the tuna of ga ##ri .####2,6 -1,-1 FOOD#QUALITY#2 21 | make sure you have the spicy sc ##all ##op roll . . 
.####5,10 -1,-1 FOOD#QUALITY#2 22 | rag ##a ' s is a romantic , cozy restaurant .####0,4 6,7 AMBIENCE#GENERAL#2 23 | rag ##a ' s is a romantic , cozy restaurant .####0,4 8,9 AMBIENCE#GENERAL#2 24 | i had a great time at je ##ky ##ll and hyde !####6,11 3,4 RESTAURANT#GENERAL#2 25 | i am bringing my whole family back next time .####-1,-1 -1,-1 RESTAURANT#MISCELLANEOUS#2 26 | fine dining restaurant quality .####1,2 0,1 FOOD#QUALITY#2 27 | we will return many times for this oasis in mid - town .####-1,-1 -1,-1 RESTAURANT#GENERAL#2 28 | the food options rule .####1,2 -1,-1 FOOD#STYLE_OPTIONS#2 29 | my husband and i thou ##gt it would be great to go to the je ##ky ##ll and hyde pub for our anniversary , and to our surprise it was fantastic .####14,20 9,10 RESTAURANT#GENERAL#2 30 | my husband and i thou ##gt it would be great to go to the je ##ky ##ll and hyde pub for our anniversary , and to our surprise it was fantastic .####14,20 30,31 RESTAURANT#GENERAL#2 31 | please take my advice , go and try this place .####9,10 -1,-1 RESTAURANT#GENERAL#2 32 | they were served warm and had a soft fluffy interior .####-1,-1 3,4 FOOD#QUALITY#2 33 | they were served warm and had a soft fluffy interior .####-1,-1 7,8 FOOD#QUALITY#2 34 | but they do .####-1,-1 -1,-1 SERVICE#GENERAL#2 35 | fresh restaurant was amazing . . . . . . . . food was delicious and of course fresh .####1,2 0,1 RESTAURANT#GENERAL#2 36 | fresh restaurant was amazing . . . . . . . . food was delicious and of course fresh .####1,2 3,4 RESTAURANT#GENERAL#2 37 | fresh restaurant was amazing . . . . . . . . food was delicious and of course fresh .####12,13 14,15 FOOD#QUALITY#2 38 | fresh restaurant was amazing . . . . . . . . food was delicious and of course fresh .####12,13 18,19 FOOD#QUALITY#2 39 | hats off to the chef .####4,5 0,2 FOOD#QUALITY#2 40 | this is some really good , inexpensive su ##shi .####7,9 4,5 FOOD#QUALITY#2 41 | this is some really good , inexpensive su ##shi .####7,9 6,7 FOOD#PRICES#2 42 | this place is always very crowded and popular .####1,2 5,6 RESTAURANT#MISCELLANEOUS#2 43 | this place is always very crowded and popular .####1,2 7,8 RESTAURANT#MISCELLANEOUS#2 44 | and evaluated on those terms past ##is is simply wonderful .####5,7 9,10 RESTAURANT#GENERAL#2 45 | i ' m still mad that i had to pay for lou ##sy food .####13,14 11,13 FOOD#QUALITY#0 46 | the hang ##er steak was like rubber and the tuna was flavor ##less not to mention it tasted like it had just been tha ##wed .####1,4 6,7 FOOD#QUALITY#0 47 | the hang ##er steak was like rubber and the tuna was flavor ##less not to mention it tasted like it had just been tha ##wed .####9,10 11,13 FOOD#QUALITY#0 48 | big thumbs up !####-1,-1 1,3 RESTAURANT#GENERAL#2 49 | the pizza and wine were excellent - the service too - - but what really made this place was the backyard dining area .####1,2 5,6 FOOD#QUALITY#2 50 | the pizza and wine were excellent - the service too - - but what really made this place was the backyard dining area .####3,4 5,6 DRINKS#QUALITY#2 51 | the pizza and wine were excellent - the service too - - but what really made this place was the backyard dining area .####8,9 5,6 SERVICE#GENERAL#2 52 | the pizza and wine were excellent - the service too - - but what really made this place was the backyard dining area .####20,23 -1,-1 AMBIENCE#GENERAL#2 53 | it is one the nice ##st outdoor restaurants i have ever seen in ny - - i am from italy and this place rivals the ones in my country .####6,8 4,6 AMBIENCE#GENERAL#2 54 | it is simply amazing .####-1,-1 3,4 
FOOD#QUALITY#2 55 | beautiful experience .####-1,-1 0,1 RESTAURANT#GENERAL#2 56 | the menu is very limited - i think we counted 4 or 5 en ##tree ##s .####1,2 4,5 FOOD#STYLE_OPTIONS#0 57 | we will go back every time we are in the city .####-1,-1 -1,-1 RESTAURANT#GENERAL#2 58 | the characters really make for an enjoyable experience .####1,2 6,7 AMBIENCE#GENERAL#2 59 | however , i think je ##ck ##ll and hyde ##s t is one of those places that is fun to do once .####4,10 18,19 RESTAURANT#GENERAL#2 60 | we had a good time .####-1,-1 3,4 RESTAURANT#GENERAL#2 61 | a little over ##pr ##ice ##d but worth it once you take a bite .####-1,-1 2,6 FOOD#PRICES#0 62 | a little over ##pr ##ice ##d but worth it once you take a bite .####-1,-1 7,8 FOOD#QUALITY#2 63 | i have lived in japan for 7 years and the taste of the food and the feel of the restaurant is like being back in japan .####13,14 -1,-1 FOOD#QUALITY#2 64 | i have lived in japan for 7 years and the taste of the food and the feel of the restaurant is like being back in japan .####16,17 -1,-1 AMBIENCE#GENERAL#2 65 | check out the secret back room .####4,6 3,4 AMBIENCE#GENERAL#2 66 | thank you emilio .####2,3 -1,-1 RESTAURANT#GENERAL#2 67 | the food was authentic .####1,2 3,4 FOOD#QUALITY#2 68 | fantastic !####-1,-1 0,1 RESTAURANT#GENERAL#2 69 | but the staff was so horrible to us .####2,3 5,6 SERVICE#GENERAL#0 70 | decor is nice though service can be spot ##ty .####0,1 2,3 AMBIENCE#GENERAL#2 71 | decor is nice though service can be spot ##ty .####4,5 7,9 SERVICE#GENERAL#0 72 | just aw ##some .####-1,-1 1,3 FOOD#QUALITY#2 73 | i had their eggs benedict for br ##un ##ch , which were the worst in my entire life , i tried removing the ho ##llon ##dai ##se sauce completely that was how failed it was .####3,5 13,14 FOOD#QUALITY#0 74 | with the theater 2 blocks away we had a delicious meal in a beautiful room .####10,11 9,10 FOOD#QUALITY#2 75 | with the theater 2 blocks away we had a delicious meal in a beautiful room .####14,15 13,14 AMBIENCE#GENERAL#2 76 | with the theater 2 blocks away we had a delicious meal in a beautiful room .####-1,-1 -1,-1 LOCATION#GENERAL#2 77 | the service was at ##ten ##tive .####1,2 3,6 SERVICE#GENERAL#2 78 | pat ##ro ##on features a nice cigar bar and has great staff .####6,8 5,6 AMBIENCE#GENERAL#2 79 | pat ##ro ##on features a nice cigar bar and has great staff .####11,12 10,11 SERVICE#GENERAL#2 80 | ll ##oo ##v ##ve this place .####5,6 0,4 RESTAURANT#GENERAL#2 81 | the menu is limited but almost all of the dishes are excellent .####1,2 3,4 FOOD#STYLE_OPTIONS#0 82 | the menu is limited but almost all of the dishes are excellent .####9,10 11,12 FOOD#QUALITY#2 83 | wine list is extensive without being over - priced .####0,2 3,4 DRINKS#STYLE_OPTIONS#2 84 | wine list is extensive without being over - priced .####0,2 4,9 DRINKS#PRICES#2 85 | the food was very good , a great deal , and the place its self was great .####1,2 4,5 FOOD#QUALITY#2 86 | the food was very good , a great deal , and the place its self was great .####1,2 6,7 FOOD#PRICES#2 87 | the food was very good , a great deal , and the place its self was great .####12,13 16,17 AMBIENCE#GENERAL#2 88 | the wait staff is very fr ##ein ##dly , they make it feel like you ' re eating in a fr ##ein ##dly little european town .####1,3 5,8 SERVICE#GENERAL#2 89 | the whole set up is truly un ##pro ##fe ##ssion ##al and i wish cafe noir would get some good staff , because despite the current one this is a great place .####20,21 6,21 SERVICE#GENERAL#0 90 | the whole set up is 
truly un ##pro ##fe ##ssion ##al and i wish cafe noir would get some good staff , because despite the current one this is a great place .####14,16 30,31 RESTAURANT#GENERAL#2 91 | you should pass on the cal ##ama ##ri .####5,8 -1,-1 FOOD#QUALITY#0 92 | when asked about how a certain dish was prepared in comparison to a similar at other thai restaurants , he replied this is not mcdonald ' s , every place makes things differently####-1,-1 -1,-1 SERVICE#GENERAL#0 93 | everything was wonderful ; food , drinks , staff , mile ##au .####4,5 2,3 FOOD#QUALITY#2 94 | everything was wonderful ; food , drinks , staff , mile ##au .####6,7 2,3 DRINKS#QUALITY#2 95 | everything was wonderful ; food , drinks , staff , mile ##au .####8,9 2,3 SERVICE#GENERAL#2 96 | everything was wonderful ; food , drinks , staff , mile ##au .####10,12 2,3 AMBIENCE#GENERAL#2 97 | everything was wonderful ; food , drinks , staff , mile ##au .####-1,-1 2,3 RESTAURANT#GENERAL#2 98 | i would highly recommend this place !####5,6 3,4 RESTAURANT#GENERAL#2 99 | fresh ingredients and everything is made to order .####1,2 0,1 FOOD#QUALITY#2 100 | fresh ingredients and everything is made to order .####-1,-1 -1,-1 FOOD#QUALITY#2 101 | friendly staff that actually lets you enjoy your meal and the company you ' re with .####1,2 0,1 SERVICE#GENERAL#2 102 | i will definitely be going back .####-1,-1 -1,-1 RESTAURANT#GENERAL#2 103 | a great choice at any cost and a great deal .####-1,-1 8,9 RESTAURANT#GENERAL#2 104 | a great choice at any cost and a great deal .####-1,-1 1,2 RESTAURANT#PRICES#2 105 | tha ##lia is a beautiful restaurant with beautiful people serving you , but the food does n ' t quite match up .####8,9 7,8 SERVICE#GENERAL#2 106 | tha ##lia is a beautiful restaurant with beautiful people serving you , but the food does n ' t quite match up .####14,15 15,22 FOOD#QUALITY#0 107 | tha ##lia is a beautiful restaurant with beautiful people serving you , but the food does n ' t quite match up .####0,2 4,5 AMBIENCE#GENERAL#2 108 | i ordered the smoked salmon and roe app ##eti ##zer and it was off flavor .####3,10 13,15 FOOD#QUALITY#0 109 | the food is good , especially their more basic dishes , and the drinks are delicious .####1,2 3,4 FOOD#QUALITY#2 110 | the food is good , especially their more basic dishes , and the drinks are delicious .####8,10 3,4 FOOD#QUALITY#2 111 | the food is good , especially their more basic dishes , and the drinks are delicious .####13,14 15,16 DRINKS#QUALITY#2 112 | the big complaint : no toast ##ing available .####-1,-1 2,3 SERVICE#GENERAL#0 113 | i ' ve been many time and have never been disappointed .####-1,-1 8,11 RESTAURANT#GENERAL#2 114 | the turkey burger ##s are scary !####1,4 5,6 FOOD#QUALITY#0 115 | for authentic thai food , look no further than too ##ns .####2,4 1,2 FOOD#QUALITY#2 116 | try the pad thai , or sample anything on the app ##eti ##zer menu . . . 
they ' re all delicious .####2,4 21,22 FOOD#QUALITY#2 FOOD#QUALITY#2 117 | service was good and food is wonderful .####0,1 2,3 SERVICE#GENERAL#2 118 | service was good and food is wonderful .####4,5 6,7 FOOD#QUALITY#2 119 | it is definitely a good spot for snacks and chat .####5,6 4,5 RESTAURANT#GENERAL#2 120 | do not get the go go hamburger ##s , no matter what the reviews say .####4,8 -1,-1 FOOD#QUALITY#0 121 | steamed fresh so brought hot hot hot to your table .####-1,-1 1,2 FOOD#QUALITY#2 122 | small serving ##s for main en ##tree , i had salmon ( wasn ##t impressed ) girlfriend had chicken , it was good .####10,11 12,15 FOOD#QUALITY#0 123 | small serving ##s for main en ##tree , i had salmon ( wasn ##t impressed ) girlfriend had chicken , it was good .####18,19 22,23 FOOD#QUALITY#2 124 | small serving ##s for main en ##tree , i had salmon ( wasn ##t impressed ) girlfriend had chicken , it was good .####1,7 0,1 FOOD#GENERAL#0 125 | cute and decorative .####-1,-1 0,1 AMBIENCE#GENERAL#2 126 | cute and decorative .####-1,-1 2,3 AMBIENCE#GENERAL#2 127 | excellent spot for holiday get together ##s with co - workers or friends that you have n ' t seen in a while .####1,2 0,1 RESTAURANT#MISCELLANEOUS#2 128 | what a great place !####3,4 2,3 RESTAURANT#GENERAL#2 129 | not the typical nyc gi ##mm ##ick theme restaurant .####8,9 0,3 AMBIENCE#GENERAL#2 130 | service was very prompt but slightly rushed .####0,1 3,4 SERVICE#GENERAL#2 131 | service was very prompt but slightly rushed .####0,1 6,7 SERVICE#GENERAL#2 132 | i really liked this place .####4,5 2,3 RESTAURANT#GENERAL#2 133 | everything i had was good , and i ' m a eater .####-1,-1 4,5 FOOD#QUALITY#2 134 | i also recommend the rice dishes or the different varieties of cong ##ee ( rice por ##ridge ) .####4,6 2,3 FOOD#QUALITY#2 135 | i also recommend the rice dishes or the different varieties of cong ##ee ( rice por ##ridge ) .####11,18 2,3 FOOD#QUALITY#2 136 | i recently tried su ##an and i thought that it was great .####3,5 11,12 RESTAURANT#GENERAL#2 137 | have been several times and it never di ##ssa ##points .####-1,-1 6,10 RESTAURANT#GENERAL#2 138 | this place is a great bargain .####1,2 4,6 RESTAURANT#PRICES#2 139 | people are always friendly .####0,1 3,4 SERVICE#GENERAL#2 140 | the best pad thai i ' ve ever had .####2,4 1,2 FOOD#QUALITY#2 141 | would n ' t rec ##ome ##nd it for dinner !####-1,-1 1,7 RESTAURANT#GENERAL#0 142 | ask for us ##ha , the nice ##st bartender in manhattan .####2,4 6,8 SERVICE#GENERAL#2 143 | the food ' s as good as ever .####1,2 5,6 FOOD#QUALITY#2 144 | best drums ##tick ##s over rice and sour spicy soup in town !####1,6 0,1 FOOD#QUALITY#2 145 | best drums ##tick ##s over rice and sour spicy soup in town !####7,10 0,1 FOOD#QUALITY#2 146 | for those that go once and do n ' t enjoy it , all i can say is that they just do n ' t get it .####-1,-1 -1,-1 RESTAURANT#MISCELLANEOUS#2 147 | not worth it .####-1,-1 0,2 FOOD#PRICES#0 148 | this dish is my favorite and i always get it when i go there and never get tired of it .####1,2 4,5 FOOD#QUALITY#2 149 | big wong is a great place to eat and fill your stomach .####0,2 4,5 RESTAURANT#GENERAL#2 150 | the food is okay and the prices here are med ##io ##cre .####1,2 3,4 FOOD#QUALITY#1 151 | the food is okay and the prices here are med ##io ##cre .####-1,-1 9,12 RESTAURANT#PRICES#1 152 | me and my girls will definitely go back .####-1,-1 -1,-1 RESTAURANT#GENERAL#2 153 | the food is great .####1,2 3,4 FOOD#QUALITY#2 154 | la rosa waltz ##es in , and i think they are doing it 
the best .####0,2 14,15 FOOD#QUALITY#2 155 | interesting selection , good wines , service fine , fun decor .####4,5 3,4 DRINKS#QUALITY#2 156 | interesting selection , good wines , service fine , fun decor .####6,7 7,8 SERVICE#GENERAL#2 157 | interesting selection , good wines , service fine , fun decor .####10,11 9,10 AMBIENCE#GENERAL#2 158 | interesting selection , good wines , service fine , fun decor .####1,2 0,1 FOOD#STYLE_OPTIONS#2 159 | the food here was med ##io ##cre at best .####1,2 4,7 FOOD#QUALITY#0 160 | the cypriot restaurant has a lot going for it .####1,3 -1,-1 RESTAURANT#GENERAL#2 161 | will comeback for sure , wish they have it here in la . .####-1,-1 -1,-1 RESTAURANT#GENERAL#2 162 | the space kind of feels like an alice in wonderland setting , without it trying to be that .####1,2 -1,-1 AMBIENCE#GENERAL#0 163 | i paid just about $ 60 for a good meal , though : )####9,10 8,9 FOOD#QUALITY#2 164 | i paid just about $ 60 for a good meal , though : )####9,10 -1,-1 FOOD#PRICES#2 165 | love it .####-1,-1 0,1 RESTAURANT#GENERAL#2 166 | the place is a bit hidden away , but once you get there , it ' s all worth it .####1,2 5,7 LOCATION#GENERAL#2 167 | the place is a bit hidden away , but once you get there , it ' s all worth it .####1,2 18,19 LOCATION#GENERAL#2 168 | i love their chicken pasta can ##t remember the name but is soo ##o good####3,5 1,2 FOOD#QUALITY#2 169 | i love their chicken pasta can ##t remember the name but is soo ##o good####3,5 14,15 FOOD#QUALITY#2 170 | way below average####-1,-1 1,3 RESTAURANT#GENERAL#0 171 | i think the pizza is so over ##rated and was under cooked .####3,4 6,8 FOOD#QUALITY#0 172 | i think the pizza is so over ##rated and was under cooked .####3,4 10,12 FOOD#QUALITY#0 173 | i love this place####3,4 1,2 RESTAURANT#GENERAL#2 174 | the service was quick and friendly .####1,2 3,4 SERVICE#GENERAL#2 175 | the service was quick and friendly .####1,2 5,6 SERVICE#GENERAL#2 176 | i thought the restaurant was nice and clean .####3,4 5,6 RESTAURANT#GENERAL#2 177 | i thought the restaurant was nice and clean .####3,4 7,8 AMBIENCE#GENERAL#2 178 | chicken ter ##iya ##ki had tomato or pi ##mento ##s on top ? ?####0,4 -1,-1 FOOD#STYLE_OPTIONS#0 179 | the waitress was not at ##ten ##tive at all .####1,2 3,7 SERVICE#GENERAL#0 180 | just go to ya ##mat ##o and order the red dragon roll .####3,6 -1,-1 RESTAURANT#GENERAL#2 181 | just go to ya ##mat ##o and order the red dragon roll .####9,12 -1,-1 FOOD#QUALITY#2 182 | favorite su ##shi in nyc####1,3 0,1 FOOD#QUALITY#2 183 | the rolls are creative and i have yet to find another su ##shi place that serves up more in ##vent ##ive yet delicious japanese food .####1,2 3,4 FOOD#STYLE_OPTIONS#2 184 | the rolls are creative and i have yet to find another su ##shi place that serves up more in ##vent ##ive yet delicious japanese food .####23,25 18,21 FOOD#STYLE_OPTIONS#2 185 | the rolls are creative and i have yet to find another su ##shi place that serves up more in ##vent ##ive yet delicious japanese food .####23,25 22,23 FOOD#QUALITY#2 186 | my que ##sa ##di ##lla tasted like it had been made by a three - year old with no sense of proportion or flavor .####1,5 -1,-1 FOOD#QUALITY#0 FOOD#STYLE_OPTIONS#0 187 | save your money and your time and go somewhere else .####-1,-1 -1,-1 RESTAURANT#GENERAL#0 188 | the spin ##ach is fresh , def ##inate ##ly not frozen . . 
.####1,3 4,5 FOOD#QUALITY#2 189 | decor needs to be upgraded but the food is amazing !####0,1 4,5 AMBIENCE#GENERAL#0 190 | decor needs to be upgraded but the food is amazing !####7,8 9,10 FOOD#QUALITY#2 191 | my daughter ' s wedding reception at water ' s edge received the highest compliment ##s from our guests .####7,11 13,16 RESTAURANT#MISCELLANEOUS#2 192 | the high prices you ' re going to pay is for the view not for the food .####12,13 1,3 LOCATION#GENERAL#1 193 | the high prices you ' re going to pay is for the view not for the food .####16,17 1,3 FOOD#QUALITY#0 194 | the high prices you ' re going to pay is for the view not for the food .####-1,-1 1,3 RESTAURANT#PRICES#0 195 | not what i would expect for the price and prestige of this location .####12,13 -1,-1 RESTAURANT#PRICES#1 RESTAURANT#MISCELLANEOUS#1 196 | not what i would expect for the price and prestige of this location .####-1,-1 -1,-1 SERVICE#GENERAL#0 197 | the food was ok and fair nothing to go crazy .####1,2 3,4 FOOD#QUALITY#1 198 | the food was ok and fair nothing to go crazy .####1,2 5,6 FOOD#QUALITY#1 199 | impressed . . .####-1,-1 0,1 RESTAURANT#GENERAL#2 200 | subtle food and service####1,2 0,1 FOOD#QUALITY#2 201 | subtle food and service####3,4 0,1 SERVICE#GENERAL#2 202 | food took some time to prepare , all worth waiting for .####0,1 8,9 FOOD#QUALITY#2 203 | food took some time to prepare , all worth waiting for .####-1,-1 -1,-1 SERVICE#GENERAL#1 204 | great find in the west village !####-1,-1 0,1 RESTAURANT#GENERAL#2 205 | when the bill came , nothing was com ##ped , so i told the manager very politely that we were willing to pay for the wine , but i did n ' t think i should have to pay for food with a mag ##go ##t in it .####-1,-1 -1,-1 SERVICE#GENERAL#0 206 | amazing food .####1,2 0,1 FOOD#QUALITY#2 207 | rather than preparing vegetarian dish , the chef presented me with a plate of steamed vegetables ( minus sauce , season ##ing , or any form or aesthetic presentation ) .####3,5 -1,-1 FOOD#QUALITY#0 FOOD#STYLE_OPTIONS#0 208 | rather than preparing vegetarian dish , the chef presented me with a plate of steamed vegetables ( minus sauce , season ##ing , or any form or aesthetic presentation ) .####7,8 -1,-1 SERVICE#GENERAL#0 209 | the only thing that strikes you is the decor ? 
( not very pleasant ) .####8,9 11,14 AMBIENCE#GENERAL#0 210 | the martini ##s are amazing and very fairly priced .####1,3 4,5 DRINKS#QUALITY#2 211 | the martini ##s are amazing and very fairly priced .####1,3 7,9 DRINKS#PRICES#2 212 | i wanted to go there to see if it was worth it and sadly , curious ##ity got the best of me and i paid dear ##ly for it .####-1,-1 -1,-1 RESTAURANT#GENERAL#0 RESTAURANT#PRICES#0 213 | the environment is very upscale and you will see a lot of rich guys with trophy wives or just highly paid escorts .####1,2 4,5 AMBIENCE#GENERAL#1 214 | the environment is very upscale and you will see a lot of rich guys with trophy wives or just highly paid escorts .####-1,-1 4,5 RESTAURANT#MISCELLANEOUS#1 215 | however , our $ 14 drinks were were horrible !####5,6 8,9 DRINKS#QUALITY#0 DRINKS#PRICES#0 216 | once we finally got a table , despite indicating we wanted an alla cart ##e menu we were pushed into a table that was only price fixed !####-1,-1 -1,-1 SERVICE#GENERAL#0 217 | i do n ' t appreciate places or people that try to drive up the bill without the patron ' s knowledge so that was a huge turn ##off ( more than the price ) .####-1,-1 -1,-1 SERVICE#GENERAL#0 RESTAURANT#PRICES#0 218 | eat at your own risk .####-1,-1 -1,-1 FOOD#QUALITY#0 219 | the service was spectacular as the waiter knew everything about the menu and his recommendations were amazing !####1,2 3,4 SERVICE#GENERAL#2 220 | the service was spectacular as the waiter knew everything about the menu and his recommendations were amazing !####6,7 16,17 SERVICE#GENERAL#2 221 | the sake ’ s complimented the courses very well and is successfully easing me into the sake world .####1,4 8,9 DRINKS#QUALITY#2 222 | maybe it was the great company ( i had friends visiting from phil ##ly – yes , it was not a date this time ) or the super reasonable price point , but i just can ’ t say enough good things about this brass ##erie .####44,46 40,41 RESTAURANT#GENERAL#2 RESTAURANT#PRICES#2 223 | i tried a couple other dishes but was n ' t too impressed .####5,6 8,13 FOOD#QUALITY#1 224 | the family seafood en ##tree was very good .####1,5 7,8 FOOD#QUALITY#2 225 | the food they serve is not comforting , not app ##eti ##zing and un ##co ##oked .####1,2 5,7 FOOD#QUALITY#0 226 | the food they serve is not comforting , not app ##eti ##zing and un ##co ##oked .####1,2 8,12 FOOD#QUALITY#0 227 | the food they serve is not comforting , not app ##eti ##zing and un ##co ##oked .####1,2 13,16 FOOD#QUALITY#0 228 | super ##ci ##lio ##us sc ##orn is in .####-1,-1 0,4 SERVICE#GENERAL#0 229 | single worst restaurant in manhattan####2,3 1,2 RESTAURANT#GENERAL#0 230 | it is quite a spectacular scene i ' ll give them that .####5,6 4,5 AMBIENCE#GENERAL#2 231 | how this place survives the competitive west village market in this economy , or any other for that matter , is beyond me .####2,3 -1,-1 RESTAURANT#GENERAL#0 232 | though it ' s been crowded most times i ' ve gone here , bark always delivers on their food .####14,15 -1,-1 RESTAURANT#MISCELLANEOUS#1 233 | though it ' s been crowded most times i ' ve gone here , bark always delivers on their food .####19,20 -1,-1 FOOD#QUALITY#2 234 | but nonetheless - - great spot , great food .####5,6 4,5 RESTAURANT#GENERAL#2 235 | but nonetheless - - great spot , great food .####8,9 7,8 FOOD#QUALITY#2 236 | the food and service were fine , however the mai ##tre - d was incredibly un ##we ##lco ##ming and arrogant .####1,2 5,6 FOOD#QUALITY#2 237 | the food and service were fine , however the mai ##tre - d was 
incredibly un ##we ##lco ##ming and arrogant .####3,4 5,6 SERVICE#GENERAL#2 238 | the food and service were fine , however the mai ##tre - d was incredibly un ##we ##lco ##ming and arrogant .####9,13 15,19 SERVICE#GENERAL#0 239 | the food and service were fine , however the mai ##tre - d was incredibly un ##we ##lco ##ming and arrogant .####9,13 20,21 SERVICE#GENERAL#0 240 | a word to the wise : you ca n ' t din ##e here and disturb the mai ##tre - d ' s sense of ` ` table turnover ' ' , as w ##ha ##cked as it is , or else .####17,21 -1,-1 SERVICE#GENERAL#0 241 | i had the lamb special which was perfect .####3,5 7,8 FOOD#QUALITY#2 242 | do n ' t go to this place !####7,8 -1,-1 RESTAURANT#GENERAL#0 243 | when the main course finally arrived ( another 45 ##mins ) half of our order was missing .####-1,-1 -1,-1 SERVICE#GENERAL#0 244 | when we threatened to leave , we were offered a me ##ager discount even though half the order was missing .####-1,-1 -1,-1 SERVICE#GENERAL#0 245 | the bread was stale , the salad was over ##pr ##ice ##d and empty .####1,2 3,4 FOOD#QUALITY#0 246 | the bread was stale , the salad was over ##pr ##ice ##d and empty .####6,7 8,12 FOOD#PRICES#0 247 | the bread was stale , the salad was over ##pr ##ice ##d and empty .####6,7 13,14 FOOD#STYLE_OPTIONS#0 248 | shame on this place for the horrible rude staff and non - existent customer service .####8,9 6,7 SERVICE#GENERAL#0 249 | shame on this place for the horrible rude staff and non - existent customer service .####8,9 7,8 SERVICE#GENERAL#0 250 | shame on this place for the horrible rude staff and non - existent customer service .####13,15 10,13 SERVICE#GENERAL#0 251 | the food is good .####1,2 3,4 FOOD#QUALITY#2 252 | -------------------------------------------------------------------------------- /Extract-Classify-ACOS/tokenized_data/rest16_dev_quad_bert.tsv: -------------------------------------------------------------------------------- 1 | ca n ' t wait wait for my next visit . -1,-1 RESTAURANT#GENERAL 2 -1,-1 2 | their sake list was extensive , but we were looking for purple haze , which was n ' t listed but made for us upon request ! 1,3 DRINKS#STYLE_OPTIONS 2 4,5 -1,-1 SERVICE#GENERAL 2 -1,-1 3 | the spicy tuna roll was unusually good and the rock shrimp te ##mp ##ura was awesome , great app ##eti ##zer to share ! 1,4 FOOD#QUALITY 2 6,7 9,14 FOOD#QUALITY 2 15,16 4 | we love th pink pony . 3,5 RESTAURANT#GENERAL 2 1,2 5 | this place has got to be the best japanese restaurant in the new york area . 1,2 RESTAURANT#GENERAL 2 7,8 6 | i tend to judge a su ##shi restaurant by its sea ur ##chin , which was heavenly at su ##shi rose . 10,13 FOOD#QUALITY 2 16,17 7 | the prix fix ##e menu is worth every penny and you get more than enough ( both in quantity and quality ) . 1,5 FOOD#QUALITY 2 6,7 1,5 FOOD#STYLE_OPTIONS 2 6,7 1,5 FOOD#PRICES 2 6,7 8 | the food here is rather good , but only if you like to wait for it . 1,2 FOOD#QUALITY 2 5,6 -1,-1 SERVICE#GENERAL 0 -1,-1 9 | also , specify if you like your food spicy - its rather bland if you do n ' t . 7,8 FOOD#QUALITY 0 12,13 10 | the am ##bie ##nce is pretty and nice for conversation , so a casual lunch here would probably be best . 1,4 AMBIENCE#GENERAL 2 5,6 1,4 AMBIENCE#GENERAL 2 7,8 -1,-1 RESTAURANT#MISCELLANEOUS 2 19,20 11 | it was horrible . -1,-1 RESTAURANT#GENERAL 0 2,3 12 | have been dozens of times and never failed to enjoy the experience . -1,-1 RESTAURANT#GENERAL 2 9,10 13 | make sure you try this place as often as you can . 
5,6 RESTAURANT#GENERAL 2 7,8 14 | i had a huge group for my birthday and we were well taken care of . -1,-1 SERVICE#GENERAL 2 11,12 15 | get the tuna of ga ##ri . 2,6 FOOD#QUALITY 2 -1,-1 16 | make sure you have the spicy sc ##all ##op roll . . . 5,10 FOOD#QUALITY 2 -1,-1 17 | rag ##a ' s is a romantic , cozy restaurant . 0,4 AMBIENCE#GENERAL 2 6,7 0,4 AMBIENCE#GENERAL 2 8,9 18 | i had a great time at je ##ky ##ll and hyde ! 6,11 RESTAURANT#GENERAL 2 3,4 19 | i am bringing my whole family back next time . -1,-1 RESTAURANT#MISCELLANEOUS 2 -1,-1 20 | fine dining restaurant quality . 1,2 FOOD#QUALITY 2 0,1 21 | we will return many times for this oasis in mid - town . -1,-1 RESTAURANT#GENERAL 2 -1,-1 22 | the food options rule . 1,2 FOOD#STYLE_OPTIONS 2 -1,-1 23 | my husband and i thou ##gt it would be great to go to the je ##ky ##ll and hyde pub for our anniversary , and to our surprise it was fantastic . 14,20 RESTAURANT#GENERAL 2 9,10 14,20 RESTAURANT#GENERAL 2 30,31 24 | please take my advice , go and try this place . 9,10 RESTAURANT#GENERAL 2 -1,-1 25 | they were served warm and had a soft fluffy interior . -1,-1 FOOD#QUALITY 2 3,4 -1,-1 FOOD#QUALITY 2 7,8 26 | but they do . -1,-1 SERVICE#GENERAL 2 -1,-1 27 | fresh restaurant was amazing . . . . . . . . food was delicious and of course fresh . 1,2 RESTAURANT#GENERAL 2 0,1 1,2 RESTAURANT#GENERAL 2 3,4 12,13 FOOD#QUALITY 2 14,15 12,13 FOOD#QUALITY 2 18,19 28 | hats off to the chef . 4,5 FOOD#QUALITY 2 0,2 29 | this is some really good , inexpensive su ##shi . 7,9 FOOD#QUALITY 2 4,5 7,9 FOOD#PRICES 2 6,7 30 | this place is always very crowded and popular . 1,2 RESTAURANT#MISCELLANEOUS 2 5,6 1,2 RESTAURANT#MISCELLANEOUS 2 7,8 31 | and evaluated on those terms past ##is is simply wonderful . 5,7 RESTAURANT#GENERAL 2 9,10 32 | i ' m still mad that i had to pay for lou ##sy food . 13,14 FOOD#QUALITY 0 11,13 33 | the hang ##er steak was like rubber and the tuna was flavor ##less not to mention it tasted like it had just been tha ##wed . 1,4 FOOD#QUALITY 0 6,7 9,10 FOOD#QUALITY 0 11,13 34 | big thumbs up ! -1,-1 RESTAURANT#GENERAL 2 1,3 35 | the pizza and wine were excellent - the service too - - but what really made this place was the backyard dining area . 1,2 FOOD#QUALITY 2 5,6 3,4 DRINKS#QUALITY 2 5,6 8,9 SERVICE#GENERAL 2 5,6 20,23 AMBIENCE#GENERAL 2 -1,-1 36 | it is one the nice ##st outdoor restaurants i have ever seen in ny - - i am from italy and this place rivals the ones in my country . 6,8 AMBIENCE#GENERAL 2 4,6 37 | it is simply amazing . -1,-1 FOOD#QUALITY 2 3,4 38 | beautiful experience . -1,-1 RESTAURANT#GENERAL 2 0,1 39 | the menu is very limited - i think we counted 4 or 5 en ##tree ##s . 1,2 FOOD#STYLE_OPTIONS 0 4,5 40 | we will go back every time we are in the city . -1,-1 RESTAURANT#GENERAL 2 -1,-1 41 | the characters really make for an enjoyable experience . 1,2 AMBIENCE#GENERAL 2 6,7 42 | however , i think je ##ck ##ll and hyde ##s t is one of those places that is fun to do once . 4,10 RESTAURANT#GENERAL 2 18,19 43 | we had a good time . -1,-1 RESTAURANT#GENERAL 2 3,4 44 | a little over ##pr ##ice ##d but worth it once you take a bite . -1,-1 FOOD#PRICES 0 2,6 -1,-1 FOOD#QUALITY 2 7,8 45 | i have lived in japan for 7 years and the taste of the food and the feel of the restaurant is like being back in japan . 13,14 FOOD#QUALITY 2 -1,-1 16,17 AMBIENCE#GENERAL 2 -1,-1 46 | check out the secret back room . 4,6 AMBIENCE#GENERAL 2 3,4 47 | thank you emilio . 2,3 RESTAURANT#GENERAL 2 -1,-1 48 | the food was authentic . 
1,2 FOOD#QUALITY 2 3,4 49 | fantastic ! -1,-1 RESTAURANT#GENERAL 2 0,1 50 | but the staff was so horrible to us . 2,3 SERVICE#GENERAL 0 5,6 51 | decor is nice though service can be spot ##ty . 0,1 AMBIENCE#GENERAL 2 2,3 4,5 SERVICE#GENERAL 0 7,9 52 | just aw ##some . -1,-1 FOOD#QUALITY 2 1,3 53 | i had their eggs benedict for br ##un ##ch , which were the worst in my entire life , i tried removing the ho ##llon ##dai ##se sauce completely that was how failed it was . 3,5 FOOD#QUALITY 0 13,14 54 | with the theater 2 blocks away we had a delicious meal in a beautiful room . 10,11 FOOD#QUALITY 2 9,10 14,15 AMBIENCE#GENERAL 2 13,14 -1,-1 LOCATION#GENERAL 2 -1,-1 55 | the service was at ##ten ##tive . 1,2 SERVICE#GENERAL 2 3,6 56 | pat ##ro ##on features a nice cigar bar and has great staff . 6,8 AMBIENCE#GENERAL 2 5,6 11,12 SERVICE#GENERAL 2 10,11 57 | ll ##oo ##v ##ve this place . 5,6 RESTAURANT#GENERAL 2 0,4 58 | the menu is limited but almost all of the dishes are excellent . 1,2 FOOD#STYLE_OPTIONS 0 3,4 9,10 FOOD#QUALITY 2 11,12 59 | wine list is extensive without being over - priced . 0,2 DRINKS#STYLE_OPTIONS 2 3,4 0,2 DRINKS#PRICES 2 4,9 60 | the food was very good , a great deal , and the place its self was great . 1,2 FOOD#QUALITY 2 4,5 1,2 FOOD#PRICES 2 6,7 12,13 AMBIENCE#GENERAL 2 16,17 61 | the wait staff is very fr ##ein ##dly , they make it feel like you ' re eating in a fr ##ein ##dly little european town . 1,3 SERVICE#GENERAL 2 5,8 62 | the whole set up is truly un ##pro ##fe ##ssion ##al and i wish cafe noir would get some good staff , because despite the current one this is a great place . 20,21 SERVICE#GENERAL 0 6,21 14,16 RESTAURANT#GENERAL 2 30,31 63 | you should pass on the cal ##ama ##ri . 5,8 FOOD#QUALITY 0 -1,-1 64 | when asked about how a certain dish was prepared in comparison to a similar at other thai restaurants , he replied this is not mcdonald ' s , every place makes things differently -1,-1 SERVICE#GENERAL 0 -1,-1 65 | everything was wonderful ; food , drinks , staff , mile ##au . 4,5 FOOD#QUALITY 2 2,3 6,7 DRINKS#QUALITY 2 2,3 8,9 SERVICE#GENERAL 2 2,3 10,12 AMBIENCE#GENERAL 2 2,3 -1,-1 RESTAURANT#GENERAL 2 2,3 66 | i would highly recommend this place ! 5,6 RESTAURANT#GENERAL 2 3,4 67 | fresh ingredients and everything is made to order . 1,2 FOOD#QUALITY 2 0,1 -1,-1 FOOD#QUALITY 2 -1,-1 68 | friendly staff that actually lets you enjoy your meal and the company you ' re with . 1,2 SERVICE#GENERAL 2 0,1 69 | i will definitely be going back . -1,-1 RESTAURANT#GENERAL 2 -1,-1 70 | a great choice at any cost and a great deal . -1,-1 RESTAURANT#GENERAL 2 8,9 -1,-1 RESTAURANT#PRICES 2 1,2 71 | tha ##lia is a beautiful restaurant with beautiful people serving you , but the food does n ' t quite match up . 8,9 SERVICE#GENERAL 2 7,8 14,15 FOOD#QUALITY 0 15,22 0,2 AMBIENCE#GENERAL 2 4,5 72 | i ordered the smoked salmon and roe app ##eti ##zer and it was off flavor . 3,10 FOOD#QUALITY 0 13,15 73 | the food is good , especially their more basic dishes , and the drinks are delicious . 1,2 FOOD#QUALITY 2 3,4 8,10 FOOD#QUALITY 2 3,4 13,14 DRINKS#QUALITY 2 15,16 74 | the big complaint : no toast ##ing available . -1,-1 SERVICE#GENERAL 0 2,3 75 | i ' ve been many time and have never been disappointed . -1,-1 RESTAURANT#GENERAL 2 8,11 76 | the turkey burger ##s are scary ! 1,4 FOOD#QUALITY 0 5,6 77 | for authentic thai food , look no further than too ##ns . 2,4 FOOD#QUALITY 2 1,2 78 | try the pad thai , or sample anything on the app ##eti ##zer menu . . . 
they ' re all delicious . 2,4 FOOD#QUALITY 2 21,22 2,4 FOOD#QUALITY 2 21,22 79 | service was good and food is wonderful . 0,1 SERVICE#GENERAL 2 2,3 4,5 FOOD#QUALITY 2 6,7 80 | it is definitely a good spot for snacks and chat . 5,6 RESTAURANT#GENERAL 2 4,5 81 | do not get the go go hamburger ##s , no matter what the reviews say . 4,8 FOOD#QUALITY 0 -1,-1 82 | steamed fresh so brought hot hot hot to your table . -1,-1 FOOD#QUALITY 2 1,2 83 | small serving ##s for main en ##tree , i had salmon ( wasn ##t impressed ) girlfriend had chicken , it was good . 10,11 FOOD#QUALITY 0 12,15 18,19 FOOD#QUALITY 2 22,23 1,7 FOOD#GENERAL 0 0,1 84 | cute and decorative . -1,-1 AMBIENCE#GENERAL 2 0,1 -1,-1 AMBIENCE#GENERAL 2 2,3 85 | excellent spot for holiday get together ##s with co - workers or friends that you have n ' t seen in a while . 1,2 RESTAURANT#MISCELLANEOUS 2 0,1 86 | what a great place ! 3,4 RESTAURANT#GENERAL 2 2,3 87 | not the typical nyc gi ##mm ##ick theme restaurant . 8,9 AMBIENCE#GENERAL 2 0,3 88 | service was very prompt but slightly rushed . 0,1 SERVICE#GENERAL 2 3,4 0,1 SERVICE#GENERAL 2 6,7 89 | i really liked this place . 4,5 RESTAURANT#GENERAL 2 2,3 90 | everything i had was good , and i ' m a eater . -1,-1 FOOD#QUALITY 2 4,5 91 | i also recommend the rice dishes or the different varieties of cong ##ee ( rice por ##ridge ) . 4,6 FOOD#QUALITY 2 2,3 11,18 FOOD#QUALITY 2 2,3 92 | i recently tried su ##an and i thought that it was great . 3,5 RESTAURANT#GENERAL 2 11,12 93 | have been several times and it never di ##ssa ##points . -1,-1 RESTAURANT#GENERAL 2 6,10 94 | this place is a great bargain . 1,2 RESTAURANT#PRICES 2 4,6 95 | people are always friendly . 0,1 SERVICE#GENERAL 2 3,4 96 | the best pad thai i ' ve ever had . 2,4 FOOD#QUALITY 2 1,2 97 | would n ' t rec ##ome ##nd it for dinner ! -1,-1 RESTAURANT#GENERAL 0 1,7 98 | ask for us ##ha , the nice ##st bartender in manhattan . 2,4 SERVICE#GENERAL 2 6,8 99 | the food ' s as good as ever . 1,2 FOOD#QUALITY 2 5,6 100 | best drums ##tick ##s over rice and sour spicy soup in town ! 1,6 FOOD#QUALITY 2 0,1 7,10 FOOD#QUALITY 2 0,1 101 | for those that go once and do n ' t enjoy it , all i can say is that they just do n ' t get it . -1,-1 RESTAURANT#MISCELLANEOUS 2 -1,-1 102 | not worth it . -1,-1 FOOD#PRICES 0 0,2 103 | this dish is my favorite and i always get it when i go there and never get tired of it . 1,2 FOOD#QUALITY 2 4,5 104 | big wong is a great place to eat and fill your stomach . 0,2 RESTAURANT#GENERAL 2 4,5 105 | the food is okay and the prices here are med ##io ##cre . 1,2 FOOD#QUALITY 1 3,4 -1,-1 RESTAURANT#PRICES 1 9,12 106 | me and my girls will definitely go back . -1,-1 RESTAURANT#GENERAL 2 -1,-1 107 | the food is great . 1,2 FOOD#QUALITY 2 3,4 108 | la rosa waltz ##es in , and i think they are doing it the best . 0,2 FOOD#QUALITY 2 14,15 109 | interesting selection , good wines , service fine , fun decor . 4,5 DRINKS#QUALITY 2 3,4 6,7 SERVICE#GENERAL 2 7,8 10,11 AMBIENCE#GENERAL 2 9,10 1,2 FOOD#STYLE_OPTIONS 2 0,1 110 | the food here was med ##io ##cre at best . 1,2 FOOD#QUALITY 0 4,7 111 | the cypriot restaurant has a lot going for it . 1,3 RESTAURANT#GENERAL 2 -1,-1 112 | will comeback for sure , wish they have it here in la . . -1,-1 RESTAURANT#GENERAL 2 -1,-1 113 | the space kind of feels like an alice in wonderland setting , without it trying to be that . 1,2 AMBIENCE#GENERAL 0 -1,-1 114 | i paid just about $ 60 for a good meal , though : ) 9,10 FOOD#QUALITY 2 8,9 9,10 FOOD#PRICES 2 -1,-1 115 | love it . 
-1,-1 RESTAURANT#GENERAL 2 0,1 116 | the place is a bit hidden away , but once you get there , it ' s all worth it . 1,2 LOCATION#GENERAL 2 5,7 1,2 LOCATION#GENERAL 2 18,19 117 | i love their chicken pasta can ##t remember the name but is soo ##o good 3,5 FOOD#QUALITY 2 1,2 3,5 FOOD#QUALITY 2 14,15 118 | way below average -1,-1 RESTAURANT#GENERAL 0 1,3 119 | i think the pizza is so over ##rated and was under cooked . 3,4 FOOD#QUALITY 0 6,8 3,4 FOOD#QUALITY 0 10,12 120 | i love this place 3,4 RESTAURANT#GENERAL 2 1,2 121 | the service was quick and friendly . 1,2 SERVICE#GENERAL 2 3,4 1,2 SERVICE#GENERAL 2 5,6 122 | i thought the restaurant was nice and clean . 3,4 RESTAURANT#GENERAL 2 5,6 3,4 AMBIENCE#GENERAL 2 7,8 123 | chicken ter ##iya ##ki had tomato or pi ##mento ##s on top ? ? 0,4 FOOD#STYLE_OPTIONS 0 -1,-1 124 | the waitress was not at ##ten ##tive at all . 1,2 SERVICE#GENERAL 0 3,7 125 | just go to ya ##mat ##o and order the red dragon roll . 3,6 RESTAURANT#GENERAL 2 -1,-1 9,12 FOOD#QUALITY 2 -1,-1 126 | favorite su ##shi in nyc 1,3 FOOD#QUALITY 2 0,1 127 | the rolls are creative and i have yet to find another su ##shi place that serves up more in ##vent ##ive yet delicious japanese food . 1,2 FOOD#STYLE_OPTIONS 2 3,4 23,25 FOOD#STYLE_OPTIONS 2 18,21 23,25 FOOD#QUALITY 2 22,23 128 | my que ##sa ##di ##lla tasted like it had been made by a three - year old with no sense of proportion or flavor . 1,5 FOOD#QUALITY 0 -1,-1 1,5 FOOD#STYLE_OPTIONS 0 -1,-1 129 | save your money and your time and go somewhere else . -1,-1 RESTAURANT#GENERAL 0 -1,-1 130 | the spin ##ach is fresh , def ##inate ##ly not frozen . . . 1,3 FOOD#QUALITY 2 4,5 131 | decor needs to be upgraded but the food is amazing ! 0,1 AMBIENCE#GENERAL 0 4,5 7,8 FOOD#QUALITY 2 9,10 132 | my daughter ' s wedding reception at water ' s edge received the highest compliment ##s from our guests . 7,11 RESTAURANT#MISCELLANEOUS 2 13,16 133 | the high prices you ' re going to pay is for the view not for the food . 12,13 LOCATION#GENERAL 1 1,3 16,17 FOOD#QUALITY 0 1,3 -1,-1 RESTAURANT#PRICES 0 1,3 134 | not what i would expect for the price and prestige of this location . 12,13 RESTAURANT#PRICES 1 -1,-1 12,13 RESTAURANT#MISCELLANEOUS 1 -1,-1 -1,-1 SERVICE#GENERAL 0 -1,-1 135 | the food was ok and fair nothing to go crazy . 1,2 FOOD#QUALITY 1 3,4 1,2 FOOD#QUALITY 1 5,6 136 | impressed . . . -1,-1 RESTAURANT#GENERAL 2 0,1 137 | subtle food and service 1,2 FOOD#QUALITY 2 0,1 3,4 SERVICE#GENERAL 2 0,1 138 | food took some time to prepare , all worth waiting for . 0,1 FOOD#QUALITY 2 8,9 -1,-1 SERVICE#GENERAL 1 -1,-1 139 | great find in the west village ! -1,-1 RESTAURANT#GENERAL 2 0,1 140 | when the bill came , nothing was com ##ped , so i told the manager very politely that we were willing to pay for the wine , but i did n ' t think i should have to pay for food with a mag ##go ##t in it . -1,-1 SERVICE#GENERAL 0 -1,-1 141 | amazing food . 1,2 FOOD#QUALITY 2 0,1 142 | rather than preparing vegetarian dish , the chef presented me with a plate of steamed vegetables ( minus sauce , season ##ing , or any form or aesthetic presentation ) . 3,5 FOOD#QUALITY 0 -1,-1 3,5 FOOD#STYLE_OPTIONS 0 -1,-1 7,8 SERVICE#GENERAL 0 -1,-1 143 | the only thing that strikes you is the decor ? ( not very pleasant ) . 8,9 AMBIENCE#GENERAL 0 11,14 144 | the martini ##s are amazing and very fairly priced . 
1,3 DRINKS#QUALITY 2 4,5 1,3 DRINKS#PRICES 2 7,9 145 | i wanted to go there to see if it was worth it and sadly , curious ##ity got the best of me and i paid dear ##ly for it . -1,-1 RESTAURANT#GENERAL 0 -1,-1 -1,-1 RESTAURANT#PRICES 0 -1,-1 146 | the environment is very upscale and you will see a lot of rich guys with trophy wives or just highly paid escorts . 1,2 AMBIENCE#GENERAL 1 4,5 -1,-1 RESTAURANT#MISCELLANEOUS 1 4,5 147 | however , our $ 14 drinks were were horrible ! 5,6 DRINKS#QUALITY 0 8,9 5,6 DRINKS#PRICES 0 8,9 148 | once we finally got a table , despite indicating we wanted an alla cart ##e menu we were pushed into a table that was only price fixed ! -1,-1 SERVICE#GENERAL 0 -1,-1 149 | i do n ' t appreciate places or people that try to drive up the bill without the patron ' s knowledge so that was a huge turn ##off ( more than the price ) . -1,-1 SERVICE#GENERAL 0 -1,-1 -1,-1 RESTAURANT#PRICES 0 -1,-1 150 | eat at your own risk . -1,-1 FOOD#QUALITY 0 -1,-1 151 | the service was spectacular as the waiter knew everything about the menu and his recommendations were amazing ! 1,2 SERVICE#GENERAL 2 3,4 6,7 SERVICE#GENERAL 2 16,17 152 | the sake ’ s complimented the courses very well and is successfully easing me into the sake world . 1,4 DRINKS#QUALITY 2 8,9 153 | maybe it was the great company ( i had friends visiting from phil ##ly – yes , it was not a date this time ) or the super reasonable price point , but i just can ’ t say enough good things about this brass ##erie . 44,46 RESTAURANT#GENERAL 2 40,41 44,46 RESTAURANT#PRICES 2 40,41 154 | i tried a couple other dishes but was n ' t too impressed . 5,6 FOOD#QUALITY 1 8,13 155 | the family seafood en ##tree was very good . 1,5 FOOD#QUALITY 2 7,8 156 | the food they serve is not comforting , not app ##eti ##zing and un ##co ##oked . 1,2 FOOD#QUALITY 0 5,7 1,2 FOOD#QUALITY 0 8,12 1,2 FOOD#QUALITY 0 13,16 157 | super ##ci ##lio ##us sc ##orn is in . -1,-1 SERVICE#GENERAL 0 0,4 158 | single worst restaurant in manhattan 2,3 RESTAURANT#GENERAL 0 1,2 159 | it is quite a spectacular scene i ' ll give them that . 5,6 AMBIENCE#GENERAL 2 4,5 160 | how this place survives the competitive west village market in this economy , or any other for that matter , is beyond me . 2,3 RESTAURANT#GENERAL 0 -1,-1 161 | though it ' s been crowded most times i ' ve gone here , bark always delivers on their food . 14,15 RESTAURANT#MISCELLANEOUS 1 -1,-1 19,20 FOOD#QUALITY 2 -1,-1 162 | but nonetheless - - great spot , great food . 5,6 RESTAURANT#GENERAL 2 4,5 8,9 FOOD#QUALITY 2 7,8 163 | the food and service were fine , however the mai ##tre - d was incredibly un ##we ##lco ##ming and arrogant . 1,2 FOOD#QUALITY 2 5,6 3,4 SERVICE#GENERAL 2 5,6 9,13 SERVICE#GENERAL 0 15,19 9,13 SERVICE#GENERAL 0 20,21 164 | a word to the wise : you ca n ' t din ##e here and disturb the mai ##tre - d ' s sense of ` ` table turnover ' ' , as w ##ha ##cked as it is , or else . 17,21 SERVICE#GENERAL 0 -1,-1 165 | i had the lamb special which was perfect . 3,5 FOOD#QUALITY 2 7,8 166 | do n ' t go to this place ! 7,8 RESTAURANT#GENERAL 0 -1,-1 167 | when the main course finally arrived ( another 45 ##mins ) half of our order was missing . -1,-1 SERVICE#GENERAL 0 -1,-1 168 | when we threatened to leave , we were offered a me ##ager discount even though half the order was missing . -1,-1 SERVICE#GENERAL 0 -1,-1 169 | the bread was stale , the salad was over ##pr ##ice ##d and empty . 
1,2 FOOD#QUALITY 0 3,4 6,7 FOOD#PRICES 0 8,12 6,7 FOOD#STYLE_OPTIONS 0 13,14 170 | shame on this place for the horrible rude staff and non - existent customer service . 8,9 SERVICE#GENERAL 0 6,7 8,9 SERVICE#GENERAL 0 7,8 13,15 SERVICE#GENERAL 0 10,13 171 | the food is good . 1,2 FOOD#QUALITY 2 3,4 172 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 4 | 5 | # Aspect-Category-Opinion-Sentiment (ACOS) Quadruple Extraction 6 | 7 | This repo contains the datasets and source code of our paper: 8 | 9 | Aspect-Category-Opinion-Sentiment Quadruple Extraction with Implicit Aspects and Opinions [[ACL 2021]](https://aclanthology.org/2021.acl-long.29.pdf). 10 | - We introduce a new ABSA task, named Aspect-Category-Opinion-Sentiment Quadruple (ACOS) Extraction, to extract fine-grained ABSA quadruples from product reviews; 11 | - We construct two new datasets for the task, with ACOS quadruple annotations, and benchmark the task with four baseline systems; 12 | - Our task and datasets provide good support for discovering implicit opinion targets and implicit opinion expressions in product reviews. 13 | 14 | 15 | ## Task 16 | The Aspect-Category-Opinion-Sentiment (ACOS) Quadruple Extraction task aims to extract all aspect-category-opinion-sentiment quadruples, i.e., (aspect expression, aspect category, opinion expression, sentiment polarity), in a review sentence, including quadruples with implicit aspects and implicit opinions. 17 | 18 |
19 | [figure: img/figure1.PNG (illustration of the ACOS quadruple extraction task)]
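
For example, one sentence from the Restaurant-ACOS dev set (data/Restaurant-ACOS/rest16_quad_dev.tsv, included in full below) carries two quadruples, both with explicit aspects and opinions:

```
Sentence:    the spicy tuna roll was unusually good and the rock shrimp tempura was awesome , great appetizer to share !
Quadruples:  (spicy tuna roll, FOOD#QUALITY, good, positive)
             (rock shrimp tempura, FOOD#QUALITY, awesome, positive)
```

When an aspect or opinion is not expressed explicitly, the corresponding slot of the quadruple is implicit; in the data files such slots appear as the span `-1,-1`.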
21 | 22 | 23 | 24 | ## Datasets 25 | Two new datasets, Restaurant-ACOS and Laptop-ACOS, are constructed for the ACOS Quadruple Extraction task: 26 | - Restaurant-ACOS is an extension of the existing SemEval Restaurant dataset, to which we add annotations of implicit aspects, implicit opinions, and the full quadruples; 27 | - Laptop-ACOS is a brand-new dataset collected from the Amazon Laptop domain. It is twice the size of the SemEval Laptop dataset, and is annotated with quadruples covering all explicit/implicit aspects and opinions. 28 | 29 | The following table compares our two ACOS Quadruple datasets with existing representative ABSA datasets. 30 | 31 |
32 | [figure: img/stat.PNG (dataset statistics comparison)]
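
To make the file layout concrete, below is a minimal parsing sketch for the quadruple files. It assumes the review text and each of its quadruples occupy tab-separated fields (the files are `.tsv`), that each quadruple has the four space-separated fields 'Aspect Category Sentiment Opinion' described in `data/Readme.md` with spans given as end-exclusive `start,end` token offsets, and that `-1,-1` marks an implicit aspect or opinion; the helper names are illustrative, not part of this repo.

```python
# Hedged sketch: parse one line of a *_quad_*.tsv file into
# (aspect, category, opinion, sentiment) tuples. The field layout and
# the -1,-1 convention are assumptions inferred from data/Readme.md.
SENTIMENT = {"0": "negative", "1": "neutral", "2": "positive"}

def span_text(tokens, span):
    """Return the token span as text, or None for an implicit (-1,-1) span."""
    start, end = map(int, span.split(","))
    return None if start == -1 else " ".join(tokens[start:end])

def parse_line(line):
    fields = line.rstrip("\n").split("\t")
    tokens = fields[0].split()
    quads = []
    for quad in fields[1:]:
        aspect, category, sentiment, opinion = quad.split()
        quads.append((span_text(tokens, aspect), category,
                      span_text(tokens, opinion), SENTIMENT[sentiment]))
    return tokens, quads
```

On the dev-set example shown in the Task section, this would yield `('spicy tuna roll', 'FOOD#QUALITY', 'good', 'positive')` and `('rock shrimp tempura', 'FOOD#QUALITY', 'awesome', 'positive')`.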
34 | 35 | 36 | ## Methods 37 | We benchmark the ACOS Quadruple Extraction task with four baseline systems: 38 | - Double-Propagation-ACOS 39 | - JET-ACOS 40 | - TAS-BERT-ACOS 41 | - Extract-Classify-ACOS 42 | 43 | We provided the source code of Extract-Classify-ACOS. The source code of the other three methods will be provided soon. 44 | 45 | Overview of our Extract-Classify-ACOS method. The first step performs aspect-opinion co-extraction, and the second step predicts category-sentiment given the aspect-opinion pairs. 46 | 47 |

48 | 49 |

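
Purely as a schematic sketch of this two-step flow (the actual entry points are `run_step1.py` and `run_step2.py`; the function bodies below are hypothetical stand-ins, not the real BERT-based models):

```python
from itertools import product

def step1_extract(tokens):
    # Stand-in for step 1: the real model co-extracts candidate aspect and
    # opinion spans as (start, end) pairs, with (-1, -1) for implicit ones.
    return [(1, 4), (9, 12)], [(6, 7), (13, 14)]

def step2_classify(tokens, aspect, opinion):
    # Stand-in for step 2: the real model labels an aspect-opinion pair
    # with a (category, sentiment) tuple, or rejects an invalid pair (None).
    return ("FOOD#QUALITY", "positive")

def extract_classify(tokens):
    """Pair every candidate aspect with every candidate opinion and keep
    the pairs the classifier accepts."""
    aspects, opinions = step1_extract(tokens)
    quads = []
    for aspect, opinion in product(aspects, opinions):
        label = step2_classify(tokens, aspect, opinion)
        if label is not None:
            category, sentiment = label
            quads.append((aspect, category, opinion, sentiment))
    return quads
```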
50 | 51 | 52 | ## Results 53 | The ACOS quadruple extraction performance of four different systems on the two datasets: 54 | 55 |

56 | 57 |

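
For reference, quadruple-level precision, recall, and F1 are typically computed by exact match over the predicted and gold quadruple sets; below is a minimal sketch of such a metric (the repo's own evaluation lives in `eval_metrics.py` and may differ in details):

```python
def quad_prf1(gold_quads, pred_quads):
    """Exact-match precision, recall, and F1 over quadruple sets."""
    gold, pred = set(gold_quads), set(pred_quads)
    tp = len(gold & pred)  # quadruples predicted exactly right
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```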
58 | 59 | We further investigate the ability of different systems in addressing the implicit aspects/opinion problem: 60 | 61 |

62 | 63 |

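
One simple way to reproduce this kind of breakdown from the gold files is to bucket every quadruple by whether its aspect and opinion are explicit or implicit, reusing the illustrative `parse_line` helper sketched in the Datasets section:

```python
from collections import Counter

def implicit_breakdown(path):
    # Counts quadruples by (aspect, opinion) explicitness, e.g.
    # ("EA", "IO") = explicit aspect with an implicit opinion.
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            _tokens, quads = parse_line(line)
            for aspect, _category, opinion, _sentiment in quads:
                counts[("EA" if aspect is not None else "IA",
                        "EO" if opinion is not None else "IO")] += 1
    return counts
```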
64 | 65 | ## Citation 66 | If you use the data and code in your research, please cite our paper as follows: 67 | ``` 68 | @inproceedings{cai2021aspect, 69 | title={Aspect-Category-Opinion-Sentiment Quadruple Extraction with Implicit Aspects and Opinions}, 70 | author={Cai, Hongjie and Xia, Rui and Yu, Jianfei}, 71 | booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)}, 72 | pages={340--350}, 73 | year={2021} 74 | } 75 | ``` 76 | -------------------------------------------------------------------------------- /data/Readme.md: -------------------------------------------------------------------------------- 1 | Each line consists of review text and its quadruples. Each quadruple is formalized as 'Aspect Category Sentiment Opinion'. The 0, 1, 2 in the Sentiment category represents negative, neutral, and positive, respectively. 2 | -------------------------------------------------------------------------------- /data/Restaurant-ACOS/rest16_quad_dev.tsv: -------------------------------------------------------------------------------- 1 | ca n ' t wait wait for my next visit . -1,-1 RESTAURANT#GENERAL 2 -1,-1 2 | their sake list was extensive , but we were looking for purple haze , which was n ' t listed but made for us upon request ! 1,3 DRINKS#STYLE_OPTIONS 2 4,5 -1,-1 SERVICE#GENERAL 2 -1,-1 3 | the spicy tuna roll was unusually good and the rock shrimp tempura was awesome , great appetizer to share ! 1,4 FOOD#QUALITY 2 6,7 9,12 FOOD#QUALITY 2 13,14 4 | we love th pink pony . 3,5 RESTAURANT#GENERAL 2 1,2 5 | this place has got to be the best japanese restaurant in the new york area . 1,2 RESTAURANT#GENERAL 2 7,8 6 | i tend to judge a sushi restaurant by its sea urchin , which was heavenly at sushi rose . 9,11 FOOD#QUALITY 2 14,15 7 | the prix fixe menu is worth every penny and you get more than enough ( both in quantity and quality ) . 1,4 FOOD#QUALITY 2 5,6 1,4 FOOD#STYLE_OPTIONS 2 5,6 1,4 FOOD#PRICES 2 5,6 8 | the food here is rather good , but only if you like to wait for it . 1,2 FOOD#QUALITY 2 5,6 -1,-1 SERVICE#GENERAL 0 -1,-1 9 | also , specify if you like your food spicy - its rather bland if you do n ' t . 7,8 FOOD#QUALITY 0 12,13 10 | the ambience is pretty and nice for conversation , so a casual lunch here would probably be best . 1,2 AMBIENCE#GENERAL 2 3,4 1,2 AMBIENCE#GENERAL 2 5,6 -1,-1 RESTAURANT#MISCELLANEOUS 2 17,18 11 | it was horrible . -1,-1 RESTAURANT#GENERAL 0 2,3 12 | have been dozens of times and never failed to enjoy the experience . -1,-1 RESTAURANT#GENERAL 2 9,10 13 | make sure you try this place as often as you can . 5,6 RESTAURANT#GENERAL 2 7,8 14 | i had a huge group for my birthday and we were well taken care of . -1,-1 SERVICE#GENERAL 2 11,12 15 | get the tuna of gari . 2,5 FOOD#QUALITY 2 -1,-1 16 | make sure you have the spicy scallop roll . . . 5,8 FOOD#QUALITY 2 -1,-1 17 | raga ' s is a romantic , cozy restaurant . 0,3 AMBIENCE#GENERAL 2 5,6 0,3 AMBIENCE#GENERAL 2 7,8 18 | i had a great time at jekyll and hyde ! 6,9 RESTAURANT#GENERAL 2 3,4 19 | i am bringing my whole family back next time . -1,-1 RESTAURANT#MISCELLANEOUS 2 -1,-1 20 | fine dining restaurant quality . 1,2 FOOD#QUALITY 2 0,1 21 | we will return many times for this oasis in mid - town . -1,-1 RESTAURANT#GENERAL 2 -1,-1 22 | the food options rule . 
1,2 FOOD#STYLE_OPTIONS 2 -1,-1 23 | my husband and i thougt it would be great to go to the jekyll and hyde pub for our anniversary , and to our surprise it was fantastic . 13,17 RESTAURANT#GENERAL 2 8,9 13,17 RESTAURANT#GENERAL 2 27,28 24 | please take my advice , go and try this place . 9,10 RESTAURANT#GENERAL 2 -1,-1 25 | they were served warm and had a soft fluffy interior . -1,-1 FOOD#QUALITY 2 3,4 -1,-1 FOOD#QUALITY 2 7,8 26 | but they do . -1,-1 SERVICE#GENERAL 2 -1,-1 27 | fresh restaurant was amazing . . . . . . . . food was delicious and of course fresh . 1,2 RESTAURANT#GENERAL 2 0,1 1,2 RESTAURANT#GENERAL 2 3,4 12,13 FOOD#QUALITY 2 14,15 12,13 FOOD#QUALITY 2 18,19 28 | hats off to the chef . 4,5 FOOD#QUALITY 2 0,2 29 | this is some really good , inexpensive sushi . 7,8 FOOD#QUALITY 2 4,5 7,8 FOOD#PRICES 2 6,7 30 | this place is always very crowded and popular . 1,2 RESTAURANT#MISCELLANEOUS 2 5,6 1,2 RESTAURANT#MISCELLANEOUS 2 7,8 31 | and evaluated on those terms pastis is simply wonderful . 5,6 RESTAURANT#GENERAL 2 8,9 32 | i ' m still mad that i had to pay for lousy food . 12,13 FOOD#QUALITY 0 11,12 33 | the hanger steak was like rubber and the tuna was flavorless not to mention it tasted like it had just been thawed . 1,3 FOOD#QUALITY 0 5,6 8,9 FOOD#QUALITY 0 10,11 34 | big thumbs up ! -1,-1 RESTAURANT#GENERAL 2 1,3 35 | the pizza and wine were excellent - the service too - - but what really made this place was the backyard dining area . 1,2 FOOD#QUALITY 2 5,6 3,4 DRINKS#QUALITY 2 5,6 8,9 SERVICE#GENERAL 2 5,6 20,23 AMBIENCE#GENERAL 2 -1,-1 36 | it is one the nicest outdoor restaurants i have ever seen in ny - - i am from italy and this place rivals the ones in my country . 5,7 AMBIENCE#GENERAL 2 4,5 37 | it is simply amazing . -1,-1 FOOD#QUALITY 2 3,4 38 | beautiful experience . -1,-1 RESTAURANT#GENERAL 2 0,1 39 | the menu is very limited - i think we counted 4 or 5 entrees . 1,2 FOOD#STYLE_OPTIONS 0 4,5 40 | we will go back every time we are in the city . -1,-1 RESTAURANT#GENERAL 2 -1,-1 41 | the characters really make for an enjoyable experience . 1,2 AMBIENCE#GENERAL 2 6,7 42 | however , i think jeckll and hydes t is one of those places that is fun to do once . 4,7 RESTAURANT#GENERAL 2 15,16 43 | we had a good time . -1,-1 RESTAURANT#GENERAL 2 3,4 44 | a little overpriced but worth it once you take a bite . -1,-1 FOOD#PRICES 0 2,3 -1,-1 FOOD#QUALITY 2 4,5 45 | i have lived in japan for 7 years and the taste of the food and the feel of the restaurant is like being back in japan . 13,14 FOOD#QUALITY 2 -1,-1 16,17 AMBIENCE#GENERAL 2 -1,-1 46 | check out the secret back room . 4,6 AMBIENCE#GENERAL 2 3,4 47 | thank you emilio . 2,3 RESTAURANT#GENERAL 2 -1,-1 48 | the food was authentic . 1,2 FOOD#QUALITY 2 3,4 49 | fantastic ! -1,-1 RESTAURANT#GENERAL 2 0,1 50 | but the staff was so horrible to us . 2,3 SERVICE#GENERAL 0 5,6 51 | decor is nice though service can be spotty . 0,1 AMBIENCE#GENERAL 2 2,3 4,5 SERVICE#GENERAL 0 7,8 52 | just awsome . -1,-1 FOOD#QUALITY 2 1,2 53 | i had their eggs benedict for brunch , which were the worst in my entire life , i tried removing the hollondaise sauce completely that was how failed it was . 3,5 FOOD#QUALITY 0 11,12 54 | with the theater 2 blocks away we had a delicious meal in a beautiful room . 10,11 FOOD#QUALITY 2 9,10 14,15 AMBIENCE#GENERAL 2 13,14 -1,-1 LOCATION#GENERAL 2 -1,-1 55 | the service was attentive . 1,2 SERVICE#GENERAL 2 3,4 56 | patroon features a nice cigar bar and has great staff . 
4,6 AMBIENCE#GENERAL 2 3,4 9,10 SERVICE#GENERAL 2 8,9 57 | lloovve this place . 2,3 RESTAURANT#GENERAL 2 0,1 58 | the menu is limited but almost all of the dishes are excellent . 1,2 FOOD#STYLE_OPTIONS 0 3,4 9,10 FOOD#QUALITY 2 11,12 59 | wine list is extensive without being over - priced . 0,2 DRINKS#STYLE_OPTIONS 2 3,4 0,2 DRINKS#PRICES 2 4,9 60 | the food was very good , a great deal , and the place its self was great . 1,2 FOOD#QUALITY 2 4,5 1,2 FOOD#PRICES 2 6,7 12,13 AMBIENCE#GENERAL 2 16,17 61 | the wait staff is very freindly , they make it feel like you ' re eating in a freindly little european town . 1,3 SERVICE#GENERAL 2 5,6 62 | the whole set up is truly unprofessional and i wish cafe noir would get some good staff , because despite the current one this is a great place . 16,17 SERVICE#GENERAL 0 6,17 10,12 RESTAURANT#GENERAL 2 26,27 63 | you should pass on the calamari . 5,6 FOOD#QUALITY 0 -1,-1 64 | when asked about how a certain dish was prepared in comparison to a similar at other thai restaurants , he replied this is not mcdonald ' s , every place makes things differently -1,-1 SERVICE#GENERAL 0 -1,-1 65 | everything was wonderful ; food , drinks , staff , mileau . 4,5 FOOD#QUALITY 2 2,3 6,7 DRINKS#QUALITY 2 2,3 8,9 SERVICE#GENERAL 2 2,3 10,11 AMBIENCE#GENERAL 2 2,3 -1,-1 RESTAURANT#GENERAL 2 2,3 66 | i would highly recommend this place ! 5,6 RESTAURANT#GENERAL 2 3,4 67 | fresh ingredients and everything is made to order . 1,2 FOOD#QUALITY 2 0,1 -1,-1 FOOD#QUALITY 2 -1,-1 68 | friendly staff that actually lets you enjoy your meal and the company you ' re with . 1,2 SERVICE#GENERAL 2 0,1 69 | i will definitely be going back . -1,-1 RESTAURANT#GENERAL 2 -1,-1 70 | a great choice at any cost and a great deal . -1,-1 RESTAURANT#GENERAL 2 8,9 -1,-1 RESTAURANT#PRICES 2 1,2 71 | thalia is a beautiful restaurant with beautiful people serving you , but the food does n ' t quite match up . 7,8 SERVICE#GENERAL 2 6,7 13,14 FOOD#QUALITY 0 14,21 0,1 AMBIENCE#GENERAL 2 3,4 72 | i ordered the smoked salmon and roe appetizer and it was off flavor . 3,8 FOOD#QUALITY 0 11,13 73 | the food is good , especially their more basic dishes , and the drinks are delicious . 1,2 FOOD#QUALITY 2 3,4 8,10 FOOD#QUALITY 2 3,4 13,14 DRINKS#QUALITY 2 15,16 74 | the big complaint : no toasting available . -1,-1 SERVICE#GENERAL 0 2,3 75 | i ' ve been many time and have never been disappointed . -1,-1 RESTAURANT#GENERAL 2 8,11 76 | the turkey burgers are scary ! 1,3 FOOD#QUALITY 0 4,5 77 | for authentic thai food , look no further than toons . 2,4 FOOD#QUALITY 2 1,2 78 | try the pad thai , or sample anything on the appetizer menu . . . they ' re all delicious . 2,4 FOOD#QUALITY 2 19,20 2,4 FOOD#QUALITY 2 19,20 79 | service was good and food is wonderful . 0,1 SERVICE#GENERAL 2 2,3 4,5 FOOD#QUALITY 2 6,7 80 | it is definitely a good spot for snacks and chat . 5,6 RESTAURANT#GENERAL 2 4,5 81 | do not get the go go hamburgers , no matter what the reviews say . 4,7 FOOD#QUALITY 0 -1,-1 82 | steamed fresh so brought hot hot hot to your table . -1,-1 FOOD#QUALITY 2 1,2 83 | small servings for main entree , i had salmon ( wasnt impressed ) girlfriend had chicken , it was good . 8,9 FOOD#QUALITY 0 10,12 15,16 FOOD#QUALITY 2 19,20 1,5 FOOD#GENERAL 0 0,1 84 | cute and decorative . -1,-1 AMBIENCE#GENERAL 2 0,1 -1,-1 AMBIENCE#GENERAL 2 2,3 85 | excellent spot for holiday get togethers with co - workers or friends that you have n ' t seen in a while . 1,2 RESTAURANT#MISCELLANEOUS 2 0,1 86 | what a great place ! 
3,4 RESTAURANT#GENERAL 2 2,3 87 | not the typical nyc gimmick theme restaurant . 6,7 AMBIENCE#GENERAL 2 0,3 88 | service was very prompt but slightly rushed . 0,1 SERVICE#GENERAL 2 3,4 0,1 SERVICE#GENERAL 2 6,7 89 | i really liked this place . 4,5 RESTAURANT#GENERAL 2 2,3 90 | everything i had was good , and i ' m a eater . -1,-1 FOOD#QUALITY 2 4,5 91 | i also recommend the rice dishes or the different varieties of congee ( rice porridge ) . 4,6 FOOD#QUALITY 2 2,3 11,16 FOOD#QUALITY 2 2,3 92 | i recently tried suan and i thought that it was great . 3,4 RESTAURANT#GENERAL 2 10,11 93 | have been several times and it never dissapoints . -1,-1 RESTAURANT#GENERAL 2 6,8 94 | this place is a great bargain . 1,2 RESTAURANT#PRICES 2 4,6 95 | people are always friendly . 0,1 SERVICE#GENERAL 2 3,4 96 | the best pad thai i ' ve ever had . 2,4 FOOD#QUALITY 2 1,2 97 | would n ' t recomend it for dinner ! -1,-1 RESTAURANT#GENERAL 0 1,5 98 | ask for usha , the nicest bartender in manhattan . 2,3 SERVICE#GENERAL 2 5,6 99 | the food ' s as good as ever . 1,2 FOOD#QUALITY 2 5,6 100 | best drumsticks over rice and sour spicy soup in town ! 1,4 FOOD#QUALITY 2 0,1 5,8 FOOD#QUALITY 2 0,1 101 | for those that go once and do n ' t enjoy it , all i can say is that they just do n ' t get it . -1,-1 RESTAURANT#MISCELLANEOUS 2 -1,-1 102 | not worth it . -1,-1 FOOD#PRICES 0 0,2 103 | this dish is my favorite and i always get it when i go there and never get tired of it . 1,2 FOOD#QUALITY 2 4,5 104 | big wong is a great place to eat and fill your stomach . 0,2 RESTAURANT#GENERAL 2 4,5 105 | the food is okay and the prices here are mediocre . 1,2 FOOD#QUALITY 1 3,4 -1,-1 RESTAURANT#PRICES 1 9,10 106 | me and my girls will definitely go back . -1,-1 RESTAURANT#GENERAL 2 -1,-1 107 | the food is great . 1,2 FOOD#QUALITY 2 3,4 108 | la rosa waltzes in , and i think they are doing it the best . 0,2 FOOD#QUALITY 2 13,14 109 | interesting selection , good wines , service fine , fun decor . 4,5 DRINKS#QUALITY 2 3,4 6,7 SERVICE#GENERAL 2 7,8 10,11 AMBIENCE#GENERAL 2 9,10 1,2 FOOD#STYLE_OPTIONS 2 0,1 110 | the food here was mediocre at best . 1,2 FOOD#QUALITY 0 4,5 111 | the cypriot restaurant has a lot going for it . 1,3 RESTAURANT#GENERAL 2 -1,-1 112 | will comeback for sure , wish they have it here in la . . -1,-1 RESTAURANT#GENERAL 2 -1,-1 113 | the space kind of feels like an alice in wonderland setting , without it trying to be that . 1,2 AMBIENCE#GENERAL 0 -1,-1 114 | i paid just about $ 60 for a good meal , though : ) 9,10 FOOD#QUALITY 2 8,9 9,10 FOOD#PRICES 2 -1,-1 115 | love it . -1,-1 RESTAURANT#GENERAL 2 0,1 116 | the place is a bit hidden away , but once you get there , it ' s all worth it . 1,2 LOCATION#GENERAL 2 5,7 1,2 LOCATION#GENERAL 2 18,19 117 | i love their chicken pasta cant remember the name but is sooo good 3,5 FOOD#QUALITY 2 1,2 3,5 FOOD#QUALITY 2 12,13 118 | way below average -1,-1 RESTAURANT#GENERAL 0 1,3 119 | i think the pizza is so overrated and was under cooked . 3,4 FOOD#QUALITY 0 6,7 3,4 FOOD#QUALITY 0 9,11 120 | i love this place 3,4 RESTAURANT#GENERAL 2 1,2 121 | the service was quick and friendly . 1,2 SERVICE#GENERAL 2 3,4 1,2 SERVICE#GENERAL 2 5,6 122 | i thought the restaurant was nice and clean . 3,4 RESTAURANT#GENERAL 2 5,6 3,4 AMBIENCE#GENERAL 2 7,8 123 | chicken teriyaki had tomato or pimentos on top ? ? 0,2 FOOD#STYLE_OPTIONS 0 -1,-1 124 | the waitress was not attentive at all . 1,2 SERVICE#GENERAL 0 3,5 125 | just go to yamato and order the red dragon roll . 
3,4 RESTAURANT#GENERAL 2 -1,-1 7,10 FOOD#QUALITY 2 -1,-1 126 | favorite sushi in nyc 1,2 FOOD#QUALITY 2 0,1 127 | the rolls are creative and i have yet to find another sushi place that serves up more inventive yet delicious japanese food . 1,2 FOOD#STYLE_OPTIONS 2 3,4 20,22 FOOD#STYLE_OPTIONS 2 17,18 20,22 FOOD#QUALITY 2 19,20 128 | my quesadilla tasted like it had been made by a three - year old with no sense of proportion or flavor . 1,2 FOOD#QUALITY 0 -1,-1 1,2 FOOD#STYLE_OPTIONS 0 -1,-1 129 | save your money and your time and go somewhere else . -1,-1 RESTAURANT#GENERAL 0 -1,-1 130 | the spinach is fresh , definately not frozen . . . 1,2 FOOD#QUALITY 2 3,4 131 | decor needs to be upgraded but the food is amazing ! 0,1 AMBIENCE#GENERAL 0 4,5 7,8 FOOD#QUALITY 2 9,10 132 | my daughter ' s wedding reception at water ' s edge received the highest compliments from our guests . 7,11 RESTAURANT#MISCELLANEOUS 2 13,15 133 | the high prices you ' re going to pay is for the view not for the food . 12,13 LOCATION#GENERAL 1 1,3 16,17 FOOD#QUALITY 0 1,3 -1,-1 RESTAURANT#PRICES 0 1,3 134 | not what i would expect for the price and prestige of this location . 12,13 RESTAURANT#PRICES 1 -1,-1 12,13 RESTAURANT#MISCELLANEOUS 1 -1,-1 -1,-1 SERVICE#GENERAL 0 -1,-1 135 | the food was ok and fair nothing to go crazy . 1,2 FOOD#QUALITY 1 3,4 1,2 FOOD#QUALITY 1 5,6 136 | impressed . . . -1,-1 RESTAURANT#GENERAL 2 0,1 137 | subtle food and service 1,2 FOOD#QUALITY 2 0,1 3,4 SERVICE#GENERAL 2 0,1 138 | food took some time to prepare , all worth waiting for . 0,1 FOOD#QUALITY 2 8,9 -1,-1 SERVICE#GENERAL 1 -1,-1 139 | great find in the west village ! -1,-1 RESTAURANT#GENERAL 2 0,1 140 | when the bill came , nothing was comped , so i told the manager very politely that we were willing to pay for the wine , but i did n ' t think i should have to pay for food with a maggot in it . -1,-1 SERVICE#GENERAL 0 -1,-1 141 | amazing food . 1,2 FOOD#QUALITY 2 0,1 142 | rather than preparing vegetarian dish , the chef presented me with a plate of steamed vegetables ( minus sauce , seasoning , or any form or aesthetic presentation ) . 3,5 FOOD#QUALITY 0 -1,-1 3,5 FOOD#STYLE_OPTIONS 0 -1,-1 7,8 SERVICE#GENERAL 0 -1,-1 143 | the only thing that strikes you is the decor ? ( not very pleasant ) . 8,9 AMBIENCE#GENERAL 0 11,14 144 | the martinis are amazing and very fairly priced . 1,2 DRINKS#QUALITY 2 3,4 1,2 DRINKS#PRICES 2 6,8 145 | i wanted to go there to see if it was worth it and sadly , curiousity got the best of me and i paid dearly for it . -1,-1 RESTAURANT#GENERAL 0 -1,-1 -1,-1 RESTAURANT#PRICES 0 -1,-1 146 | the environment is very upscale and you will see a lot of rich guys with trophy wives or just highly paid escorts . 1,2 AMBIENCE#GENERAL 1 4,5 -1,-1 RESTAURANT#MISCELLANEOUS 1 4,5 147 | however , our $ 14 drinks were were horrible ! 5,6 DRINKS#QUALITY 0 8,9 5,6 DRINKS#PRICES 0 8,9 148 | once we finally got a table , despite indicating we wanted an alla carte menu we were pushed into a table that was only price fixed ! -1,-1 SERVICE#GENERAL 0 -1,-1 149 | i do n ' t appreciate places or people that try to drive up the bill without the patron ' s knowledge so that was a huge turnoff ( more than the price ) . -1,-1 SERVICE#GENERAL 0 -1,-1 -1,-1 RESTAURANT#PRICES 0 -1,-1 150 | eat at your own risk . -1,-1 FOOD#QUALITY 0 -1,-1 151 | the service was spectacular as the waiter knew everything about the menu and his recommendations were amazing ! 
1,2 SERVICE#GENERAL 2 3,4 6,7 SERVICE#GENERAL 2 16,17 152 | the sake ’ s complimented the courses very well and is successfully easing me into the sake world . 1,4 DRINKS#QUALITY 2 8,9 153 | maybe it was the great company ( i had friends visiting from philly – yes , it was not a date this time ) or the super reasonable price point , but i just can ’ t say enough good things about this brasserie . 43,44 RESTAURANT#GENERAL 2 39,40 43,44 RESTAURANT#PRICES 2 39,40 154 | i tried a couple other dishes but was n ' t too impressed . 5,6 FOOD#QUALITY 1 8,13 155 | the family seafood entree was very good . 1,4 FOOD#QUALITY 2 6,7 156 | the food they serve is not comforting , not appetizing and uncooked . 1,2 FOOD#QUALITY 0 5,7 1,2 FOOD#QUALITY 0 8,10 1,2 FOOD#QUALITY 0 11,12 157 | supercilious scorn is in . -1,-1 SERVICE#GENERAL 0 0,1 158 | single worst restaurant in manhattan 2,3 RESTAURANT#GENERAL 0 1,2 159 | it is quite a spectacular scene i ' ll give them that . 5,6 AMBIENCE#GENERAL 2 4,5 160 | how this place survives the competitive west village market in this economy , or any other for that matter , is beyond me . 2,3 RESTAURANT#GENERAL 0 -1,-1 161 | though it ' s been crowded most times i ' ve gone here , bark always delivers on their food . 14,15 RESTAURANT#MISCELLANEOUS 1 -1,-1 19,20 FOOD#QUALITY 2 -1,-1 162 | but nonetheless - - great spot , great food . 5,6 RESTAURANT#GENERAL 2 4,5 8,9 FOOD#QUALITY 2 7,8 163 | the food and service were fine , however the maitre - d was incredibly unwelcoming and arrogant . 1,2 FOOD#QUALITY 2 5,6 3,4 SERVICE#GENERAL 2 5,6 9,12 SERVICE#GENERAL 0 14,15 9,12 SERVICE#GENERAL 0 16,17 164 | a word to the wise : you ca n ' t dine here and disturb the maitre - d ' s sense of ` ` table turnover ' ' , as whacked as it is , or else . 16,19 SERVICE#GENERAL 0 -1,-1 165 | i had the lamb special which was perfect . 3,5 FOOD#QUALITY 2 7,8 166 | do n ' t go to this place ! 7,8 RESTAURANT#GENERAL 0 -1,-1 167 | when the main course finally arrived ( another 45mins ) half of our order was missing . -1,-1 SERVICE#GENERAL 0 -1,-1 168 | when we threatened to leave , we were offered a meager discount even though half the order was missing . -1,-1 SERVICE#GENERAL 0 -1,-1 169 | the bread was stale , the salad was overpriced and empty . 1,2 FOOD#QUALITY 0 3,4 6,7 FOOD#PRICES 0 8,9 6,7 FOOD#STYLE_OPTIONS 0 10,11 170 | shame on this place for the horrible rude staff and non - existent customer service . 8,9 SERVICE#GENERAL 0 6,7 8,9 SERVICE#GENERAL 0 7,8 13,15 SERVICE#GENERAL 0 10,13 171 | the food is good . 
1,2 FOOD#QUALITY 2 3,4 172 | -------------------------------------------------------------------------------- /img/figure1.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NUSTM/ACOS/45d179a3dcc6a3dedd848d81b16f2552454805fe/img/figure1.PNG -------------------------------------------------------------------------------- /img/main_results.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NUSTM/ACOS/45d179a3dcc6a3dedd848d81b16f2552454805fe/img/main_results.PNG -------------------------------------------------------------------------------- /img/method.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NUSTM/ACOS/45d179a3dcc6a3dedd848d81b16f2552454805fe/img/method.PNG -------------------------------------------------------------------------------- /img/method.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NUSTM/ACOS/45d179a3dcc6a3dedd848d81b16f2552454805fe/img/method.jpg -------------------------------------------------------------------------------- /img/separate_results.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NUSTM/ACOS/45d179a3dcc6a3dedd848d81b16f2552454805fe/img/separate_results.PNG -------------------------------------------------------------------------------- /img/stat.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NUSTM/ACOS/45d179a3dcc6a3dedd848d81b16f2552454805fe/img/stat.PNG --------------------------------------------------------------------------------