├── Extract-Classify-ACOS
│   ├── Readme.md
│   ├── __pycache__
│   │   ├── dataset_utils.cpython-38.pyc
│   │   ├── dataset_utils.cpython-39.pyc
│   │   ├── eval_metrics.cpython-38.pyc
│   │   ├── file_utils.cpython-38.pyc
│   │   ├── manager.cpython-38.pyc
│   │   ├── manager.cpython-39.pyc
│   │   ├── modeling.cpython-38.pyc
│   │   └── run_classifier_dataset_utils.cpython-38.pyc
│   ├── bert_utils
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   │   ├── __init__.cpython-37.pyc
│   │   │   ├── __init__.cpython-38.pyc
│   │   │   ├── file_utils.cpython-37.pyc
│   │   │   ├── file_utils.cpython-38.pyc
│   │   │   ├── optimization.cpython-37.pyc
│   │   │   ├── optimization.cpython-38.pyc
│   │   │   ├── tokenization.cpython-37.pyc
│   │   │   └── tokenization.cpython-38.pyc
│   │   ├── file_utils.py
│   │   ├── optimization.py
│   │   └── tokenization.py
│   ├── dataset_utils.py
│   ├── eval_metrics.py
│   ├── file_utils.py
│   ├── manager.py
│   ├── modeling.py
│   ├── run.sh
│   ├── run_classifier_dataset_utils.py
│   ├── run_step1.py
│   ├── run_step2.py
│   └── tokenized_data
│       ├── get_1st_pairs.py
│       ├── laptop_dev_pair.tsv
│       ├── laptop_dev_quad_bert.tsv
│       ├── laptop_test_pair.tsv
│       ├── laptop_test_pair_1st.tsv
│       ├── laptop_test_quad_bert.tsv
│       ├── laptop_train_pair.tsv
│       ├── laptop_train_quad_bert.tsv
│       ├── rest16_dev_pair.tsv
│       ├── rest16_dev_quad_bert.tsv
│       ├── rest16_test_pair.tsv
│       ├── rest16_test_pair_1st.tsv
│       ├── rest16_test_quad_bert.tsv
│       ├── rest16_train_pair.tsv
│       └── rest16_train_quad_bert.tsv
├── README.md
├── data
│   ├── Laptop-ACOS
│   │   ├── laptop_quad_dev.tsv
│   │   ├── laptop_quad_test.tsv
│   │   └── laptop_quad_train.tsv
│   ├── Readme.md
│   └── Restaurant-ACOS
│       ├── rest16_quad_dev.tsv
│       ├── rest16_quad_test.tsv
│       └── rest16_quad_train.tsv
└── img
    ├── figure1.PNG
    ├── main_results.PNG
    ├── method.PNG
    ├── method.jpg
    ├── separate_results.PNG
    └── stat.PNG

/Extract-Classify-ACOS/Readme.md:
--------------------------------------------------------------------------------
 1 | 
 2 | ## Requirements
 3 | 
 4 | * Python 3.7
 5 | * PyTorch 1.8
 6 | * pytorch-crf 0.7.2
 7 | 
 8 | ## Running
 9 | 
10 | Modify the corresponding BERT_BASE_DIR, BASE_DIR, DATA_DIR and output_dir before running the script.
11 | 
12 | BERT_BASE_DIR: the directory containing the config, pytorch_model and vocab files of BERT (the PyTorch BERT model should be placed here).
13 | 
14 | BASE_DIR: the root directory of the current project.
15 | 
16 | DATA_DIR: the data directory DIR; the data files are stored under 'DIR/tokenized_data/' as DOMAIN_YEAR_train.tsv (e.g., rest16_train_quad_bert.tsv).
17 | 
18 | output_dir: the output directory that will hold the fine-tuned language model.
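19 | 
20 | For example, the four variables might be set as follows (the paths are illustrative placeholders, not required locations):
21 | 
22 | ```
23 | BERT_BASE_DIR=/path/to/bert-base-uncased     # config.json, pytorch_model.bin, vocab.txt
24 | BASE_DIR=/path/to/ACOS/Extract-Classify-ACOS
25 | DATA_DIR=${BASE_DIR}                         # expects ${DATA_DIR}/tokenized_data/*.tsv
26 | output_dir=${BASE_DIR}/output                # fine-tuned model is written here
27 | ```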
28 | 
29 | Then run:
30 | 
31 | ```
32 | sh run.sh
33 | ```
34 | 
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/bert_utils/__init__.py:
--------------------------------------------------------------------------------
1 | #
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/bert_utils/file_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Utilities for working with the local dataset cache.
3 | This file is adapted from the AllenNLP library at https://github.com/allenai/allennlp
4 | Copyright by the AllenNLP authors.
5 | """ 6 | from __future__ import (absolute_import, division, print_function, unicode_literals) 7 | 8 | import sys 9 | import json 10 | import logging 11 | import os 12 | import shutil 13 | import tempfile 14 | import fnmatch 15 | from functools import wraps 16 | from hashlib import sha256 17 | import sys 18 | from io import open 19 | 20 | import boto3 21 | import requests 22 | from botocore.exceptions import ClientError 23 | from tqdm import tqdm 24 | 25 | try: 26 | from torch.hub import _get_torch_home 27 | torch_cache_home = _get_torch_home() 28 | except ImportError: 29 | torch_cache_home = os.path.expanduser( 30 | os.getenv('TORCH_HOME', os.path.join( 31 | os.getenv('XDG_CACHE_HOME', '~/.cache'), 'torch'))) 32 | default_cache_path = os.path.join(torch_cache_home, 'pytorch_pretrained_bert') 33 | 34 | try: 35 | from urllib.parse import urlparse 36 | except ImportError: 37 | from urlparse import urlparse 38 | 39 | try: 40 | from pathlib import Path 41 | PYTORCH_PRETRAINED_BERT_CACHE = Path( 42 | os.getenv('PYTORCH_PRETRAINED_BERT_CACHE', default_cache_path)) 43 | except (AttributeError, ImportError): 44 | PYTORCH_PRETRAINED_BERT_CACHE = os.getenv('PYTORCH_PRETRAINED_BERT_CACHE', 45 | default_cache_path) 46 | 47 | CONFIG_NAME = "config.json" 48 | WEIGHTS_NAME = "pytorch_model.bin" 49 | 50 | logger = logging.getLogger(__name__) # pylint: disable=invalid-name 51 | 52 | 53 | def url_to_filename(url, etag=None): 54 | """ 55 | Convert `url` into a hashed filename in a repeatable way. 56 | If `etag` is specified, append its hash to the url's, delimited 57 | by a period. 58 | """ 59 | url_bytes = url.encode('utf-8') 60 | url_hash = sha256(url_bytes) 61 | filename = url_hash.hexdigest() 62 | 63 | if etag: 64 | etag_bytes = etag.encode('utf-8') 65 | etag_hash = sha256(etag_bytes) 66 | filename += '.' + etag_hash.hexdigest() 67 | 68 | return filename 69 | 70 | 71 | def filename_to_url(filename, cache_dir=None): 72 | """ 73 | Return the url and etag (which may be ``None``) stored for `filename`. 74 | Raise ``EnvironmentError`` if `filename` or its stored metadata do not exist. 75 | """ 76 | if cache_dir is None: 77 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 78 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 79 | cache_dir = str(cache_dir) 80 | 81 | cache_path = os.path.join(cache_dir, filename) 82 | if not os.path.exists(cache_path): 83 | raise EnvironmentError("file {} not found".format(cache_path)) 84 | 85 | meta_path = cache_path + '.json' 86 | if not os.path.exists(meta_path): 87 | raise EnvironmentError("file {} not found".format(meta_path)) 88 | 89 | with open(meta_path, encoding="utf-8") as meta_file: 90 | metadata = json.load(meta_file) 91 | url = metadata['url'] 92 | etag = metadata['etag'] 93 | 94 | return url, etag 95 | 96 | 97 | def cached_path(url_or_filename, cache_dir=None): 98 | """ 99 | Given something that might be a URL (or might be a local path), 100 | determine which. If it's a URL, download the file and cache it, and 101 | return the path to the cached file. If it's already a local path, 102 | make sure the file exists and then return the path. 
103 | """ 104 | if cache_dir is None: 105 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 106 | if sys.version_info[0] == 3 and isinstance(url_or_filename, Path): 107 | url_or_filename = str(url_or_filename) 108 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 109 | cache_dir = str(cache_dir) 110 | 111 | parsed = urlparse(url_or_filename) 112 | 113 | if parsed.scheme in ('http', 'https', 's3'): 114 | # URL, so get it from the cache (downloading if necessary) 115 | return get_from_cache(url_or_filename, cache_dir) 116 | elif os.path.exists(url_or_filename): 117 | # File, and it exists. 118 | return url_or_filename 119 | elif parsed.scheme == '': 120 | # File, but it doesn't exist. 121 | raise EnvironmentError("file {} not found".format(url_or_filename)) 122 | else: 123 | # Something unknown 124 | raise ValueError("unable to parse {} as a URL or as a local path".format(url_or_filename)) 125 | 126 | 127 | def split_s3_path(url): 128 | """Split a full s3 path into the bucket name and path.""" 129 | parsed = urlparse(url) 130 | if not parsed.netloc or not parsed.path: 131 | raise ValueError("bad s3 path {}".format(url)) 132 | bucket_name = parsed.netloc 133 | s3_path = parsed.path 134 | # Remove '/' at beginning of path. 135 | if s3_path.startswith("/"): 136 | s3_path = s3_path[1:] 137 | return bucket_name, s3_path 138 | 139 | 140 | def s3_request(func): 141 | """ 142 | Wrapper function for s3 requests in order to create more helpful error 143 | messages. 144 | """ 145 | 146 | @wraps(func) 147 | def wrapper(url, *args, **kwargs): 148 | try: 149 | return func(url, *args, **kwargs) 150 | except ClientError as exc: 151 | if int(exc.response["Error"]["Code"]) == 404: 152 | raise EnvironmentError("file {} not found".format(url)) 153 | else: 154 | raise 155 | 156 | return wrapper 157 | 158 | 159 | @s3_request 160 | def s3_etag(url): 161 | """Check ETag on S3 object.""" 162 | s3_resource = boto3.resource("s3") 163 | bucket_name, s3_path = split_s3_path(url) 164 | s3_object = s3_resource.Object(bucket_name, s3_path) 165 | return s3_object.e_tag 166 | 167 | 168 | @s3_request 169 | def s3_get(url, temp_file): 170 | """Pull a file directly from S3.""" 171 | s3_resource = boto3.resource("s3") 172 | bucket_name, s3_path = split_s3_path(url) 173 | s3_resource.Bucket(bucket_name).download_fileobj(s3_path, temp_file) 174 | 175 | 176 | def http_get(url, temp_file): 177 | req = requests.get(url, stream=True) 178 | content_length = req.headers.get('Content-Length') 179 | total = int(content_length) if content_length is not None else None 180 | progress = tqdm(unit="B", total=total) 181 | for chunk in req.iter_content(chunk_size=1024): 182 | if chunk: # filter out keep-alive new chunks 183 | progress.update(len(chunk)) 184 | temp_file.write(chunk) 185 | progress.close() 186 | 187 | 188 | def get_from_cache(url, cache_dir=None): 189 | """ 190 | Given a URL, look for the corresponding dataset in the local cache. 191 | If it's not there, download it. Then return the path to the cached file. 192 | """ 193 | if cache_dir is None: 194 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 195 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 196 | cache_dir = str(cache_dir) 197 | 198 | if not os.path.exists(cache_dir): 199 | os.makedirs(cache_dir) 200 | 201 | # Get eTag to add to filename, if it exists. 
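    # (when available, the ETag is hashed into the cache filename by
    # url_to_filename above, so a file that changes on the server gets a
    # fresh cache entry instead of silently reusing a stale one)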
202 | if url.startswith("s3://"): 203 | etag = s3_etag(url) 204 | else: 205 | try: 206 | response = requests.head(url, allow_redirects=True) 207 | if response.status_code != 200: 208 | etag = None 209 | else: 210 | etag = response.headers.get("ETag") 211 | except EnvironmentError: 212 | etag = None 213 | 214 | if sys.version_info[0] == 2 and etag is not None: 215 | etag = etag.decode('utf-8') 216 | filename = url_to_filename(url, etag) 217 | 218 | # get cache path to put the file 219 | cache_path = os.path.join(cache_dir, filename) 220 | 221 | # If we don't have a connection (etag is None) and can't identify the file 222 | # try to get the last downloaded one 223 | if not os.path.exists(cache_path) and etag is None: 224 | matching_files = fnmatch.filter(os.listdir(cache_dir), filename + '.*') 225 | matching_files = list(filter(lambda s: not s.endswith('.json'), matching_files)) 226 | if matching_files: 227 | cache_path = os.path.join(cache_dir, matching_files[-1]) 228 | 229 | if not os.path.exists(cache_path): 230 | # Download to temporary file, then copy to cache dir once finished. 231 | # Otherwise you get corrupt cache entries if the download gets interrupted. 232 | with tempfile.NamedTemporaryFile() as temp_file: 233 | logger.info("%s not found in cache, downloading to %s", url, temp_file.name) 234 | 235 | # GET file object 236 | if url.startswith("s3://"): 237 | s3_get(url, temp_file) 238 | else: 239 | http_get(url, temp_file) 240 | 241 | # we are copying the file before closing it, so flush to avoid truncation 242 | temp_file.flush() 243 | # shutil.copyfileobj() starts at the current position, so go to the start 244 | temp_file.seek(0) 245 | 246 | logger.info("copying %s to cache at %s", temp_file.name, cache_path) 247 | with open(cache_path, 'wb') as cache_file: 248 | shutil.copyfileobj(temp_file, cache_file) 249 | 250 | logger.info("creating metadata file for %s", cache_path) 251 | meta = {'url': url, 'etag': etag} 252 | meta_path = cache_path + '.json' 253 | with open(meta_path, 'w') as meta_file: 254 | output_string = json.dumps(meta) 255 | if sys.version_info[0] == 2 and isinstance(output_string, str): 256 | output_string = unicode(output_string, 'utf-8') # The beauty of python 2 257 | meta_file.write(output_string) 258 | 259 | logger.info("removing temp file %s", temp_file.name) 260 | 261 | return cache_path 262 | 263 | 264 | def read_set_from_file(filename): 265 | ''' 266 | Extract a de-duped collection (set) of text from a file. 267 | Expected file format is one item per line. 268 | ''' 269 | collection = set() 270 | with open(filename, 'r', encoding='utf-8') as file_: 271 | for line in file_: 272 | collection.add(line.rstrip()) 273 | return collection 274 | 275 | 276 | def get_file_extension(path, dot=True, lower=True): 277 | ext = os.path.splitext(path)[1] 278 | ext = ext if dot else ext[1:] 279 | return ext.lower() if lower else ext 280 | -------------------------------------------------------------------------------- /Extract-Classify-ACOS/bert_utils/optimization.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 
6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | """PyTorch optimization for BERT model.""" 16 | 17 | import math 18 | import torch 19 | from torch.optim import Optimizer 20 | from torch.optim.optimizer import required 21 | from torch.nn.utils import clip_grad_norm_ 22 | import logging 23 | import abc 24 | import sys 25 | 26 | logger = logging.getLogger(__name__) 27 | 28 | 29 | if sys.version_info >= (3, 4): 30 | ABC = abc.ABC 31 | else: 32 | ABC = abc.ABCMeta('ABC', (), {}) 33 | 34 | 35 | class _LRSchedule(ABC): 36 | """ Parent of all LRSchedules here. """ 37 | warn_t_total = False # is set to True for schedules where progressing beyond t_total steps doesn't make sense 38 | def __init__(self, warmup=0.002, t_total=-1, **kw): 39 | """ 40 | :param warmup: what fraction of t_total steps will be used for linear warmup 41 | :param t_total: how many training steps (updates) are planned 42 | :param kw: 43 | """ 44 | super(_LRSchedule, self).__init__(**kw) 45 | if t_total < 0: 46 | logger.warning("t_total value of {} results in schedule not being applied".format(t_total)) 47 | if not 0.0 <= warmup < 1.0 and not warmup == -1: 48 | raise ValueError("Invalid warmup: {} - should be in [0.0, 1.0[ or -1".format(warmup)) 49 | warmup = max(warmup, 0.) 50 | self.warmup, self.t_total = float(warmup), float(t_total) 51 | self.warned_for_t_total_at_progress = -1 52 | 53 | def get_lr(self, step, nowarn=False): 54 | """ 55 | :param step: which of t_total steps we're on 56 | :param nowarn: set to True to suppress warning regarding training beyond specified 't_total' steps 57 | :return: learning rate multiplier for current update 58 | """ 59 | if self.t_total < 0: 60 | return 1. 61 | progress = float(step) / self.t_total 62 | ret = self.get_lr_(progress) 63 | # warning for exceeding t_total (only active with warmup_linear 64 | if not nowarn and self.warn_t_total and progress > 1. and progress > self.warned_for_t_total_at_progress: 65 | logger.warning( 66 | "Training beyond specified 't_total'. Learning rate multiplier set to {}. Please set 't_total' of {} correctly." 67 | .format(ret, self.__class__.__name__)) 68 | self.warned_for_t_total_at_progress = progress 69 | # end warning 70 | return ret 71 | 72 | @abc.abstractmethod 73 | def get_lr_(self, progress): 74 | """ 75 | :param progress: value between 0 and 1 (unless going beyond t_total steps) specifying training progress 76 | :return: learning rate multiplier for current update 77 | """ 78 | return 1. 79 | 80 | 81 | class ConstantLR(_LRSchedule): 82 | def get_lr_(self, progress): 83 | return 1. 84 | 85 | 86 | class WarmupCosineSchedule(_LRSchedule): 87 | """ 88 | Linearly increases learning rate from 0 to 1 over `warmup` fraction of training steps. 89 | Decreases learning rate from 1. to 0. over remaining `1 - warmup` steps following a cosine curve. 90 | If `cycles` (default=0.5) is different from default, learning rate follows cosine function after warmup. 
91 | """ 92 | warn_t_total = True 93 | def __init__(self, warmup=0.002, t_total=-1, cycles=.5, **kw): 94 | """ 95 | :param warmup: see LRSchedule 96 | :param t_total: see LRSchedule 97 | :param cycles: number of cycles. Default: 0.5, corresponding to cosine decay from 1. at progress==warmup and 0 at progress==1. 98 | :param kw: 99 | """ 100 | super(WarmupCosineSchedule, self).__init__(warmup=warmup, t_total=t_total, **kw) 101 | self.cycles = cycles 102 | 103 | def get_lr_(self, progress): 104 | if progress < self.warmup: 105 | return progress / self.warmup 106 | else: 107 | progress = (progress - self.warmup) / (1 - self.warmup) # progress after warmup 108 | return 0.5 * (1. + math.cos(math.pi * self.cycles * 2 * progress)) 109 | 110 | 111 | class WarmupCosineWithHardRestartsSchedule(WarmupCosineSchedule): 112 | """ 113 | Linearly increases learning rate from 0 to 1 over `warmup` fraction of training steps. 114 | If `cycles` (default=1.) is different from default, learning rate follows `cycles` times a cosine decaying 115 | learning rate (with hard restarts). 116 | """ 117 | def __init__(self, warmup=0.002, t_total=-1, cycles=1., **kw): 118 | super(WarmupCosineWithHardRestartsSchedule, self).__init__(warmup=warmup, t_total=t_total, cycles=cycles, **kw) 119 | assert(cycles >= 1.) 120 | 121 | def get_lr_(self, progress): 122 | if progress < self.warmup: 123 | return progress / self.warmup 124 | else: 125 | progress = (progress - self.warmup) / (1 - self.warmup) # progress after warmup 126 | ret = 0.5 * (1. + math.cos(math.pi * ((self.cycles * progress) % 1))) 127 | return ret 128 | 129 | 130 | class WarmupCosineWithWarmupRestartsSchedule(WarmupCosineWithHardRestartsSchedule): 131 | """ 132 | All training progress is divided in `cycles` (default=1.) parts of equal length. 133 | Every part follows a schedule with the first `warmup` fraction of the training steps linearly increasing from 0. to 1., 134 | followed by a learning rate decreasing from 1. to 0. following a cosine curve. 135 | """ 136 | def __init__(self, warmup=0.002, t_total=-1, cycles=1., **kw): 137 | assert(warmup * cycles < 1.) 138 | warmup = warmup * cycles if warmup >= 0 else warmup 139 | super(WarmupCosineWithWarmupRestartsSchedule, self).__init__(warmup=warmup, t_total=t_total, cycles=cycles, **kw) 140 | 141 | def get_lr_(self, progress): 142 | progress = progress * self.cycles % 1. 143 | if progress < self.warmup: 144 | return progress / self.warmup 145 | else: 146 | progress = (progress - self.warmup) / (1 - self.warmup) # progress after warmup 147 | ret = 0.5 * (1. + math.cos(math.pi * progress)) 148 | return ret 149 | 150 | 151 | class WarmupConstantSchedule(_LRSchedule): 152 | """ 153 | Linearly increases learning rate from 0 to 1 over `warmup` fraction of training steps. 154 | Keeps learning rate equal to 1. after warmup. 155 | """ 156 | def get_lr_(self, progress): 157 | if progress < self.warmup: 158 | return progress / self.warmup 159 | return 1. 160 | 161 | 162 | class WarmupLinearSchedule(_LRSchedule): 163 | """ 164 | Linearly increases learning rate from 0 to 1 over `warmup` fraction of training steps. 165 | Linearly decreases learning rate from 1. to 0. over remaining `1 - warmup` steps. 166 | """ 167 | warn_t_total = True 168 | def get_lr_(self, progress): 169 | if progress < self.warmup: 170 | return progress / self.warmup 171 | return max((progress - 1.) / (self.warmup - 1.), 0.) 
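# For example, a WarmupLinearSchedule with warmup=0.1 yields the multipliers:
#   get_lr_(0.05) -> 0.5   (halfway through warmup)
#   get_lr_(0.10) -> 1.0   (warmup just finished)
#   get_lr_(0.55) -> 0.5   (halfway through the linear decay)
#   get_lr_(1.00) -> 0.0   (end of training)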
172 | 173 | 174 | SCHEDULES = { 175 | None: ConstantLR, 176 | "none": ConstantLR, 177 | "warmup_cosine": WarmupCosineSchedule, 178 | "warmup_constant": WarmupConstantSchedule, 179 | "warmup_linear": WarmupLinearSchedule 180 | } 181 | 182 | 183 | class BertAdam(Optimizer): 184 | """Implements BERT version of Adam algorithm with weight decay fix. 185 | Params: 186 | lr: learning rate 187 | warmup: portion of t_total for the warmup, -1 means no warmup. Default: -1 188 | t_total: total number of training steps for the learning 189 | rate schedule, -1 means constant learning rate of 1. (no warmup regardless of warmup setting). Default: -1 190 | schedule: schedule to use for the warmup (see above). 191 | Can be `'warmup_linear'`, `'warmup_constant'`, `'warmup_cosine'`, `'none'`, `None` or a `_LRSchedule` object (see below). 192 | If `None` or `'none'`, learning rate is always kept constant. 193 | Default : `'warmup_linear'` 194 | b1: Adams b1. Default: 0.9 195 | b2: Adams b2. Default: 0.999 196 | e: Adams epsilon. Default: 1e-6 197 | weight_decay: Weight decay. Default: 0.01 198 | max_grad_norm: Maximum norm for the gradients (-1 means no clipping). Default: 1.0 199 | """ 200 | def __init__(self, params, lr=required, warmup=-1, t_total=-1, schedule='warmup_linear', 201 | b1=0.9, b2=0.999, e=1e-6, weight_decay=0.01, max_grad_norm=1.0, **kwargs): 202 | if lr is not required and lr < 0.0: 203 | raise ValueError("Invalid learning rate: {} - should be >= 0.0".format(lr)) 204 | if not isinstance(schedule, _LRSchedule) and schedule not in SCHEDULES: 205 | raise ValueError("Invalid schedule parameter: {}".format(schedule)) 206 | if not 0.0 <= b1 < 1.0: 207 | raise ValueError("Invalid b1 parameter: {} - should be in [0.0, 1.0[".format(b1)) 208 | if not 0.0 <= b2 < 1.0: 209 | raise ValueError("Invalid b2 parameter: {} - should be in [0.0, 1.0[".format(b2)) 210 | if not e >= 0.0: 211 | raise ValueError("Invalid epsilon value: {} - should be >= 0.0".format(e)) 212 | # initialize schedule object 213 | if not isinstance(schedule, _LRSchedule): 214 | schedule_type = SCHEDULES[schedule] 215 | schedule = schedule_type(warmup=warmup, t_total=t_total) 216 | else: 217 | if warmup != -1 or t_total != -1: 218 | logger.warning("warmup and t_total on the optimizer are ineffective when _LRSchedule object is provided as schedule. " 219 | "Please specify custom warmup and t_total in _LRSchedule object.") 220 | defaults = dict(lr=lr, schedule=schedule, 221 | b1=b1, b2=b2, e=e, weight_decay=weight_decay, 222 | max_grad_norm=max_grad_norm) 223 | super(BertAdam, self).__init__(params, defaults) 224 | 225 | def get_lr(self): 226 | lr = [] 227 | for group in self.param_groups: 228 | for p in group['params']: 229 | state = self.state[p] 230 | if len(state) == 0: 231 | return [0] 232 | lr_scheduled = group['lr'] 233 | lr_scheduled *= group['schedule'].get_lr(state['step']) 234 | lr.append(lr_scheduled) 235 | return lr 236 | 237 | def step(self, closure=None): 238 | """Performs a single optimization step. 239 | 240 | Arguments: 241 | closure (callable, optional): A closure that reevaluates the model 242 | and returns the loss. 
243 | """ 244 | loss = None 245 | if closure is not None: 246 | loss = closure() 247 | 248 | for group in self.param_groups: 249 | for p in group['params']: 250 | if p.grad is None: 251 | continue 252 | grad = p.grad.data 253 | if grad.is_sparse: 254 | raise RuntimeError('Adam does not support sparse gradients, please consider SparseAdam instead') 255 | 256 | state = self.state[p] 257 | 258 | # State initialization 259 | if len(state) == 0: 260 | state['step'] = 0 261 | # Exponential moving average of gradient values 262 | state['next_m'] = torch.zeros_like(p.data) 263 | # Exponential moving average of squared gradient values 264 | state['next_v'] = torch.zeros_like(p.data) 265 | 266 | next_m, next_v = state['next_m'], state['next_v'] 267 | beta1, beta2 = group['b1'], group['b2'] 268 | 269 | # Add grad clipping 270 | if group['max_grad_norm'] > 0: 271 | clip_grad_norm_(p, group['max_grad_norm']) 272 | 273 | # Decay the first and second moment running average coefficient 274 | # In-place operations to update the averages at the same time 275 | next_m.mul_(beta1).add_(1 - beta1, grad) 276 | next_v.mul_(beta2).addcmul_(1 - beta2, grad, grad) 277 | update = next_m / (next_v.sqrt() + group['e']) 278 | 279 | # Just adding the square of the weights to the loss function is *not* 280 | # the correct way of using L2 regularization/weight decay with Adam, 281 | # since that will interact with the m and v parameters in strange ways. 282 | # 283 | # Instead we want to decay the weights in a manner that doesn't interact 284 | # with the m/v parameters. This is equivalent to adding the square 285 | # of the weights to the loss with plain (non-momentum) SGD. 286 | if group['weight_decay'] > 0.0: 287 | update += group['weight_decay'] * p.data 288 | 289 | lr_scheduled = group['lr'] 290 | lr_scheduled *= group['schedule'].get_lr(state['step']) 291 | 292 | update_with_lr = lr_scheduled * update 293 | p.data.add_(-update_with_lr) 294 | 295 | state['step'] += 1 296 | 297 | # step_size = lr_scheduled * math.sqrt(bias_correction2) / bias_correction1 298 | # No bias correction 299 | # bias_correction1 = 1 - beta1 ** state['step'] 300 | # bias_correction2 = 1 - beta2 ** state['step'] 301 | 302 | return loss 303 | -------------------------------------------------------------------------------- /Extract-Classify-ACOS/bert_utils/tokenization.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 
15 | """Tokenization classes.""" 16 | 17 | from __future__ import absolute_import, division, print_function, unicode_literals 18 | 19 | import collections 20 | import logging 21 | import os 22 | import unicodedata 23 | from io import open 24 | 25 | from .file_utils import cached_path 26 | 27 | logger = logging.getLogger(__name__) 28 | 29 | PRETRAINED_VOCAB_ARCHIVE_MAP = { 30 | 'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt", 31 | 'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt", 32 | 'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt", 33 | 'bert-large-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt", 34 | 'bert-base-multilingual-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-vocab.txt", 35 | 'bert-base-multilingual-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-cased-vocab.txt", 36 | 'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese-vocab.txt", 37 | 'bert-base-german-cased': "https://int-deepset-models-bert.s3.eu-central-1.amazonaws.com/pytorch/bert-base-german-cased-vocab.txt", 38 | 'bert-large-uncased-whole-word-masking': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-whole-word-masking-vocab.txt", 39 | 'bert-large-cased-whole-word-masking': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-whole-word-masking-vocab.txt", 40 | 'bert-large-uncased-whole-word-masking-finetuned-squad': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-whole-word-masking-finetuned-squad-vocab.txt", 41 | 'bert-large-cased-whole-word-masking-finetuned-squad': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-whole-word-masking-finetuned-squad-vocab.txt", 42 | 'bert-base-cased-finetuned-mrpc': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-finetuned-mrpc-vocab.txt", 43 | } 44 | PRETRAINED_VOCAB_POSITIONAL_EMBEDDINGS_SIZE_MAP = { 45 | 'bert-base-uncased': 512, 46 | 'bert-large-uncased': 512, 47 | 'bert-base-cased': 512, 48 | 'bert-large-cased': 512, 49 | 'bert-base-multilingual-uncased': 512, 50 | 'bert-base-multilingual-cased': 512, 51 | 'bert-base-chinese': 512, 52 | 'bert-base-german-cased': 512, 53 | 'bert-large-uncased-whole-word-masking': 512, 54 | 'bert-large-cased-whole-word-masking': 512, 55 | 'bert-large-uncased-whole-word-masking-finetuned-squad': 512, 56 | 'bert-large-cased-whole-word-masking-finetuned-squad': 512, 57 | 'bert-base-cased-finetuned-mrpc': 512, 58 | } 59 | VOCAB_NAME = 'vocab.txt' 60 | 61 | 62 | def load_vocab(vocab_file): 63 | """Loads a vocabulary file into a dictionary.""" 64 | vocab = collections.OrderedDict() 65 | index = 0 66 | with open(vocab_file, "r", encoding="utf-8") as reader: 67 | while True: 68 | token = reader.readline() 69 | if not token: 70 | break 71 | token = token.strip() 72 | vocab[token] = index 73 | index += 1 74 | return vocab 75 | 76 | 77 | def whitespace_tokenize(text): 78 | """Runs basic whitespace cleaning and splitting on a piece of text.""" 79 | text = text.strip() 80 | if not text: 81 | return [] 82 | tokens = text.split() 83 | return tokens 84 | 85 | 86 | class BertTokenizer(object): 87 | """Runs end-to-end tokenization: punctuation splitting + wordpiece""" 88 | 89 | def __init__(self, vocab_file, do_lower_case=True, max_len=None, 
do_basic_tokenize=True, 90 | never_split=("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]")): 91 | """Constructs a BertTokenizer. 92 | 93 | Args: 94 | vocab_file: Path to a one-wordpiece-per-line vocabulary file 95 | do_lower_case: Whether to lower case the input 96 | Only has an effect when do_wordpiece_only=False 97 | do_basic_tokenize: Whether to do basic tokenization before wordpiece. 98 | max_len: An artificial maximum length to truncate tokenized sequences to; 99 | Effective maximum length is always the minimum of this 100 | value (if specified) and the underlying BERT model's 101 | sequence length. 102 | never_split: List of tokens which will never be split during tokenization. 103 | Only has an effect when do_wordpiece_only=False 104 | """ 105 | if not os.path.isfile(vocab_file): 106 | raise ValueError( 107 | "Can't find a vocabulary file at path '{}'. To load the vocabulary from a Google pretrained " 108 | "model use `tokenizer = BertTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)`".format(vocab_file)) 109 | self.vocab = load_vocab(vocab_file) 110 | self.ids_to_tokens = collections.OrderedDict( 111 | [(ids, tok) for tok, ids in self.vocab.items()]) 112 | self.do_basic_tokenize = do_basic_tokenize 113 | if do_basic_tokenize: 114 | self.basic_tokenizer = BasicTokenizer(do_lower_case=do_lower_case, 115 | never_split=never_split) 116 | self.wordpiece_tokenizer = WordpieceTokenizer(vocab=self.vocab) 117 | self.max_len = max_len if max_len is not None else int(1e12) 118 | 119 | def tokenize(self, text): 120 | split_tokens = [] 121 | if self.do_basic_tokenize: 122 | for token in self.basic_tokenizer.tokenize(text): 123 | for sub_token in self.wordpiece_tokenizer.tokenize(token): 124 | split_tokens.append(sub_token) 125 | else: 126 | split_tokens = self.wordpiece_tokenizer.tokenize(text) 127 | return split_tokens 128 | 129 | def convert_tokens_to_ids(self, tokens): 130 | """Converts a sequence of tokens into ids using the vocab.""" 131 | ids = [] 132 | for token in tokens: 133 | ids.append(self.vocab[token]) 134 | if len(ids) > self.max_len: 135 | logger.warning( 136 | "Token indices sequence length is longer than the specified maximum " 137 | " sequence length for this BERT model ({} > {}). Running this" 138 | " sequence through BERT will result in indexing errors".format(len(ids), self.max_len) 139 | ) 140 | return ids 141 | 142 | def convert_ids_to_tokens(self, ids): 143 | """Converts a sequence of ids in wordpiece tokens using the vocab.""" 144 | tokens = [] 145 | for i in ids: 146 | tokens.append(self.ids_to_tokens[i]) 147 | return tokens 148 | 149 | def save_vocabulary(self, vocab_path): 150 | """Save the tokenizer vocabulary to a directory or file.""" 151 | index = 0 152 | if os.path.isdir(vocab_path): 153 | vocab_file = os.path.join(vocab_path, VOCAB_NAME) 154 | with open(vocab_file, "w", encoding="utf-8") as writer: 155 | for token, token_index in sorted(self.vocab.items(), key=lambda kv: kv[1]): 156 | if index != token_index: 157 | logger.warning("Saving vocabulary to {}: vocabulary indices are not consecutive." 158 | " Please check that the vocabulary is not corrupted!".format(vocab_file)) 159 | index = token_index 160 | writer.write(token + u'\n') 161 | index += 1 162 | return vocab_file 163 | 164 | @classmethod 165 | def from_pretrained(cls, pretrained_model_name_or_path, cache_dir=None, *inputs, **kwargs): 166 | """ 167 | Instantiate a PreTrainedBertModel from a pre-trained model file. 168 | Download and cache the pre-trained model file if needed. 
169 | """ 170 | if pretrained_model_name_or_path in PRETRAINED_VOCAB_ARCHIVE_MAP: 171 | vocab_file = PRETRAINED_VOCAB_ARCHIVE_MAP[pretrained_model_name_or_path] 172 | if '-cased' in pretrained_model_name_or_path and kwargs.get('do_lower_case', True): 173 | logger.warning("The pre-trained model you are loading is a cased model but you have not set " 174 | "`do_lower_case` to False. We are setting `do_lower_case=False` for you but " 175 | "you may want to check this behavior.") 176 | kwargs['do_lower_case'] = False 177 | elif '-cased' not in pretrained_model_name_or_path and not kwargs.get('do_lower_case', True): 178 | logger.warning("The pre-trained model you are loading is an uncased model but you have set " 179 | "`do_lower_case` to False. We are setting `do_lower_case=True` for you " 180 | "but you may want to check this behavior.") 181 | kwargs['do_lower_case'] = True 182 | else: 183 | vocab_file = pretrained_model_name_or_path 184 | if os.path.isdir(vocab_file): 185 | vocab_file = os.path.join(vocab_file, VOCAB_NAME) 186 | # redirect to the cache, if necessary 187 | try: 188 | resolved_vocab_file = cached_path(vocab_file, cache_dir=cache_dir) 189 | except EnvironmentError: 190 | if pretrained_model_name_or_path in PRETRAINED_VOCAB_ARCHIVE_MAP: 191 | logger.error( 192 | "Couldn't reach server at '{}' to download vocabulary.".format( 193 | vocab_file)) 194 | else: 195 | logger.error( 196 | "Model name '{}' was not found in model name list ({}). " 197 | "We assumed '{}' was a path or url but couldn't find any file " 198 | "associated to this path or url.".format( 199 | pretrained_model_name_or_path, 200 | ', '.join(PRETRAINED_VOCAB_ARCHIVE_MAP.keys()), 201 | vocab_file)) 202 | return None 203 | if resolved_vocab_file == vocab_file: 204 | logger.info("loading vocabulary file {}".format(vocab_file)) 205 | else: 206 | logger.info("loading vocabulary file {} from cache at {}".format( 207 | vocab_file, resolved_vocab_file)) 208 | if pretrained_model_name_or_path in PRETRAINED_VOCAB_POSITIONAL_EMBEDDINGS_SIZE_MAP: 209 | # if we're using a pretrained model, ensure the tokenizer wont index sequences longer 210 | # than the number of positional embeddings 211 | max_len = PRETRAINED_VOCAB_POSITIONAL_EMBEDDINGS_SIZE_MAP[pretrained_model_name_or_path] 212 | kwargs['max_len'] = min(kwargs.get('max_len', int(1e12)), max_len) 213 | # Instantiate tokenizer. 214 | tokenizer = cls(resolved_vocab_file, *inputs, **kwargs) 215 | return tokenizer 216 | 217 | 218 | class BasicTokenizer(object): 219 | """Runs basic tokenization (punctuation splitting, lower casing, etc.).""" 220 | 221 | def __init__(self, 222 | do_lower_case=True, 223 | never_split=("[UNK]", "[SEP]", "[PAD]", "[CLS]", "[MASK]")): 224 | """Constructs a BasicTokenizer. 225 | 226 | Args: 227 | do_lower_case: Whether to lower case the input. 228 | """ 229 | self.do_lower_case = do_lower_case 230 | self.never_split = never_split 231 | 232 | def tokenize(self, text): 233 | """Tokenizes a piece of text.""" 234 | text = self._clean_text(text) 235 | # This was added on November 1st, 2018 for the multilingual and Chinese 236 | # models. This is also applied to the English models now, but it doesn't 237 | # matter since the English models were not trained on any Chinese data 238 | # and generally don't have any Chinese data in them (there are Chinese 239 | # characters in the vocabulary because Wikipedia does have some Chinese 240 | # words in the English Wikipedia.). 
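        # e.g. "ab你好cd" gets spaces added around each CJK character, so the
        # whitespace tokenization below yields ["ab", "你", "好", "cd"]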
241 | text = self._tokenize_chinese_chars(text) 242 | orig_tokens = whitespace_tokenize(text) 243 | split_tokens = [] 244 | for token in orig_tokens: 245 | if self.do_lower_case and token not in self.never_split: 246 | token = token.lower() 247 | token = self._run_strip_accents(token) 248 | split_tokens.extend(self._run_split_on_punc(token)) 249 | 250 | output_tokens = whitespace_tokenize(" ".join(split_tokens)) 251 | return output_tokens 252 | 253 | def _run_strip_accents(self, text): 254 | """Strips accents from a piece of text.""" 255 | text = unicodedata.normalize("NFD", text) 256 | output = [] 257 | for char in text: 258 | cat = unicodedata.category(char) 259 | if cat == "Mn": 260 | continue 261 | output.append(char) 262 | return "".join(output) 263 | 264 | def _run_split_on_punc(self, text): 265 | """Splits punctuation on a piece of text.""" 266 | if text in self.never_split: 267 | return [text] 268 | chars = list(text) 269 | i = 0 270 | start_new_word = True 271 | output = [] 272 | while i < len(chars): 273 | char = chars[i] 274 | if _is_punctuation(char): 275 | output.append([char]) 276 | start_new_word = True 277 | else: 278 | if start_new_word: 279 | output.append([]) 280 | start_new_word = False 281 | output[-1].append(char) 282 | i += 1 283 | 284 | return ["".join(x) for x in output] 285 | 286 | def _tokenize_chinese_chars(self, text): 287 | """Adds whitespace around any CJK character.""" 288 | output = [] 289 | for char in text: 290 | cp = ord(char) 291 | if self._is_chinese_char(cp): 292 | output.append(" ") 293 | output.append(char) 294 | output.append(" ") 295 | else: 296 | output.append(char) 297 | return "".join(output) 298 | 299 | def _is_chinese_char(self, cp): 300 | """Checks whether CP is the codepoint of a CJK character.""" 301 | # This defines a "chinese character" as anything in the CJK Unicode block: 302 | # https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_(Unicode_block) 303 | # 304 | # Note that the CJK Unicode block is NOT all Japanese and Korean characters, 305 | # despite its name. The modern Korean Hangul alphabet is a different block, 306 | # as is Japanese Hiragana and Katakana. Those alphabets are used to write 307 | # space-separated words, so they are not treated specially and handled 308 | # like the all of the other languages. 309 | if ((cp >= 0x4E00 and cp <= 0x9FFF) or # 310 | (cp >= 0x3400 and cp <= 0x4DBF) or # 311 | (cp >= 0x20000 and cp <= 0x2A6DF) or # 312 | (cp >= 0x2A700 and cp <= 0x2B73F) or # 313 | (cp >= 0x2B740 and cp <= 0x2B81F) or # 314 | (cp >= 0x2B820 and cp <= 0x2CEAF) or 315 | (cp >= 0xF900 and cp <= 0xFAFF) or # 316 | (cp >= 0x2F800 and cp <= 0x2FA1F)): # 317 | return True 318 | 319 | return False 320 | 321 | def _clean_text(self, text): 322 | """Performs invalid character removal and whitespace cleanup on text.""" 323 | output = [] 324 | for char in text: 325 | cp = ord(char) 326 | if cp == 0 or cp == 0xfffd or _is_control(char): 327 | continue 328 | if _is_whitespace(char): 329 | output.append(" ") 330 | else: 331 | output.append(char) 332 | return "".join(output) 333 | 334 | 335 | class WordpieceTokenizer(object): 336 | """Runs WordPiece tokenization.""" 337 | 338 | def __init__(self, vocab, unk_token="[UNK]", max_input_chars_per_word=100): 339 | self.vocab = vocab 340 | self.unk_token = unk_token 341 | self.max_input_chars_per_word = max_input_chars_per_word 342 | 343 | def tokenize(self, text): 344 | """Tokenizes a piece of text into its word pieces. 
345 | 346 | This uses a greedy longest-match-first algorithm to perform tokenization 347 | using the given vocabulary. 348 | 349 | For example: 350 | input = "unaffable" 351 | output = ["un", "##aff", "##able"] 352 | 353 | Args: 354 | text: A single token or whitespace separated tokens. This should have 355 | already been passed through `BasicTokenizer`. 356 | 357 | Returns: 358 | A list of wordpiece tokens. 359 | """ 360 | 361 | output_tokens = [] 362 | for token in whitespace_tokenize(text): 363 | chars = list(token) 364 | if len(chars) > self.max_input_chars_per_word: 365 | output_tokens.append(self.unk_token) 366 | continue 367 | 368 | is_bad = False 369 | start = 0 370 | sub_tokens = [] 371 | while start < len(chars): 372 | end = len(chars) 373 | cur_substr = None 374 | while start < end: 375 | substr = "".join(chars[start:end]) 376 | if start > 0: 377 | substr = "##" + substr 378 | if substr in self.vocab: 379 | cur_substr = substr 380 | break 381 | end -= 1 382 | if cur_substr is None: 383 | is_bad = True 384 | break 385 | sub_tokens.append(cur_substr) 386 | start = end 387 | 388 | if is_bad: 389 | output_tokens.append(self.unk_token) 390 | else: 391 | output_tokens.extend(sub_tokens) 392 | return output_tokens 393 | 394 | 395 | def _is_whitespace(char): 396 | """Checks whether `chars` is a whitespace character.""" 397 | # \t, \n, and \r are technically contorl characters but we treat them 398 | # as whitespace since they are generally considered as such. 399 | if char == " " or char == "\t" or char == "\n" or char == "\r": 400 | return True 401 | cat = unicodedata.category(char) 402 | if cat == "Zs": 403 | return True 404 | return False 405 | 406 | 407 | def _is_control(char): 408 | """Checks whether `chars` is a control character.""" 409 | # These are technically control characters but we count them as whitespace 410 | # characters. 411 | if char == "\t" or char == "\n" or char == "\r": 412 | return False 413 | cat = unicodedata.category(char) 414 | if cat.startswith("C"): 415 | return True 416 | return False 417 | 418 | 419 | def _is_punctuation(char): 420 | """Checks whether `chars` is a punctuation character.""" 421 | cp = ord(char) 422 | # We treat all non-letter/number ASCII as punctuation. 423 | # Characters such as "^", "$", and "`" are not in the Unicode 424 | # Punctuation class but we treat them as punctuation anyways, for 425 | # consistency. 
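    # (e.g. "$" and "^" fall inside the ASCII ranges below, so they count as
    # punctuation here even though Unicode classifies them as symbols)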
426 |     if ((cp >= 33 and cp <= 47) or (cp >= 58 and cp <= 64) or
427 |             (cp >= 91 and cp <= 96) or (cp >= 123 and cp <= 126)):
428 |         return True
429 |     cat = unicodedata.category(char)
430 |     if cat.startswith("P"):
431 |         return True
432 |     return False
433 | 
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/dataset_utils.py:
--------------------------------------------------------------------------------
 1 | #coding=utf-8
 2 | 
 3 | import codecs as cs
 4 | import os
 5 | import sys
 6 | import pdb
 7 | 
 8 | # use the local copy of the BERT tokenizer shipped in bert_utils rather than
 9 | # a hard-coded, machine-specific pytorch_pretrained_BERT checkout
10 | from bert_utils.tokenization import BertTokenizer
11 | 
12 | def read_pair_gold(f, args):
13 |     # key: text + aspect span + opinion span; value: corresponding category-sentiment type number
14 | 
15 |     tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)
16 |     quad_text = []
17 |     quad_gold = []
18 |     for line in f:
19 |         cur_quad_gold = [[]]
20 |         line = line.strip().split('\t')
21 |         text = line[0].split('####')[0]
22 |         text = text.split(' ')
23 |         cur_text = tokenizer.convert_tokens_to_ids(text)
24 |         # while len(cur_text) < args.max_seq_length:
25 |         #     cur_text.append(0)
26 |         quad_text.append(cur_text)
27 |         cur_quad_gold[0].append(line[0].split('####')[1])
28 |         for ele in line[1:]:
29 |             if ele not in cur_quad_gold[0]:
30 |                 cur_quad_gold[0].append(ele)
31 |         quad_gold += cur_quad_gold
32 |     return quad_text, quad_gold
33 | 
34 | 
35 | def read_triplet_gold(f, args):
36 |     # key: text + aspect span + opinion span + sentiment type; value: corresponding category type number
37 |     tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)
38 |     quad_text = []
39 |     quad_gold = []
40 |     for line in f:
41 |         cur_quad_gold = [[]]
42 |         line = line.strip().split('\t')
43 |         text = line[0].split('####')[0]
44 |         text = text.split(' ')
45 |         cur_text = tokenizer.convert_tokens_to_ids(text)
46 |         # while len(cur_text) < args.max_seq_length:
47 |         #     cur_text.append(0)
48 |         quad_text.append(cur_text)
49 |         cur_quad_gold[0].append(line[0].split('####')[1])
50 |         for ele in line[1:]:
51 |             if ele not in cur_quad_gold[0]:
52 |                 cur_quad_gold[0].append(ele)
53 |         quad_gold += cur_quad_gold
54 | 
55 |     return quad_text, quad_gold
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/eval_metrics.py:
--------------------------------------------------------------------------------
  1 | #coding=utf-8
  2 | 
  3 | from __future__ import absolute_import, division, print_function
  4 | 
  5 | import argparse
  6 | import logging
  7 | import os
  8 | import sys
  9 | import random
 10 | from tqdm import tqdm, trange
 11 | import pdb
 12 | import warnings
 13 | import codecs as cs
 14 | import copy
 15 | import re
 16 | # warnings.filterwarnings('ignore')
 17 | 
 18 | import numpy as np
 19 | 
 20 | import torch
 21 | from torch.nn import CrossEntropyLoss, MSELoss, MultiLabelSoftMarginLoss, BCEWithLogitsLoss
 22 | 
 23 | from run_classifier_dataset_utils import compute_metrics
 24 | 
 25 | def measureQuad(pred, gold):
 26 |     tp = .0
 27 |     fp = .0
 28 |     fn = .0
 29 |     for text in pred:
 30 |         cnt = 0
 31 |         if text in gold:
 32 |             for pair in pred[text]:
 33 |                 if pair in gold[text]:
 34 |                     cnt += 1
 35 |         tp += cnt
 36 |         fp += len(pred[text])-cnt
 37 |         if text in gold:
 38 |             fn += len(gold[text])-cnt
 39 |     for text in gold:
 40 |         if text not in pred:
 41 |             fn += len(gold[text])
 42 | 
 43 |     print("tp: {}. fp: {}. fn: {}.".format(tp, fp, fn))
 44 |     p = 0 if tp + fp == 0 else 1.*tp / (tp + fp)
 45 |     r = 0 if tp + fn == 0 else 1.*tp / (tp + fn)
 46 |     f = 0 if p + r == 0 else 2 * p * r / (p + r)
 47 |     return {'precision':p, 'recall':r, 'micro-F1':f}
 48 | 
 49 | def pred_eval(_e, args, logger, tokenizer, model, dataloader, eval_gold, label_list, device, task_name, eval_type='valid'):
 50 | 
 51 |     preds = {}
 52 |     golds = {}
 53 |     ids_to_token = {}
 54 |     pred_aspect_tag = []
 55 |     pred_imp_aspect = []
 56 |     pred_imp_opinion = []
 57 |     input_text, pairgold = eval_gold
 58 |     _all_tokens_len = []
 59 |     input_length_map = {}
 60 |     entity_label = r'32*'
 61 |     opinion_entity_label = r'54*'
 62 |     label_map_seq = {label : i for i, label in enumerate(label_list[1])}
 63 | 
 64 |     for index in range(0, len(pairgold), 3):
 65 |         cur_quad = pairgold[index]
 66 |         gold_imp_aspect = pairgold[index+1]
 67 |         gold_imp_opinion = pairgold[index+2]
 68 |         gold_tag = []
 69 |         cur_aspect_tag = ''.join(str(ele) for ele in cur_quad)
 70 |         max_len = len(cur_aspect_tag)
 71 |         for ele in re.finditer(entity_label, cur_aspect_tag):
 72 |             gold_tag.append('a-' + str(ele.start()) + ',' + str(ele.end()))
 73 |         if gold_imp_aspect == 1:
 74 |             gold_tag.append('a--1,-1')
 75 | 
 76 |         for ele in re.finditer(opinion_entity_label, cur_aspect_tag):
 77 |             gold_tag.append('o-' + str(ele.start()) + ',' + str(ele.end()))
 78 |         if gold_imp_opinion == 1:
 79 |             gold_tag.append('o--1,-1')
 80 | 
 81 |         cur_input = ' '.join(str(ele) for ele in input_text[index//3])
 82 |         golds[cur_input] = gold_tag
 83 |         ids_to_token[cur_input] = ' '.join(ele for ele in tokenizer.convert_ids_to_tokens(input_text[index//3]))
 84 | 
 85 |     for step, batch in enumerate(dataloader):
 86 | 
 87 |         if step % 500 == 0 and step>0:
 88 |             print(step)
 89 | 
 90 |         _all_tokens_len += batch[0].numpy().tolist()
 91 |         batch = tuple(t.to(device) for t in batch)
 92 |         _tokens_len, _aspect_input_ids, _aspect_input_mask, _aspect_ids, _aspect_segment_ids, \
 93 |             _exist_imp_aspect, _exist_imp_opinion = batch
 94 | 
 95 |         with torch.no_grad():
 96 |             _, logits = model(aspect_input_ids=_aspect_input_ids, aspect_labels=_aspect_ids,
 97 |                               aspect_token_type_ids=_aspect_segment_ids, aspect_attention_mask=_aspect_input_mask,
 98 |                               exist_imp_aspect=_exist_imp_aspect, exist_imp_opinion=_exist_imp_opinion)
 99 | 
100 |         # the input is '[CLS] text [SEP] category/sentiment [SEP]'; only '[CLS] text' is kept, and the first '[SEP]'
101 |         # is used to predict the existence of an implicit aspect or opinion.
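        # The numeric tag strings handled below follow a BIO-like scheme: '3'
        # appears to open an aspect span with '2' continuing it (regex '32*'),
        # and '5' opens an opinion span with '4' continuing it (regex '54*');
        # e.g. the tag string '0032200540' yields the spans 'a-2,5' and 'o-7,9'.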
102 | 103 | logits_imp_aspect = np.argmax(logits[1].detach().cpu().numpy(), axis=-1).tolist() 104 | logits_imp_opinion = np.argmax(logits[2].detach().cpu().numpy(), axis=-1).tolist() 105 | for i, ele in enumerate(logits[0]): 106 | pred_aspect_tag.append(ele) 107 | for i, ele in enumerate(logits_imp_aspect): 108 | pred_imp_aspect.append(ele) 109 | for i, ele in enumerate(logits_imp_opinion): 110 | pred_imp_opinion.append(ele) 111 | 112 | for i in range(len(pred_aspect_tag)): 113 | cur_aspect_tag = ''.join(str(ele) for ele in pred_aspect_tag[i]) 114 | pred_tag = [] 115 | for ele in re.finditer(entity_label, cur_aspect_tag): 116 | pred_tag.append('a-'+str(ele.start()-1) + ',' + str(ele.end()-1)) 117 | if pred_imp_aspect[i] == 1: 118 | pred_tag.append('a--1,-1') 119 | 120 | for ele in re.finditer(opinion_entity_label, cur_aspect_tag): 121 | pred_tag.append('o-'+str(ele.start()-1) + ',' + str(ele.end()-1)) 122 | if pred_imp_opinion[i] == 1: 123 | pred_tag.append('o--1,-1') 124 | 125 | cur_input = ' '.join(str(ele) for ele in input_text[i]) 126 | preds[cur_input] = pred_tag 127 | input_length_map[cur_input] = _all_tokens_len[i] 128 | 129 | res = measureQuad(preds, golds) 130 | if eval_type == 'valid': 131 | pipeline_file = cs.open(args.output_dir+os.sep+'valid.txt', 'w') 132 | else: 133 | pipeline_file = cs.open(args.output_dir+os.sep+'pred4pipeline.txt', 'w') 134 | for text in preds: 135 | length = input_length_map[text]-1 136 | cur_text = ids_to_token[text] 137 | cur_text = cur_text.split(' ')[1:length] 138 | if len(preds[text]) > 0: 139 | pipeline_file.write(' '.join(ele for ele in cur_text)+'\t'+'\t'.join(ele for ele in preds[text])+'\n') 140 | 141 | if eval_type == 'valid': 142 | logger.info("***** Eval results *****") 143 | for key in sorted(res.keys()): 144 | logger.info(" %s = %s", key, str(res[key])) 145 | return res 146 | 147 | elif eval_type == 'test': 148 | logger.info("***** Test results *****") 149 | for key in sorted(res.keys()): 150 | logger.info(" %s = %s", key, str(res[key])) 151 | return res 152 | 153 | 154 | def getTextType(gold): 155 | text_type = {} 156 | for text in gold: 157 | if text not in text_type: 158 | text_type[text] = [] 159 | 160 | for ele in gold[text]: 161 | if 4 not in text_type[text]: 162 | text_type[text].append(4) 163 | if '-1' not in ele[2] and '-1' not in ele[3]: 164 | if 0 not in text_type[text]: 165 | text_type[text].append(0) 166 | elif '-1' in ele[2] and '-1' not in ele[3]: 167 | if 1 not in text_type[text]: 168 | text_type[text].append(1) 169 | elif '-1' not in ele[2] and '-1' in ele[3]: 170 | if 2 not in text_type[text]: 171 | text_type[text].append(2) 172 | elif '-1' in ele[2] and '-1' in ele[3]: 173 | if 3 not in text_type[text]: 174 | text_type[text].append(3) 175 | 176 | return text_type 177 | 178 | def measureQuad_imp(pred, gold, text_type): 179 | tp = [.0, .0, .0, .0, .0] 180 | fp = [.0, .0, .0, .0, .0] 181 | fn = [.0, .0, .0, .0, .0] 182 | 183 | # text_set = set() 184 | # for text in gold: 185 | # text_set.add(text) 186 | # for text in text_set: 187 | # for dt in text_type[text]: 188 | # cnt = 0 189 | # for ele in pred[text]: 190 | # if ele in gold[text]: 191 | # cnt += 1 192 | # tp[dt] += cnt 193 | # fp[dt] += len(pred[text])-cnt 194 | 195 | # for ele in gold[text]: 196 | # if ele not in pred[text]: 197 | # fn[dt] += 1 198 | 199 | for text in pred: 200 | for dt in text_type[text]: 201 | cnt = 0 202 | if text in gold: 203 | for pair in pred[text]: 204 | if pair in gold[text]: 205 | cnt += 1 206 | tp[dt] += cnt 207 | fp[dt] += 
len(pred[text])-cnt
208 |         if text in gold:
209 |             fn[dt] += len(gold[text])-cnt
210 |     for text in gold:
211 |         for dt in text_type[text]:
212 |             if text not in pred:
213 |                 fn[dt] += len(gold[text])
214 | 
215 |     for i in range(5):
216 |         print("tp: {}. fp: {}. fn: {}.".format(tp[i], fp[i], fn[i]))
217 |         p = 0 if tp[i] + fp[i] == 0 else 1.*tp[i] / (tp[i] + fp[i])
218 |         r = 0 if tp[i] + fn[i] == 0 else 1.*tp[i] / (tp[i] + fn[i])
219 |         f = 0 if p + r == 0 else 2 * p * r / (p + r)
220 |         print(i, ': ', {'precision':p, 'recall':r, 'micro-F1':f})
221 |     return {'precision':p, 'recall':r, 'micro-F1':f} # overall (type-4) result from the last loop iteration
222 | 
223 | def pair_eval(_e, args, logger, tokenizer, model, dataloader, gold, label_list, device, task_name, eval_type='valid'):
224 |     preds = {}
225 |     golds = {}
226 |     quad_preds = {}
227 |     quad_golds = {}
228 |     ids_to_token = {}
229 |     catesenti_dict = {i: label for i, label in enumerate(label_list[0])}
230 |     input_text, quadgold = gold
231 |     for index, cur_quad in enumerate(quadgold):
232 |         cur_input = ' '.join(str(ele) for ele in input_text[index])
233 |         cur_input = cur_input+' '+cur_quad[0]
234 |         golds[cur_input] = cur_quad[1:]
235 |         ori_text = ' '.join(ele for ele in tokenizer.convert_ids_to_tokens(input_text[index]))
236 |         ids_to_token[cur_input] = ori_text+' '+cur_quad[0]
237 | 
238 |         quad_pairs = []
239 |         for ele in cur_quad[1:]:
240 |             ele = ele.split('#')
241 |             cate = '#'.join(item for item in ele[:-1]); senti = ele[-1]
242 |             asp = cur_quad[0].split(' ')[0]; opi = cur_quad[0].split(' ')[1]
243 |             tmp_quad = [cate, senti, asp, opi]
244 |             if tmp_quad not in quad_pairs:
245 |                 quad_pairs.append(tmp_quad)
246 |         if ori_text in quad_golds:
247 |             quad_golds[ori_text] += quad_pairs
248 |         else:
249 |             quad_golds[ori_text] = quad_pairs
250 |     tmp_cnt = 0
251 |     for step, batch in enumerate(dataloader):
252 | 
253 |         batch = tuple(t.to(device) for t in batch)
254 |         _tokens_len, _aspect_input_ids, _aspect_input_mask, _aspect_segment_ids, _candidate_aspect, \
255 |             _candidate_opinion, _label_id = batch
256 | 
257 |         # define a new function to compute loss values for both output_modes
258 |         with torch.no_grad():
259 |             loss, logits = model(tokenizer, _e, aspect_input_ids=_aspect_input_ids,
260 |                 aspect_token_type_ids=_aspect_segment_ids, aspect_attention_mask=_aspect_input_mask,
261 |                 candidate_aspect=_candidate_aspect, candidate_opinion=_candidate_opinion, label_id=_label_id)
262 | 
263 |         logits = logits[0].detach().cpu().numpy()
264 |         # pair_matrix = logits[0].view(len(_tokens_len), logits[1].item(), logits[1].item(), 3).detach().cpu().numpy()
265 | 
266 |         for i in range(len(_tokens_len)):
267 |             # use the input text as the key and the corresponding category predictions as the value
268 |             aspect_len = _aspect_input_mask[i].detach().cpu().numpy().sum()
269 |             aspect_tags = _candidate_aspect[i].detach().cpu().numpy()
270 |             opinion_tags = _candidate_opinion[i].detach().cpu().numpy()
271 |             entity_label = r'11*'
272 | 
273 |             aspect_labels = ''.join(str(ele) for ele in aspect_tags)
274 |             cur_aspect = []
275 |             for ele in re.finditer(entity_label, aspect_labels):
276 |                 # if (ele.end()-ele.start())<_tokens_len[i]-2:
277 |                 if ele.start() == 0 and '-1,-1' not in cur_aspect:
278 |                     cur_aspect.append('-1,-1')
279 |                 elif (ele.start() > 0 and ele.end() < aspect_len):
280 |                     cur_aspect.append(str(ele.start()-1)+','+str(ele.end()-1))
281 | 
282 |             opinion_labels = ''.join(str(ele) for ele in opinion_tags)
283 |             cur_opinion = []
284 |             for ele in re.finditer(entity_label, opinion_labels):
285 |                 if ele.end() == aspect_len and '-1,-1' not in cur_opinion:
286 |                     cur_opinion.append('-1,-1')
287 |                 elif (ele.start() > 0 and ele.end() < aspect_len):
288 |                     cur_opinion.append(str(ele.start()-1)+','+str(ele.end()-1))
289 | 
290 |             pred_res = []
291 |             cur_ao = cur_aspect[0]+' '+cur_opinion[0]
292 | 
293 |             ind = np.where(logits[i]>0)
294 |             for ele in ind[0]:
295 |                 pred_res.append(catesenti_dict[int(ele)])
296 |             ttt = (_aspect_input_ids[i].detach().cpu().numpy().tolist())[1:(_tokens_len[i]-1)]
297 |             cur_input = ' '.join(str(ele) for ele in ttt)+' '+cur_ao
298 |             ids_to_token[cur_input] = ' '.join(ele for ele in tokenizer.convert_ids_to_tokens(ttt))+' '+cur_ao
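            # Note: cur_input keys each prediction by the token-id string plus the
            # 'a_st,a_ed o_st,o_ed' span pair -- the same scheme used for the gold
            # keys built from input_text and cur_quad[0] above -- so measureQuad
            # can align the preds and golds dicts directly.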
299 | 
300 |             preds[cur_input] = pred_res
301 | 
302 |             quad_pairs = []
303 |             for ele in pred_res:
304 |                 ele = ele.split('#')
305 |                 cate = '#'.join(item for item in ele[:-1]); senti = ele[-1]
306 |                 tmp_quad = [cate, senti, cur_aspect[0], cur_opinion[0]]
307 |                 if tmp_quad not in quad_pairs:
308 |                     quad_pairs.append(tmp_quad)
309 |             tmp_cnt += len(quad_pairs)
310 |             ori_text = ' '.join(ele for ele in tokenizer.convert_ids_to_tokens(ttt))
311 |             if ori_text in quad_preds:
312 |                 quad_preds[ori_text] += quad_pairs
313 |             else:
314 |                 quad_preds[ori_text] = quad_pairs
315 |     print("Quad num: {}".format(tmp_cnt))
316 |     # pdb.set_trace()
317 |     res = measureQuad(preds, golds)
318 |     text_type = getTextType(quad_golds)
319 |     # tmp = measureQuad_imp(quad_preds, quad_golds)
320 | 
321 |     if eval_type == 'valid':
322 |         logger.info("***** Eval results *****")
323 |         for key in sorted(res.keys()):
324 |             logger.info("  %s = %s", key, str(res[key]))
325 |         return res
326 | 
327 |     elif eval_type == 'test':
328 | 
329 |         # Evaluation for all sub-tasks; since we extract quads, the element number is 4.
330 |         ele_num = 4
331 |         index_to_name = {0:'category', 1:'sentiment', 2:'aspect', 3:'opinion'}
332 |         for comb_choice in range(1, (1<<ele_num)):
333 |             # each bit of comb_choice selects one of the four quad elements
334 |             exist_index = []
335 |             cur_choice = comb_choice
336 |             for index in range(ele_num):
337 |                 if cur_choice & 1 == 1:
338 |                     exist_index.append(index)
339 |                 cur_choice >>= 1
340 |             sub_preds = {}
341 |             sub_golds = {}
342 |             for cur_key in quad_preds:
343 |                 cur_subs = []
344 |                 for quad in quad_preds[cur_key]:
345 |                     cur_sub = [quad[index] for index in exist_index]
346 |                     if cur_sub not in cur_subs:
347 |                         cur_subs.append(cur_sub)
348 |                 sub_preds[cur_key] = cur_subs
349 |             for cur_key in quad_golds:
350 |                 cur_subs = []
351 |                 for quad in quad_golds[cur_key]:
352 |                     cur_sub = [quad[index] for index in exist_index]
353 |                     if cur_sub not in cur_subs:
354 |                         cur_subs.append(cur_sub)
355 |                 sub_golds[cur_key] = cur_subs
356 |             sub_res = measureQuad_imp(sub_preds, sub_golds, text_type)
357 |             subtask_name = ' '.join(index_to_name[ele] for ele in exist_index)
358 |             # if subtask_name == 'aspect':
359 |             #     pdb.set_trace()
360 |             logger.info("***** %s results *****", subtask_name)
361 |             for key in sorted(sub_res.keys()):
362 |                 logger.info("  {} = {:.2%}".format(key, sub_res[key]))
363 |             logger.info("-----------------------------------")
364 | 
365 |         pipeline_res = cs.open(args.output_dir+os.sep+'result.txt', 'w')
366 |         for key in golds:
367 |             pipeline_res.write(ids_to_token[key]+'\n')
368 |             for cur_pair in golds[key]:
369 |                 pipeline_res.write(cur_pair+'\t')
370 |             pipeline_res.write('\n')
371 |             if key in preds:
372 |                 for cur_pair in preds[key]:
373 |                     pipeline_res.write(cur_pair+'\t')
374 |             pipeline_res.write('\n\n')
375 |         for key in preds:
376 |             if key not in golds:
377 |                 pipeline_res.write(ids_to_token[key]+'\n')
378 |                 pipeline_res.write('\n')
379 |                 for cur_pair in preds[key]:
380 |                     pipeline_res.write(cur_pair+'\t')
381 |                 pipeline_res.write('\n\n')
382 | 
383 |         logger.info("***** Test results *****")
384 |         for key in sorted(res.keys()):
385 |             logger.info("  %s = %s", key, str(res[key]))
386 |         return res
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/file_utils.py:
--------------------------------------------------------------------------------
1 | """
2 | Utilities for working with the local dataset cache.
3 | This file is adapted from the AllenNLP library at https://github.com/allenai/allennlp
4 | Copyright by the AllenNLP authors.
5 | """ 6 | from __future__ import (absolute_import, division, print_function, unicode_literals) 7 | 8 | import sys 9 | import json 10 | import logging 11 | import os 12 | import shutil 13 | import tempfile 14 | import fnmatch 15 | from functools import wraps 16 | from hashlib import sha256 17 | import sys 18 | from io import open 19 | 20 | import boto3 21 | import requests 22 | from botocore.exceptions import ClientError 23 | from tqdm import tqdm 24 | 25 | try: 26 | from torch.hub import _get_torch_home 27 | torch_cache_home = _get_torch_home() 28 | except ImportError: 29 | torch_cache_home = os.path.expanduser( 30 | os.getenv('TORCH_HOME', os.path.join( 31 | os.getenv('XDG_CACHE_HOME', '~/.cache'), 'torch'))) 32 | default_cache_path = os.path.join(torch_cache_home, 'pytorch_pretrained_bert') 33 | 34 | try: 35 | from urllib.parse import urlparse 36 | except ImportError: 37 | from urlparse import urlparse 38 | 39 | try: 40 | from pathlib import Path 41 | PYTORCH_PRETRAINED_BERT_CACHE = Path( 42 | os.getenv('PYTORCH_PRETRAINED_BERT_CACHE', default_cache_path)) 43 | except (AttributeError, ImportError): 44 | PYTORCH_PRETRAINED_BERT_CACHE = os.getenv('PYTORCH_PRETRAINED_BERT_CACHE', 45 | default_cache_path) 46 | 47 | CONFIG_NAME = "config.json" 48 | WEIGHTS_NAME = "pytorch_model.bin" 49 | 50 | logger = logging.getLogger(__name__) # pylint: disable=invalid-name 51 | 52 | 53 | def url_to_filename(url, etag=None): 54 | """ 55 | Convert `url` into a hashed filename in a repeatable way. 56 | If `etag` is specified, append its hash to the url's, delimited 57 | by a period. 58 | """ 59 | url_bytes = url.encode('utf-8') 60 | url_hash = sha256(url_bytes) 61 | filename = url_hash.hexdigest() 62 | 63 | if etag: 64 | etag_bytes = etag.encode('utf-8') 65 | etag_hash = sha256(etag_bytes) 66 | filename += '.' + etag_hash.hexdigest() 67 | 68 | return filename 69 | 70 | 71 | def filename_to_url(filename, cache_dir=None): 72 | """ 73 | Return the url and etag (which may be ``None``) stored for `filename`. 74 | Raise ``EnvironmentError`` if `filename` or its stored metadata do not exist. 75 | """ 76 | if cache_dir is None: 77 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 78 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 79 | cache_dir = str(cache_dir) 80 | 81 | cache_path = os.path.join(cache_dir, filename) 82 | if not os.path.exists(cache_path): 83 | raise EnvironmentError("file {} not found".format(cache_path)) 84 | 85 | meta_path = cache_path + '.json' 86 | if not os.path.exists(meta_path): 87 | raise EnvironmentError("file {} not found".format(meta_path)) 88 | 89 | with open(meta_path, encoding="utf-8") as meta_file: 90 | metadata = json.load(meta_file) 91 | url = metadata['url'] 92 | etag = metadata['etag'] 93 | 94 | return url, etag 95 | 96 | 97 | def cached_path(url_or_filename, cache_dir=None): 98 | """ 99 | Given something that might be a URL (or might be a local path), 100 | determine which. If it's a URL, download the file and cache it, and 101 | return the path to the cached file. If it's already a local path, 102 | make sure the file exists and then return the path. 
103 | """ 104 | if cache_dir is None: 105 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 106 | if sys.version_info[0] == 3 and isinstance(url_or_filename, Path): 107 | url_or_filename = str(url_or_filename) 108 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 109 | cache_dir = str(cache_dir) 110 | 111 | parsed = urlparse(url_or_filename) 112 | 113 | if parsed.scheme in ('http', 'https', 's3'): 114 | # URL, so get it from the cache (downloading if necessary) 115 | return get_from_cache(url_or_filename, cache_dir) 116 | elif os.path.exists(url_or_filename): 117 | # File, and it exists. 118 | return url_or_filename 119 | elif parsed.scheme == '': 120 | # File, but it doesn't exist. 121 | raise EnvironmentError("file {} not found".format(url_or_filename)) 122 | else: 123 | # Something unknown 124 | raise ValueError("unable to parse {} as a URL or as a local path".format(url_or_filename)) 125 | 126 | 127 | def split_s3_path(url): 128 | """Split a full s3 path into the bucket name and path.""" 129 | parsed = urlparse(url) 130 | if not parsed.netloc or not parsed.path: 131 | raise ValueError("bad s3 path {}".format(url)) 132 | bucket_name = parsed.netloc 133 | s3_path = parsed.path 134 | # Remove '/' at beginning of path. 135 | if s3_path.startswith("/"): 136 | s3_path = s3_path[1:] 137 | return bucket_name, s3_path 138 | 139 | 140 | def s3_request(func): 141 | """ 142 | Wrapper function for s3 requests in order to create more helpful error 143 | messages. 144 | """ 145 | 146 | @wraps(func) 147 | def wrapper(url, *args, **kwargs): 148 | try: 149 | return func(url, *args, **kwargs) 150 | except ClientError as exc: 151 | if int(exc.response["Error"]["Code"]) == 404: 152 | raise EnvironmentError("file {} not found".format(url)) 153 | else: 154 | raise 155 | 156 | return wrapper 157 | 158 | 159 | @s3_request 160 | def s3_etag(url): 161 | """Check ETag on S3 object.""" 162 | s3_resource = boto3.resource("s3") 163 | bucket_name, s3_path = split_s3_path(url) 164 | s3_object = s3_resource.Object(bucket_name, s3_path) 165 | return s3_object.e_tag 166 | 167 | 168 | @s3_request 169 | def s3_get(url, temp_file): 170 | """Pull a file directly from S3.""" 171 | s3_resource = boto3.resource("s3") 172 | bucket_name, s3_path = split_s3_path(url) 173 | s3_resource.Bucket(bucket_name).download_fileobj(s3_path, temp_file) 174 | 175 | 176 | def http_get(url, temp_file): 177 | req = requests.get(url, stream=True) 178 | content_length = req.headers.get('Content-Length') 179 | total = int(content_length) if content_length is not None else None 180 | progress = tqdm(unit="B", total=total) 181 | for chunk in req.iter_content(chunk_size=1024): 182 | if chunk: # filter out keep-alive new chunks 183 | progress.update(len(chunk)) 184 | temp_file.write(chunk) 185 | progress.close() 186 | 187 | 188 | def get_from_cache(url, cache_dir=None): 189 | """ 190 | Given a URL, look for the corresponding dataset in the local cache. 191 | If it's not there, download it. Then return the path to the cached file. 192 | """ 193 | if cache_dir is None: 194 | cache_dir = PYTORCH_PRETRAINED_BERT_CACHE 195 | if sys.version_info[0] == 3 and isinstance(cache_dir, Path): 196 | cache_dir = str(cache_dir) 197 | 198 | if not os.path.exists(cache_dir): 199 | os.makedirs(cache_dir) 200 | 201 | # Get eTag to add to filename, if it exists. 
202 | if url.startswith("s3://"): 203 | etag = s3_etag(url) 204 | else: 205 | try: 206 | response = requests.head(url, allow_redirects=True) 207 | if response.status_code != 200: 208 | etag = None 209 | else: 210 | etag = response.headers.get("ETag") 211 | except EnvironmentError: 212 | etag = None 213 | 214 | if sys.version_info[0] == 2 and etag is not None: 215 | etag = etag.decode('utf-8') 216 | filename = url_to_filename(url, etag) 217 | 218 | # get cache path to put the file 219 | cache_path = os.path.join(cache_dir, filename) 220 | 221 | # If we don't have a connection (etag is None) and can't identify the file 222 | # try to get the last downloaded one 223 | if not os.path.exists(cache_path) and etag is None: 224 | matching_files = fnmatch.filter(os.listdir(cache_dir), filename + '.*') 225 | matching_files = list(filter(lambda s: not s.endswith('.json'), matching_files)) 226 | if matching_files: 227 | cache_path = os.path.join(cache_dir, matching_files[-1]) 228 | 229 | if not os.path.exists(cache_path): 230 | # Download to temporary file, then copy to cache dir once finished. 231 | # Otherwise you get corrupt cache entries if the download gets interrupted. 232 | with tempfile.NamedTemporaryFile() as temp_file: 233 | logger.info("%s not found in cache, downloading to %s", url, temp_file.name) 234 | 235 | # GET file object 236 | if url.startswith("s3://"): 237 | s3_get(url, temp_file) 238 | else: 239 | http_get(url, temp_file) 240 | 241 | # we are copying the file before closing it, so flush to avoid truncation 242 | temp_file.flush() 243 | # shutil.copyfileobj() starts at the current position, so go to the start 244 | temp_file.seek(0) 245 | 246 | logger.info("copying %s to cache at %s", temp_file.name, cache_path) 247 | with open(cache_path, 'wb') as cache_file: 248 | shutil.copyfileobj(temp_file, cache_file) 249 | 250 | logger.info("creating metadata file for %s", cache_path) 251 | meta = {'url': url, 'etag': etag} 252 | meta_path = cache_path + '.json' 253 | with open(meta_path, 'w') as meta_file: 254 | output_string = json.dumps(meta) 255 | if sys.version_info[0] == 2 and isinstance(output_string, str): 256 | output_string = unicode(output_string, 'utf-8') # The beauty of python 2 257 | meta_file.write(output_string) 258 | 259 | logger.info("removing temp file %s", temp_file.name) 260 | 261 | return cache_path 262 | 263 | 264 | def read_set_from_file(filename): 265 | ''' 266 | Extract a de-duped collection (set) of text from a file. 267 | Expected file format is one item per line. 
268 |     '''
269 |     collection = set()
270 |     with open(filename, 'r', encoding='utf-8') as file_:
271 |         for line in file_:
272 |             collection.add(line.rstrip())
273 |     return collection
274 | 
275 | 
276 | def get_file_extension(path, dot=True, lower=True):
277 |     ext = os.path.splitext(path)[1]
278 |     ext = ext if dot else ext[1:]
279 |     return ext.lower() if lower else ext
280 | 
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/manager.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | """
3 | Created on Mon Aug 7 19:38:30 2017
4 | 
5 | @author: Quantum Liu
6 | """
7 | 
8 | '''
9 | Example:
10 | gm = GPUManager()
11 | os.environ["CUDA_VISIBLE_DEVICES"] = str(gm.auto_choice())
12 | # auto_choice() returns the index of the chosen GPU, not a context manager
13 | '''
14 | 
15 | import os
16 | import pdb
17 | import sched, time
18 | import datetime
19 | #from tensorflow.python.client import device_lib
20 | 
21 | def check_gpus():
22 |     '''
23 |     GPU availability check
24 |     reference : http://feisky.xyz/machine-learning/tensorflow/gpu_list.html
25 |     '''
26 | # =============================================================================
27 | #     all_gpus = [x.name for x in device_lib.list_local_devices() if x.device_type == 'GPU']
28 | # =============================================================================
29 |     first_gpus = os.popen('nvidia-smi --query-gpu=index --format=csv,noheader').readlines()[0].strip()
30 |     if not first_gpus=='0':
31 |         print('This script can only manage NVIDIA GPUs, but no GPU was found on this device')
32 |         return False
33 |     elif not 'NVIDIA System Management' in os.popen('nvidia-smi -h').read():
34 |         print("'nvidia-smi' tool not found.")
35 |         return False
36 |     return True
37 | 
38 | def parse(line,qargs):
39 |     '''
40 |     line:
41 |         a line of text
42 |     qargs:
43 |         query arguments
44 |     return:
45 |         a dict of gpu infos
46 |     Parsing a line of csv format text returned by nvidia-smi
47 |     (parses one CSV line from nvidia-smi into a dict)
48 |     '''
49 |     numberic_args = ['memory.free', 'memory.total', 'power.draw', 'power.limit'] # fields parsed as numbers
50 |     power_manage_enable=lambda v:(not 'Not Support' in v) # whether the GPU supports power management (laptop GPUs may not)
51 |     to_numberic=lambda v:float(v.upper().strip().replace('MIB','').replace('W','')) # strip the unit (MiB / W) from a value string
52 |     process = lambda k,v:((int(to_numberic(v)) if power_manage_enable(v) else 1) if k in numberic_args else v.strip())
53 |     return {k:process(k,v) for k,v in zip(qargs,line.strip().split(','))}
54 | 
55 | def query_gpu(qargs=[]):
56 |     '''
57 |     qargs:
58 |         query arguments
59 |     return:
60 |         a list of dict
61 |     Querying GPU infos
62 |     (queries nvidia-smi for the state of every GPU)
63 |     '''
64 |     qargs =['index','gpu_name', 'memory.free', 'memory.total', 'power.draw', 'power.limit', 'utilization.gpu']+ qargs
65 |     cmd = 'nvidia-smi --query-gpu={} --format=csv,noheader'.format(','.join(qargs))
66 |     results = os.popen(cmd).readlines()
67 |     return [parse(line,qargs) for line in results]
68 | 
69 | def by_power(d):
70 |     '''
71 |     helper function for sorting gpus by power
72 |     '''
73 |     power_infos=(d['power.draw'],d['power.limit'])
74 |     if any(v==1 for v in power_infos):
75 |         print('Power management unavailable for GPU {}'.format(d['index']))
76 |         return 1
77 |     return float(d['power.draw'])/d['power.limit']
78 | 
79 | class GPUManager():
80 |     '''
81 |     qargs:
82 |         query arguments
83 |     A manager that lists all available GPU devices, sorts them, and
84 |     chooses the most idle one. Unspecified (not-yet-assigned) GPUs
85 |     are preferred.
86 |     (GPU device manager: it lists all available GPU devices, sorts them, and
87 |     automatically picks the most idle one. A GPUManager records whether each
88 |     GPU has already been assigned and prefers unassigned GPUs.)
89 |     '''
90 |     def __init__(self,qargs=[]):
91 |         '''
92 |         '''
93 |         self.qargs=qargs
94 |         self.gpus=query_gpu(qargs)
95 |         for gpu in self.gpus:
96 |             gpu['specified']=False
97 |         self.gpu_num=len(self.gpus)
98 | 
99 |     def _sort_by_memory(self,gpus,by_size=False):
100 |         if by_size:
101 |             print('Sorted by free memory size')
102 |             return sorted(gpus,key=lambda d:d['memory.free'],reverse=True)
103 |         else:
104 |             print('Sorted by free memory rate')
105 |             return sorted(gpus,key=lambda d:float(d['memory.free'])/ d['memory.total'],reverse=True)
106 | 
107 |     def _sort_by_power(self,gpus):
108 |         return sorted(gpus,key=by_power)
109 | 
110 |     def _sort_by_custom(self,gpus,key,reverse=False,qargs=[]):
111 |         if isinstance(key,str) and (key in qargs):
112 |             return sorted(gpus,key=lambda d:d[key],reverse=reverse)
113 |         if isinstance(key,type(lambda a:a)):
114 |             return sorted(gpus,key=key,reverse=reverse)
115 |         raise ValueError("The argument 'key' must be a function or a key in query args; please read the documentation of nvidia-smi")
116 | 
117 |     def auto_choice(self,mode=0):
118 |         '''
119 |         mode:
120 |             0:(default) sorted by free memory size
121 |         return:
122 |             the index of the chosen GPU
123 |         '''
124 |         def check_free_gpu(unspecified_gpus):
125 |             FLAG = False
126 |             for gpu_dict in unspecified_gpus:
127 |                 if gpu_dict['memory.free'] >= 18: # free-memory threshold (in the units parsed by parse(), i.e. MiB)
128 |                     FLAG = True
129 |                     break
130 |             return FLAG
131 | 
132 |         st_time = time.time()
133 |         def wait():
134 |             print("waiting for free gpu ...")
135 |             seconds = int(time.time() - st_time)
136 |             print("Have waited for {}".format(str(datetime.timedelta(seconds=seconds))))
137 | 
138 |         for old_infos,new_infos in zip(self.gpus,query_gpu(self.qargs)):
139 |             old_infos.update(new_infos)
140 |         unspecified_gpus=[gpu for gpu in self.gpus if not gpu['specified']] or self.gpus
141 | 
142 |         if mode==0:
143 |             scheduler = sched.scheduler(time.time, time.sleep)
144 |             while(True):
145 |                 if check_free_gpu(unspecified_gpus):
146 |                     break
147 |                 scheduler.enter(10, 1, wait)
148 |                 scheduler.run()
149 |             print('Choosing the GPU device with the largest free memory...')
150 |             tmp = self._sort_by_memory(unspecified_gpus,True)
151 | #             for ele in tmp:
152 | #                 print('ele is : {}'.format(ele))
153 |             chosen_gpu=self._sort_by_memory(unspecified_gpus,True)[0]
154 |         elif mode==1:
155 |             print('Choosing the GPU device with the highest free memory rate...')
156 |             chosen_gpu=self._sort_by_memory(unspecified_gpus)[0]
157 |         elif mode==2:
158 |             print('Choosing the GPU device by power...')
159 |             chosen_gpu=self._sort_by_power(unspecified_gpus)[0]
160 |         else:
161 |             print('Given an unavailable mode; choosing by memory')
162 |             chosen_gpu=self._sort_by_memory(unspecified_gpus)[0]
163 |         chosen_gpu['specified']=True
164 |         index=chosen_gpu['index']
165 |         print('Using GPU {i}:\n{info}'.format(i=index,info='\n'.join([str(k)+':'+str(v) for k,v in chosen_gpu.items()])))
166 |         return index
167 | #        else:
168 | #            raise ImportError('GPU available check failed')
169 | 
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/run.sh:
--------------------------------------------------------------------------------
1 | BERT_BASE_DIR=/mnt/nfs-storage-titan/BERT/uncased_L-12_H-768_A-12
2 | BASE_DIR=/mnt/nfs-storage-titan/BERT/pytorch_pretrained_BERT
3 | DATA_DIR=$BASE_DIR/ACOS-main/Extract-Classify-ACOS
4 | TASK_NAME=quad
5 | MODEL=quad
6 | DOMAIN=rest16
7 | 
8 | echo 'DOMAIN is chosen from [rest16, laptop]'
9 | python
run_step1.py \
10 |     --task_name $TASK_NAME \
11 |     --do_train \
12 |     --do_eval \
13 |     --domain_type $DOMAIN \
14 |     --model_type $MODEL \
15 |     --do_lower_case \
16 |     --data_dir $DATA_DIR \
17 |     --bert_model $BERT_BASE_DIR \
18 |     --max_seq_length 128 \
19 |     --train_batch_size 24 \
20 |     --learning_rate 2e-5 \
21 |     --num_train_epochs 30 \
22 |     --output_dir $BASE_DIR/output/Extract-Classify-QUAD/${DOMAIN}_1st/
23 | 
24 | 
25 | python tokenized_data/get_1st_pairs.py $BASE_DIR $DOMAIN
26 | 
27 | TASK_NAME=categorysenti
28 | MODEL=categorysenti
29 | 
30 | python run_step2.py \
31 |     --task_name $TASK_NAME \
32 |     --do_train \
33 |     --do_eval \
34 |     --domain_type $DOMAIN \
35 |     --model_type $MODEL \
36 |     --do_lower_case \
37 |     --data_dir $DATA_DIR \
38 |     --bert_model $BERT_BASE_DIR \
39 |     --max_seq_length 128 \
40 |     --train_batch_size 16 \
41 |     --learning_rate 5e-5 \
42 |     --num_train_epochs 30 \
43 |     --output_dir $BASE_DIR/output/Extract-Classify-QUAD/${DOMAIN}_2nd
44 | 
--------------------------------------------------------------------------------
/Extract-Classify-ACOS/run_classifier_dataset_utils.py:
--------------------------------------------------------------------------------
1 | # coding=utf-8
2 | # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team.
3 | # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
4 | #
5 | # Licensed under the Apache License, Version 2.0 (the "License");
6 | # you may not use this file except in compliance with the License.
7 | # You may obtain a copy of the License at
8 | #
9 | #     http://www.apache.org/licenses/LICENSE-2.0
10 | #
11 | # Unless required by applicable law or agreed to in writing, software
12 | # distributed under the License is distributed on an "AS IS" BASIS,
13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 | # See the License for the specific language governing permissions and
15 | # limitations under the License.
16 | """ BERT classification fine-tuning: utilities to work with the ACOS quad tasks (adapted from the GLUE utilities) """
17 | 
18 | from __future__ import absolute_import, division, print_function
19 | 
20 | import csv
21 | import logging
22 | import numpy as np
23 | import matplotlib.pyplot as plt
24 | import seaborn as sns
25 | import os
26 | import copy
27 | import pdb
28 | import sys
29 | 
30 | from scipy.stats import pearsonr, spearmanr
31 | from sklearn.metrics import matthews_corrcoef, f1_score, hamming_loss, precision_score, recall_score
32 | 
33 | logger = logging.getLogger(__name__)
34 | 
35 | 
36 | class InputExample(object):
37 |     """A single training/test example for simple sequence classification."""
38 | 
39 |     def __init__(self, guid, text_a, text_cate=None, text_senti=None, label=None):
40 |         """Constructs an InputExample.
41 | 
42 |         Args:
43 |             guid: Unique id for the example.
44 |             text_a: string. The untokenized text of the first sequence. For single
45 |                 sequence tasks, only this sequence must be specified.
46 |             text_cate, text_senti: (Optional) strings. Category / sentiment
47 |                 annotations; accepted for interface compatibility but not stored.
48 |             label: (Optional) string. The label of the example. This should be
49 |                 specified for train and dev examples, but not for test examples.
50 | """ 51 | self.guid = guid 52 | self.text_a = text_a 53 | self.label = label 54 | 55 | class InputFeatures(object): 56 | """A single set of features of data.""" 57 | 58 | def __init__(self, tokens_len, aspect_input_ids, aspect_input_mask, aspect_ids, aspect_segment_ids, aspect_labels, 59 | exist_imp_aspect, exist_imp_opinion): 60 | self.tokens_len = tokens_len 61 | self.aspect_input_ids=aspect_input_ids 62 | self.aspect_input_mask=aspect_input_mask 63 | self.aspect_ids=aspect_ids 64 | self.aspect_segment_ids=aspect_segment_ids 65 | self.aspect_labels=aspect_labels 66 | self.exist_imp_aspect=exist_imp_aspect 67 | self.exist_imp_opinion=exist_imp_opinion 68 | 69 | class InputExample2nd(object): 70 | """A single training/test example for simple sequence classification.""" 71 | 72 | def __init__(self, guid, text_a, text_b=None, label=None): 73 | """Constructs a InputExample. 74 | 75 | Args: 76 | guid: Unique id for the example. 77 | text_a: string. The untokenized text of the first sequence. For single 78 | sequence tasks, only this sequence must be specified. 79 | text_b: (Optional) string. The untokenized text of the second sequence. 80 | Only must be specified for sequence pair tasks. 81 | label: (Optional) string. The label of the example. This should be 82 | specified for train and dev examples, but not for test examples. 83 | """ 84 | self.guid = guid 85 | self.text_a = text_a 86 | self.text_b = text_b 87 | self.label = label 88 | 89 | 90 | class InputFeatures2nd(object): 91 | """A single set of features of data.""" 92 | 93 | def __init__(self, tokens_len, aspect_tokens, aspect_input_ids, aspect_input_mask, aspect_segment_ids, 94 | candidate_aspect, candidate_opinion, label_id): 95 | 96 | self.tokens_len=tokens_len 97 | self.aspect_tokens=aspect_tokens 98 | self.aspect_input_ids=aspect_input_ids 99 | self.aspect_input_mask=aspect_input_mask 100 | self.aspect_segment_ids=aspect_segment_ids 101 | 102 | self.candidate_aspect=candidate_aspect 103 | self.candidate_opinion=candidate_opinion 104 | self.label_id=label_id 105 | 106 | class DataProcessor(object): 107 | """Base class for data converters for sequence classification data sets.""" 108 | 109 | def get_train_examples(self, data_dir): 110 | """Gets a collection of `InputExample`s for the train set.""" 111 | raise NotImplementedError() 112 | 113 | def get_dev_examples(self, data_dir): 114 | """Gets a collection of `InputExample`s for the dev set.""" 115 | raise NotImplementedError() 116 | 117 | def get_labels(self): 118 | """Gets the list of labels for this data set.""" 119 | raise NotImplementedError() 120 | 121 | @classmethod 122 | def _read_tsv(cls, input_file, quotechar=None): 123 | """Reads a tab separated value file.""" 124 | with open(input_file, "r", encoding="utf-8") as f: 125 | reader = csv.reader(f, delimiter="\t", quotechar=quotechar) 126 | lines = [] 127 | for line in reader: 128 | if sys.version_info[0] == 2: 129 | line = list(unicode(cell, 'utf-8') for cell in line) 130 | lines.append(line) 131 | return lines 132 | 133 | 134 | class QuadProcessor(DataProcessor): 135 | """Processor for the MRPC data set (GLUE version).""" 136 | 137 | def get_train_examples(self, data_dir, domain_type): 138 | """See base class.""" 139 | string = domain_type 140 | logger.info("LOOKING AT {}".format(os.path.join(data_dir, "tokenized_data/"+string+"_train_quad_bert.tsv"))) 141 | return self._create_examples( 142 | self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_train_quad_bert.tsv")), "train") 143 | 144 | def 
get_valid_examples(self, data_dir, domain_type): 145 | """See base class.""" 146 | string = domain_type 147 | logger.info("LOOKING AT {}".format(os.path.join(data_dir, "tokenized_data/"+string+"_dev_quad_bert.tsv"))) 148 | return self._create_examples( 149 | self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_dev_quad_bert.tsv")), "valid") 150 | 151 | def get_dev_examples(self, data_dir, domain_type): 152 | """See base class.""" 153 | string = domain_type 154 | logger.info("LOOKING AT {}".format(os.path.join(data_dir, "tokenized_data/"+string+"_test_quad_bert.tsv"))) 155 | return self._create_examples( 156 | self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_test_quad_bert.tsv")), "test") 157 | 158 | def get_labels(self, domain_type): 159 | """See base class.""" 160 | 161 | sentiment = ['negative', 'neutral', 'positive'] 162 | # seqlabs = ['O', 'I'] 163 | # 'P' means PAD, 'M' means IMP. 164 | seqlabs = ['[CLS]', 'O', 'I-A', 'B-A', 'I-O', 'B-O'] 165 | # seqlabs = ['O', 'I-A', 'B-A', 'M-A', 'I-O', 'B-O', 'M-O'] 166 | label_list = [] 167 | 168 | label_list.append(sentiment) 169 | label_list.append(seqlabs) 170 | return label_list 171 | 172 | def _create_examples(self, lines, set_type): 173 | """Creates examples for the training and dev sets.""" 174 | examples = [] 175 | for (i, line) in enumerate(lines): 176 | guid = "%s-%s" % (set_type, i) 177 | try: 178 | text_a = line[0] 179 | except: 180 | pdb.set_trace() 181 | labels = line[1:] 182 | examples.append( 183 | InputExample(guid=guid, text_a=text_a, label=labels)) 184 | return examples 185 | 186 | 187 | class CategorySentiProcessor(DataProcessor): 188 | """Processor for the MRPC data set (GLUE version).""" 189 | 190 | def get_train_examples(self, data_dir, domain_type): 191 | """See base class.""" 192 | string = domain_type 193 | logger.info("LOOKING AT {}".format(os.path.join(data_dir, "tokenized_data/"+string+"_train_pair.tsv"))) 194 | return self._create_examples( 195 | self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_train_pair.tsv")), "train") 196 | 197 | def get_valid_examples(self, data_dir, domain_type): 198 | """See base class.""" 199 | string = domain_type 200 | logger.info("LOOKING AT {}".format(os.path.join(data_dir, "tokenized_data/"+string+"_dev_pair.tsv"))) 201 | return self._create_examples( 202 | self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_dev_pair.tsv")), "valid") 203 | 204 | def get_dev_examples(self, data_dir, domain_type): 205 | """See base class.""" 206 | string = domain_type 207 | return self._create_examples( 208 | self._read_tsv(os.path.join(data_dir, "tokenized_data/"+string+"_test_pair_1st.tsv")), "test") 209 | 210 | def get_labels(self, domain_type): 211 | """See base class.""" 212 | l = None 213 | sentiment = None 214 | if domain_type.startswith('rest'): 215 | l = ['RESTAURANT#GENERAL', 'SERVICE#GENERAL', 'FOOD#GENERAL', 'FOOD#QUALITY', 'FOOD#STYLE_OPTIONS', 'DRINKS#STYLE_OPTIONS', 'DRINKS#PRICES', 216 | 'AMBIENCE#GENERAL', 'RESTAURANT#PRICES', 'FOOD#PRICES', 'RESTAURANT#MISCELLANEOUS', 'DRINKS#QUALITY', 'LOCATION#GENERAL'] 217 | elif domain_type == 'laptop': 218 | l = ['MULTIMEDIA_DEVICES#PRICE', 'OS#QUALITY', 'SHIPPING#QUALITY', 'GRAPHICS#OPERATION_PERFORMANCE', 'CPU#OPERATION_PERFORMANCE', 219 | 'COMPANY#DESIGN_FEATURES', 'MEMORY#OPERATION_PERFORMANCE', 'SHIPPING#PRICE', 'POWER_SUPPLY#CONNECTIVITY', 'SOFTWARE#USABILITY', 220 | 'FANS&COOLING#GENERAL', 'GRAPHICS#DESIGN_FEATURES', 'BATTERY#GENERAL', 'HARD_DISC#USABILITY', 
'FANS&COOLING#DESIGN_FEATURES', 221 | 'MEMORY#DESIGN_FEATURES', 'MOUSE#USABILITY', 'CPU#GENERAL', 'LAPTOP#QUALITY', 'POWER_SUPPLY#GENERAL', 'PORTS#QUALITY', 222 | 'KEYBOARD#PORTABILITY', 'SUPPORT#DESIGN_FEATURES', 'MULTIMEDIA_DEVICES#USABILITY', 'MOUSE#GENERAL', 'KEYBOARD#MISCELLANEOUS', 223 | 'MULTIMEDIA_DEVICES#DESIGN_FEATURES', 'OS#MISCELLANEOUS', 'LAPTOP#MISCELLANEOUS', 'SOFTWARE#PRICE', 'FANS&COOLING#OPERATION_PERFORMANCE', 224 | 'MEMORY#QUALITY', 'OPTICAL_DRIVES#OPERATION_PERFORMANCE', 'HARD_DISC#GENERAL', 'MEMORY#GENERAL', 'DISPLAY#OPERATION_PERFORMANCE', 225 | 'MULTIMEDIA_DEVICES#GENERAL', 'LAPTOP#GENERAL', 'MOTHERBOARD#QUALITY', 'LAPTOP#PORTABILITY', 'KEYBOARD#PRICE', 'SUPPORT#OPERATION_PERFORMANCE', 226 | 'GRAPHICS#GENERAL', 'MOTHERBOARD#OPERATION_PERFORMANCE', 'DISPLAY#GENERAL', 'BATTERY#QUALITY', 'LAPTOP#USABILITY', 'LAPTOP#DESIGN_FEATURES', 227 | 'PORTS#CONNECTIVITY', 'HARDWARE#QUALITY', 'SUPPORT#GENERAL', 'MOTHERBOARD#GENERAL', 'PORTS#USABILITY', 'KEYBOARD#QUALITY', 'GRAPHICS#USABILITY', 228 | 'HARD_DISC#PRICE', 'OPTICAL_DRIVES#USABILITY', 'MULTIMEDIA_DEVICES#CONNECTIVITY', 'HARDWARE#DESIGN_FEATURES', 'MEMORY#USABILITY', 229 | 'SHIPPING#GENERAL', 'CPU#PRICE', 'Out_Of_Scope#DESIGN_FEATURES', 'MULTIMEDIA_DEVICES#QUALITY', 'OS#PRICE', 'SUPPORT#QUALITY', 230 | 'OPTICAL_DRIVES#GENERAL', 'HARDWARE#USABILITY', 'DISPLAY#DESIGN_FEATURES', 'PORTS#GENERAL', 'COMPANY#OPERATION_PERFORMANCE', 231 | 'COMPANY#GENERAL', 'Out_Of_Scope#GENERAL', 'KEYBOARD#DESIGN_FEATURES', 'Out_Of_Scope#OPERATION_PERFORMANCE', 232 | 'OPTICAL_DRIVES#DESIGN_FEATURES', 'LAPTOP#OPERATION_PERFORMANCE', 'KEYBOARD#USABILITY', 'DISPLAY#USABILITY', 'POWER_SUPPLY#QUALITY', 233 | 'HARD_DISC#DESIGN_FEATURES', 'DISPLAY#QUALITY', 'MOUSE#DESIGN_FEATURES', 'COMPANY#QUALITY', 'HARDWARE#GENERAL', 'COMPANY#PRICE', 234 | 'MULTIMEDIA_DEVICES#OPERATION_PERFORMANCE', 'KEYBOARD#OPERATION_PERFORMANCE', 'SOFTWARE#PORTABILITY', 'HARD_DISC#OPERATION_PERFORMANCE', 235 | 'BATTERY#DESIGN_FEATURES', 'CPU#QUALITY', 'WARRANTY#GENERAL', 'OS#DESIGN_FEATURES', 'OS#OPERATION_PERFORMANCE', 'OS#USABILITY', 236 | 'SOFTWARE#GENERAL', 'SUPPORT#PRICE', 'SHIPPING#OPERATION_PERFORMANCE', 'DISPLAY#PRICE', 'LAPTOP#PRICE', 'OS#GENERAL', 'HARDWARE#PRICE', 237 | 'SOFTWARE#DESIGN_FEATURES', 'HARD_DISC#MISCELLANEOUS', 'PORTS#PORTABILITY', 'FANS&COOLING#QUALITY', 'BATTERY#OPERATION_PERFORMANCE', 238 | 'CPU#DESIGN_FEATURES', 'PORTS#OPERATION_PERFORMANCE', 'SOFTWARE#OPERATION_PERFORMANCE', 'KEYBOARD#GENERAL', 'SOFTWARE#QUALITY', 239 | 'LAPTOP#CONNECTIVITY', 'POWER_SUPPLY#DESIGN_FEATURES', 'HARDWARE#OPERATION_PERFORMANCE', 'WARRANTY#QUALITY', 'HARD_DISC#QUALITY', 240 | 'POWER_SUPPLY#OPERATION_PERFORMANCE', 'PORTS#DESIGN_FEATURES', 'Out_Of_Scope#USABILITY'] 241 | sentiment = ['0', '1', '2'] 242 | label_list = [] 243 | # label_list.append(l) 244 | # label_list.append(sentiment) 245 | cate_senti = [] 246 | for cate in l: 247 | for senti in sentiment: 248 | cate_senti.append(cate+'#'+senti) 249 | label_list.append(cate_senti) 250 | return label_list 251 | 252 | def _create_examples(self, lines, set_type): 253 | """Creates examples for the training and dev sets.""" 254 | examples = [] 255 | for (i, line) in enumerate(lines): 256 | guid = "%s-%s" % (set_type, i) 257 | text_a = line[0] 258 | labels = line[1:] 259 | examples.append( 260 | InputExample2nd(guid=guid, text_a=text_a, text_b=None, label=labels)) 261 | return examples 262 | 263 | 264 | def convert_examples_to_features(examples, label_list, max_seq_length, 265 | tokenizer, output_mode, task_name): 266 | """Loads a 
data file into a list of `InputBatch`s.""" 267 | 268 | label_map_senti = {label : i for i, label in enumerate(label_list[0])} 269 | label_map_seq = {label : i for i, label in enumerate(label_list[1])} 270 | 271 | features = [] 272 | 273 | for (ex_index, example) in enumerate(examples): 274 | # pdb.set_trace() 275 | if ex_index % 10000 == 0: 276 | logger.info("Writing example %d of %d" % (ex_index, len(examples))) 277 | 278 | orig_tokens = example.text_a.strip().split() 279 | labels = example.label 280 | 281 | exist_imp_aspect = 0 282 | exist_imp_opinion = 0 283 | 284 | bert_tokens_a = orig_tokens 285 | 286 | aspect_labels = ['O' for ele in range(len(orig_tokens))] 287 | for quad in labels: 288 | cur_aspect = quad.split(' ')[0]; cur_opinion = quad.split(' ')[-1] 289 | a_st = int(cur_aspect.split(',')[0]); a_ed = int(cur_aspect.split(',')[1]) 290 | if a_ed != -1: 291 | aspect_labels[a_st] = 'B-A' 292 | for i in range(a_st+1, a_ed): 293 | aspect_labels[i] = 'I-A' 294 | else: 295 | exist_imp_aspect = 1 296 | o_st = int(cur_opinion.split(',')[0]); o_ed = int(cur_opinion.split(',')[1]) 297 | if o_ed != -1: 298 | aspect_labels[o_st] = 'B-O' 299 | for i in range(o_st+1, o_ed): 300 | aspect_labels[i] = 'I-O' 301 | else: 302 | exist_imp_opinion = 1 303 | 304 | _truncate_seq_pair(bert_tokens_a, aspect_labels, max_seq_length - 2) 305 | 306 | aspect_ids = [] 307 | 308 | aspect_tokens = [] 309 | aspect_segment_ids = [] 310 | 311 | aspect_tokens.append("[CLS]") 312 | aspect_ids.append(label_map_seq['[CLS]']) 313 | aspect_segment_ids.append(0) 314 | 315 | for i, token in enumerate(bert_tokens_a): 316 | aspect_tokens.append(token) 317 | aspect_ids.append(label_map_seq[aspect_labels[i]]) 318 | aspect_segment_ids.append(0) 319 | 320 | aspect_tokens.append("[CLS]") 321 | tokens_len = len(aspect_tokens) 322 | 323 | aspect_ids.append(label_map_seq['[CLS]']) 324 | aspect_segment_ids.append(0) 325 | 326 | aspect_input_ids = tokenizer.convert_tokens_to_ids(aspect_tokens) 327 | # The mask has 1 for real tokens and 0 for padding tokens. Only real 328 | # tokens are attended to. 329 | aspect_input_mask = [1] * len(aspect_input_ids) 330 | # if example.text_a.startswith('it has all the features that we'): 331 | # pdb.set_trace() 332 | 333 | # Zero-pad up to the sequence length. 
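        # Illustrative sketch (hypothetical sentence, not from the data): with
        # max_seq_length=8 and orig_tokens ['the', 'pizza', 'was', 'great'] whose
        # quad marks 'pizza' as the aspect and 'great' as the opinion, the fields
        # built above are
        #   aspect_tokens     = ['[CLS]', 'the', 'pizza', 'was', 'great', '[CLS]']
        #   aspect_ids        = [0, 1, 3, 1, 5, 0]   # ['[CLS]','O','B-A','O','B-O','[CLS]']
        #   aspect_input_mask = [1, 1, 1, 1, 1, 1]
        # and the loop below right-pads the input ids and mask with 0 and the label
        # ids with label_map_seq["O"], so e.g. aspect_input_mask becomes
        # [1, 1, 1, 1, 1, 1, 0, 0].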
334 |         while len(aspect_input_ids) < max_seq_length:
335 |             aspect_input_ids.append(0)
336 |             aspect_input_mask.append(0)
337 |             aspect_ids.append(label_map_seq["O"])
338 |             aspect_segment_ids.append(0)
339 | 
340 |         assert len(aspect_input_ids) == max_seq_length
341 |         assert len(aspect_input_mask) == max_seq_length
342 |         assert len(aspect_ids) == max_seq_length
343 |         assert len(aspect_segment_ids) == max_seq_length
344 | 
345 |         if ex_index < 5:
346 |             logger.info("*** Example ***")
347 |             logger.info("guid: %s" % (example.guid))
348 |             logger.info("tokens_len: %s" % (tokens_len))
349 |             logger.info("exist_imp_aspect: %s" % (exist_imp_aspect))
350 |             logger.info("exist_imp_opinion: %s" % (exist_imp_opinion))
351 | 
352 |             logger.info("aspect tokens: %s" % " ".join(
353 |                     [str(x) for x in aspect_tokens]))
354 |             logger.info("aspect_input_ids: %s" % " ".join([str(x) for x in aspect_input_ids]))
355 |             logger.info("aspect_input_mask: %s" % " ".join([str(x) for x in aspect_input_mask]))
356 |             logger.info("aspect_ids: %s" % " ".join([str(x) for x in aspect_ids]))
357 |             logger.info(
358 |                 "aspect_segment_ids: %s" % " ".join([str(x) for x in aspect_segment_ids]))
359 | 
360 |         features.append(
361 |                 InputFeatures(tokens_len,
362 |                               aspect_input_ids=aspect_input_ids,
363 |                               aspect_input_mask=aspect_input_mask,
364 |                               aspect_ids=aspect_ids,
365 |                               aspect_segment_ids=aspect_segment_ids,
366 |                               aspect_labels=aspect_labels,
367 |                               exist_imp_aspect=exist_imp_aspect,
368 |                               exist_imp_opinion=exist_imp_opinion))
369 |     return features
370 | 
371 | 
372 | def _truncate_seq_pair(bert_tokens_a, aspect_labels, max_length):
373 |     """Truncates the token sequence and its label sequence in place to the maximum length."""
374 | 
375 |     # Truncate from the end, one token at a time, so that the token sequence
376 |     # and its per-token label sequence stay aligned.
377 | 
378 |     while True:
379 |         total_length = len(bert_tokens_a)
380 |         if total_length <= max_length:
381 |             break
382 |         bert_tokens_a.pop()
383 |         aspect_labels.pop()
384 | 
385 | 
386 | 
387 | def convert_examples_to_features2nd(examples, label_list, max_seq_length,
388 |                                     tokenizer, output_mode):
389 |     """Loads a data file into a list of `InputBatch`s."""
390 | 
391 |     category_senti_map = {label : i for i, label in enumerate(label_list[0])}
392 | 
393 |     features = []
394 | 
395 |     for (ex_index, example) in enumerate(examples):
396 |         # pdb.set_trace()
397 |         if ex_index % 10000 == 0:
398 |             logger.info("Writing example %d of %d" % (ex_index, len(examples)))
399 | 
400 |         orig_tokens, ao_tags = example.text_a.strip().split('####')
401 |         # label for examples with negative samples
402 |         # labels = example.label[:-1]
403 |         orig_tokens = orig_tokens.split()
404 |         labels = example.label
405 | 
406 |         bert_tokens_a = orig_tokens
407 |         bert_tokens_b = None
408 | 
409 |         _truncate_seq_pair2nd(bert_tokens_a, max_seq_length - 2)
410 | 
411 |         aspect_tokens = []
412 |         aspect_segment_ids = []
413 | 
414 |         aspect_tokens.append("[CLS]")
415 |         aspect_segment_ids.append(0)
416 | 
417 |         for i, token in enumerate(bert_tokens_a):
418 |             aspect_tokens.append(token)
419 |             aspect_segment_ids.append(0)
420 |         aspect_tokens.append("[CLS]")
421 |         tokens_len = len(aspect_tokens)
422 |         aspect_segment_ids.append(0)
423 | 
424 |         aspect_input_ids = tokenizer.convert_tokens_to_ids(aspect_tokens)
425 |         # The mask has 1 for real tokens and 0 for padding tokens. Only real
426 |         # tokens are attended to.
427 | aspect_input_mask = [1] * len(aspect_input_ids) 428 | imp_opinion_pos = len(aspect_input_ids) 429 | # if example.text_a.startswith('it has all the features that we'): 430 | # pdb.set_trace() 431 | 432 | # Zero-pad up to the sequence length. 433 | while len(aspect_input_ids) < max_seq_length: 434 | aspect_input_ids.append(0) 435 | aspect_input_mask.append(0) 436 | aspect_segment_ids.append(0) 437 | 438 | assert len(aspect_input_ids) == max_seq_length 439 | assert len(aspect_input_mask) == max_seq_length 440 | assert len(aspect_segment_ids) == max_seq_length 441 | 442 | # get candidate aspect and opinion 443 | label_id = [0] * len(label_list[0]) 444 | candidate_aspect = [0 for i in range(max_seq_length)] 445 | candidate_opinion = [0 for i in range(max_seq_length)] 446 | cur_aspect = ao_tags.split()[0]; cur_opinion = ao_tags.split()[1] 447 | a_st = int(cur_aspect.split(',')[0]); a_ed = int(cur_aspect.split(',')[1]) 448 | o_st = int(cur_opinion.split(',')[0]); o_ed = int(cur_opinion.split(',')[1]) 449 | if a_st == -1: 450 | a_ed = 0 451 | if o_st == -1: 452 | o_st = imp_opinion_pos - 2; o_ed = imp_opinion_pos - 1 453 | for i in range(a_st+1, a_ed+1): 454 | candidate_aspect[i] = 1 455 | for i in range(o_st+1, o_ed+1): 456 | candidate_opinion[i] = 1 457 | if len(labels) > 0: 458 | for ele in labels[0].split(): 459 | label_id[category_senti_map[ele]] = 1 460 | 461 | if ex_index < 5: 462 | logger.info("*** Example ***") 463 | logger.info("guid: %s" % (example.guid)) 464 | logger.info("tokens_len: %s" % (tokens_len)) 465 | 466 | logger.info("aspect tokens: %s" % " ".join( 467 | [str(x) for x in aspect_tokens])) 468 | logger.info("aspect_input_ids: %s" % " ".join([str(x) for x in aspect_input_ids])) 469 | logger.info("aspect_input_mask: %s" % " ".join([str(x) for x in aspect_input_mask])) 470 | logger.info( 471 | "aspect_segment_ids: %s" % " ".join([str(x) for x in aspect_segment_ids])) 472 | logger.info( 473 | "candidate_aspect: %s" % " ".join([str(x) for x in candidate_aspect])) 474 | logger.info( 475 | "candidate_opinion: %s" % " ".join([str(x) for x in candidate_opinion])) 476 | logger.info( 477 | "label_id: %s" % " ".join([str(x) for x in label_id])) 478 | 479 | features.append( 480 | InputFeatures2nd( 481 | tokens_len=tokens_len, 482 | aspect_tokens=aspect_tokens, 483 | aspect_input_ids=aspect_input_ids, 484 | aspect_input_mask=aspect_input_mask, 485 | aspect_segment_ids=aspect_segment_ids, 486 | 487 | candidate_aspect=candidate_aspect, 488 | candidate_opinion=candidate_opinion, 489 | label_id=label_id, 490 | )) 491 | return features 492 | 493 | 494 | def _truncate_seq_pair2nd(tokens_a, max_length): 495 | """Truncates a sequence pair in place to the maximum length.""" 496 | 497 | # This is a simple heuristic which will always truncate the longer sequence 498 | # one token at a time. This makes more sense than truncating an equal percent 499 | # of tokens from each, since if one sequence is very short then each token 500 | # that's truncated likely contains more information than a longer sequence. 
501 | while True: 502 | total_length = len(tokens_a) 503 | if total_length <= max_length: 504 | break 505 | tokens_a.pop() 506 | 507 | 508 | def simple_accuracy(preds, labels): 509 | return (preds == labels).mean() 510 | 511 | 512 | def acc_and_f1(preds, labels): 513 | acc = simple_accuracy(preds, labels) 514 | precision = precision_score(labels, preds, average='micro') 515 | recall = recall_score(labels, preds, average='micro') 516 | f1 = f1_score(y_true=labels, y_pred=preds, average='micro') 517 | macro = f1_score(y_true=labels, y_pred=preds, average='macro') 518 | hamming = hamming_loss(y_true=labels, y_pred=preds) 519 | return { 520 | "acc": acc, 521 | "precision": precision, 522 | "recall": recall, 523 | "micro-f1": f1, 524 | "macro-f1": macro, 525 | "hamming_loss":hamming, 526 | "acc_and_f1": (acc + f1) / 2, 527 | } 528 | 529 | 530 | def pearson_and_spearman(preds, labels): 531 | pearson_corr = pearsonr(preds, labels)[0] 532 | spearman_corr = spearmanr(preds, labels)[0] 533 | return { 534 | "pearson": pearson_corr, 535 | "spearmanr": spearman_corr, 536 | "corr": (pearson_corr + spearman_corr) / 2, 537 | } 538 | 539 | 540 | def compute_metrics(task_name, preds, labels): 541 | assert len(preds) == len(labels) 542 | return acc_and_f1(preds, labels) 543 | 544 | processors = { 545 | "quad": QuadProcessor, 546 | "categorysenti": CategorySentiProcessor, 547 | } 548 | 549 | output_modes = { 550 | "quad": "classification", 551 | "categorysenti": "classification", 552 | } 553 | -------------------------------------------------------------------------------- /Extract-Classify-ACOS/run_step1.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | # Copyright 2018 The Google AI Language Team Authors and The HuggingFace Inc. team. 3 | # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved. 4 | # 5 | # Licensed under the Apache License, Version 2.0 (the "License"); 6 | # you may not use this file except in compliance with the License. 7 | # You may obtain a copy of the License at 8 | # 9 | # http://www.apache.org/licenses/LICENSE-2.0 10 | # 11 | # Unless required by applicable law or agreed to in writing, software 12 | # distributed under the License is distributed on an "AS IS" BASIS, 13 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 14 | # See the License for the specific language governing permissions and 15 | # limitations under the License. 
16 | """BERT finetuning runner.""" 17 | 18 | from __future__ import absolute_import, division, print_function 19 | 20 | import argparse 21 | import logging 22 | import os 23 | import sys 24 | import random 25 | from tqdm import tqdm, trange 26 | import pdb 27 | from collections import defaultdict, namedtuple 28 | from manager import * 29 | import math 30 | import codecs as cs 31 | from sklearn.model_selection import KFold 32 | 33 | gm = GPUManager() 34 | device = gm.auto_choice(mode=0) 35 | os.environ["CUDA_VISIBLE_DEVICES"] = str(device) 36 | 37 | import numpy as np 38 | 39 | import torch 40 | from torch.utils.data import (DataLoader, RandomSampler, SequentialSampler, 41 | TensorDataset) 42 | from torch.utils.data.distributed import DistributedSampler 43 | from torch.nn import CrossEntropyLoss, MSELoss, MultiLabelSoftMarginLoss, BCEWithLogitsLoss 44 | 45 | from modeling import BertForQuadABSA 46 | from bert_utils.tokenization import BertTokenizer 47 | from bert_utils.optimization import BertAdam, WarmupLinearSchedule 48 | 49 | from run_classifier_dataset_utils import * 50 | from eval_metrics import * 51 | import gc 52 | 53 | if sys.version_info[0] == 2: 54 | import cPickle as pickle 55 | else: 56 | import pickle 57 | 58 | CONFIG_NAME = "config.json" 59 | WEIGHTS_NAME = "pytorch_model.bin" 60 | 61 | logger = logging.getLogger(__name__) 62 | 63 | def main(): 64 | parser = argparse.ArgumentParser() 65 | 66 | ## Required parameters 67 | parser.add_argument("--data_dir", 68 | default=None, 69 | type=str, 70 | required=True, 71 | help="The input source data dir. Should contain the .tsv files (or other data files) for the task.") 72 | parser.add_argument("--bert_model", default=None, type=str, required=True, 73 | help="Bert pre-trained model selected in the list: bert-base-uncased, " 74 | "bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, " 75 | "bert-base-multilingual-cased, bert-base-chinese.") 76 | parser.add_argument("--task_name", 77 | default=None, 78 | type=str, 79 | required=True, 80 | help="The name of the task to train.") 81 | parser.add_argument("--output_dir", 82 | default=None, 83 | type=str, 84 | required=True, 85 | help="The output directory where the model predictions and checkpoints will be written.") 86 | 87 | parser.add_argument("--domain_type", 88 | default=None, 89 | type=str, 90 | required=True, 91 | help="domain to choose.") 92 | 93 | parser.add_argument("--model_type", 94 | default=None, 95 | type=str, 96 | required=True, 97 | help="model to choose.") 98 | 99 | ## Other parameters 100 | parser.add_argument("--cache_dir", 101 | default="", 102 | type=str, 103 | help="Where do you want to store the pre-trained models downloaded from s3") 104 | parser.add_argument("--max_seq_length", 105 | default=128, 106 | type=int, 107 | help="The maximum total input sequence length after WordPiece tokenization. 
\n" 108 | "Sequences longer than this will be truncated, and sequences shorter \n" 109 | "than this will be padded.") 110 | parser.add_argument("--do_train", 111 | action='store_true', 112 | help="Whether to run training.") 113 | parser.add_argument("--do_eval", 114 | action='store_true', 115 | help="Whether to run eval on the dev set.") 116 | parser.add_argument("--do_lower_case", 117 | action='store_true', 118 | help="Set this flag if you are using an uncased model.") 119 | parser.add_argument("--train_batch_size", 120 | default=32, 121 | type=int, 122 | help="Total batch size for training.") 123 | parser.add_argument("--eval_batch_size", 124 | default=8, 125 | type=int, 126 | help="Total batch size for eval.") 127 | parser.add_argument("--learning_rate", 128 | default=5e-5, 129 | type=float, 130 | help="The initial learning rate for Adam.") 131 | parser.add_argument("--num_train_epochs", 132 | default=3.0, 133 | type=float, 134 | help="Total number of training epochs to perform.") 135 | parser.add_argument("--warmup_proportion", 136 | default=0.1, 137 | type=float, 138 | help="Proportion of training to perform linear learning rate warmup for. " 139 | "E.g., 0.1 = 10%% of training.") 140 | parser.add_argument("--no_cuda", 141 | action='store_true', 142 | help="Whether not to use CUDA when available") 143 | parser.add_argument('--overwrite_output_dir', 144 | action='store_true', 145 | help="Overwrite the content of the output directory") 146 | parser.add_argument("--local_rank", 147 | type=int, 148 | default=-1, 149 | help="local_rank for distributed training on gpus") 150 | parser.add_argument('--seed', 151 | type=int, 152 | default=42, 153 | help="random seed for initialization") 154 | parser.add_argument('--gradient_accumulation_steps', 155 | type=int, 156 | default=1, 157 | help="Number of updates steps to accumulate before performing a backward/update pass.") 158 | parser.add_argument('--fp16', 159 | action='store_true', 160 | help="Whether to use 16-bit float precision instead of 32-bit") 161 | parser.add_argument('--loss_scale', 162 | type=float, default=0, 163 | help="Loss scaling to improve fp16 numeric stability. 
Only used when fp16 set to True.\n" 164 | "0 (default value): dynamic loss scaling.\n" 165 | "Positive power of 2: static loss scaling value.\n") 166 | args = parser.parse_args() 167 | 168 | if args.local_rank == -1 or args.no_cuda: 169 | device = torch.device("cuda" if torch.cuda.is_available() and not args.no_cuda else "cpu") 170 | n_gpu = torch.cuda.device_count() 171 | else: 172 | torch.cuda.set_device(args.local_rank) 173 | device = torch.device("cuda", args.local_rank) 174 | n_gpu = 1 175 | # Initializes the distributed backend which will take care of sychronizing nodes/GPUs 176 | torch.distributed.init_process_group(backend='nccl') 177 | args.device = device 178 | 179 | logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s', 180 | datefmt = '%m/%d/%Y %H:%M:%S', 181 | level = logging.INFO if args.local_rank in [-1, 0] else logging.WARN) 182 | 183 | logger.info("device: {} n_gpu: {}, distributed training: {}, 16-bits training: {}".format( 184 | device, n_gpu, bool(args.local_rank != -1), args.fp16)) 185 | 186 | if args.gradient_accumulation_steps < 1: 187 | raise ValueError("Invalid gradient_accumulation_steps parameter: {}, should be >= 1".format( 188 | args.gradient_accumulation_steps)) 189 | 190 | args.train_batch_size = args.train_batch_size // args.gradient_accumulation_steps 191 | 192 | random.seed(args.seed) 193 | np.random.seed(args.seed) 194 | torch.manual_seed(args.seed) 195 | if n_gpu > 0: 196 | torch.cuda.manual_seed_all(args.seed) 197 | 198 | if not args.do_train and not args.do_eval: 199 | raise ValueError("At least one of `do_train` or `do_eval` must be True.") 200 | 201 | # if os.path.exists(args.output_dir) and os.listdir(args.output_dir) and args.do_train and not args.overwrite_output_dir: 202 | # raise ValueError("Output directory ({}) already exists and is not empty.".format(args.output_dir)) 203 | if not os.path.exists(args.output_dir): 204 | os.makedirs(args.output_dir) 205 | # print(args.output_dir) 206 | # pdb.set_trace() 207 | 208 | task_name = args.task_name.lower() 209 | 210 | if task_name not in processors: 211 | raise ValueError("Task not found: %s" % (task_name)) 212 | 213 | processor = processors[task_name]() 214 | output_mode = output_modes[task_name] 215 | 216 | label_list = processor.get_labels(args.domain_type) 217 | num_labels = len(label_list[1]) 218 | 219 | if args.local_rank not in [-1, 0]: 220 | torch.distributed.barrier() # Make sure only the first process in distributed training will download model & vocab 221 | tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case) 222 | model_dict = { 223 | 'quad': BertForQuadABSA, 224 | } 225 | 226 | label_map_senti = {label : i for i, label in enumerate(label_list[0])} 227 | label_map_seq = {label : i for i, label in enumerate(label_list[1])} 228 | 229 | global_step = 0 230 | nb_tr_steps = 0 231 | eval_gold = [] 232 | valid_gold = [] 233 | if args.do_eval: 234 | eval_examples = processor.get_dev_examples(args.data_dir, args.domain_type) 235 | f = cs.open(args.data_dir+'/tokenized_data/'+args.domain_type +'_test_quad_bert.tsv', 'r').readlines() 236 | for line in f: 237 | cur_exist_imp_aspect = 0 238 | cur_exist_imp_opinion = 0 239 | line = line.strip().split('\t') 240 | cur_text = line[0] 241 | aspect_labels = [label_map_seq['O'] for ele in range(args.max_seq_length)] 242 | 243 | for quad in line[1:]: 244 | cur_aspect = quad.split(' ')[0]; cur_opinion = quad.split(' ')[-1] 245 | a_st = int(cur_aspect.split(',')[0]); a_ed = 
int(cur_aspect.split(',')[1]) 246 | 247 | if a_ed != -1: 248 | aspect_labels[a_st] = label_map_seq['B-A'] 249 | for i in range(a_st+1, a_ed): 250 | aspect_labels[i] = label_map_seq['I-A'] 251 | else: 252 | cur_exist_imp_aspect = 1 253 | 254 | o_st = int(cur_opinion.split(',')[0]); o_ed = int(cur_opinion.split(',')[1]) 255 | 256 | if o_ed != -1: 257 | aspect_labels[o_st] = label_map_seq['B-O'] 258 | for i in range(o_st+1, o_ed): 259 | aspect_labels[i] = label_map_seq['I-O'] 260 | else: 261 | cur_exist_imp_opinion = 1 262 | 263 | eval_gold += [aspect_labels, cur_exist_imp_aspect, cur_exist_imp_opinion] 264 | 265 | eval_features = convert_examples_to_features( 266 | eval_examples, label_list, args.max_seq_length, tokenizer, output_mode, task_name) 267 | 268 | eval_tokens_len = torch.tensor([f.tokens_len for f in eval_features], dtype=torch.long) 269 | eval_aspect_input_ids = torch.tensor([f.aspect_input_ids for f in eval_features], dtype=torch.long) 270 | eval_aspect_input_mask = torch.tensor([f.aspect_input_mask for f in eval_features], dtype=torch.long) 271 | eval_aspect_ids = torch.tensor([f.aspect_ids for f in eval_features], dtype=torch.long) 272 | eval_aspect_segment_ids = torch.tensor([f.aspect_segment_ids for f in eval_features], dtype=torch.long) 273 | eval_exist_imp_aspect = torch.tensor([f.exist_imp_aspect for f in eval_features], dtype=torch.long) 274 | eval_exist_imp_opinion = torch.tensor([f.exist_imp_opinion for f in eval_features], dtype=torch.long) 275 | 276 | eval_gold = [eval_aspect_input_ids.numpy().tolist(), eval_gold] 277 | 278 | eval_data = TensorDataset(eval_tokens_len, eval_aspect_input_ids, eval_aspect_input_mask, eval_aspect_ids, 279 | eval_aspect_segment_ids, eval_exist_imp_aspect, eval_exist_imp_opinion) 280 | # Run prediction for full data 281 | if args.local_rank == -1: 282 | eval_sampler = SequentialSampler(eval_data) 283 | else: 284 | eval_sampler = DistributedSampler(eval_data) # Note that this sampler samples randomly 285 | eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=args.eval_batch_size) 286 | 287 | if args.do_train: 288 | 289 | # Prepare data loader 290 | train_examples = processor.get_train_examples(args.data_dir, args.domain_type) 291 | 292 | train_features = convert_examples_to_features( 293 | train_examples, label_list, args.max_seq_length, tokenizer, output_mode, task_name) 294 | 295 | tokens_len = torch.tensor([f.tokens_len for f in train_features], dtype=torch.long) 296 | aspect_input_ids = torch.tensor([f.aspect_input_ids for f in train_features], dtype=torch.long) 297 | aspect_input_mask = torch.tensor([f.aspect_input_mask for f in train_features], dtype=torch.long) 298 | aspect_ids = torch.tensor([f.aspect_ids for f in train_features], dtype=torch.long) 299 | aspect_segment_ids = torch.tensor([f.aspect_segment_ids for f in train_features], dtype=torch.long) 300 | exist_imp_aspect = torch.tensor([f.exist_imp_aspect for f in train_features], dtype=torch.long) 301 | exist_imp_opinion = torch.tensor([f.exist_imp_opinion for f in train_features], dtype=torch.long) 302 | 303 | valid_examples = processor.get_valid_examples(args.data_dir, args.domain_type) 304 | f = cs.open(args.data_dir+'/tokenized_data/'+args.domain_type +'_dev_quad_bert.tsv', 'r').readlines() 305 | for line in f: 306 | cur_exist_imp_aspect = 0 307 | cur_exist_imp_opinion = 0 308 | line = line.strip().split('\t') 309 | cur_text = line[0] 310 | aspect_labels = [label_map_seq['O'] for ele in range(args.max_seq_length)] 311 | 312 | for quad in line[1:]: 313 | 
cur_aspect = quad.split(' ')[0]; cur_opinion = quad.split(' ')[-1] 314 | a_st = int(cur_aspect.split(',')[0]); a_ed = int(cur_aspect.split(',')[1]) 315 | 316 | if a_ed != -1: 317 | aspect_labels[a_st] = label_map_seq['B-A'] 318 | for i in range(a_st+1, a_ed): 319 | aspect_labels[i] = label_map_seq['I-A'] 320 | else: 321 | cur_exist_imp_aspect = 1 322 | 323 | o_st = int(cur_opinion.split(',')[0]); o_ed = int(cur_opinion.split(',')[1]) 324 | 325 | if o_ed != -1: 326 | aspect_labels[o_st] = label_map_seq['B-O'] 327 | for i in range(o_st+1, o_ed): 328 | aspect_labels[i] = label_map_seq['I-O'] 329 | else: 330 | cur_exist_imp_opinion = 1 331 | 332 | valid_gold += [aspect_labels, cur_exist_imp_aspect, cur_exist_imp_opinion] 333 | 334 | valid_features = convert_examples_to_features( 335 | valid_examples, label_list, args.max_seq_length, tokenizer, output_mode, task_name) 336 | 337 | valid_tokens_len = torch.tensor([f.tokens_len for f in valid_features], dtype=torch.long) 338 | valid_aspect_input_ids = torch.tensor([f.aspect_input_ids for f in valid_features], dtype=torch.long) 339 | valid_aspect_input_mask = torch.tensor([f.aspect_input_mask for f in valid_features], dtype=torch.long) 340 | valid_aspect_ids = torch.tensor([f.aspect_ids for f in valid_features], dtype=torch.long) 341 | valid_aspect_segment_ids = torch.tensor([f.aspect_segment_ids for f in valid_features], dtype=torch.long) 342 | valid_exist_imp_aspect = torch.tensor([f.exist_imp_aspect for f in valid_features], dtype=torch.long) 343 | valid_exist_imp_opinion = torch.tensor([f.exist_imp_opinion for f in valid_features], dtype=torch.long) 344 | 345 | # Prepare optimizer 346 | 347 | logger.info("***** Running training *****") 348 | logger.info(" Num examples = %d", len(train_examples)) 349 | logger.info(" Batch size = %d", args.train_batch_size) 350 | 351 | all_results = [] 352 | 353 | model = model_dict[args.model_type].from_pretrained(args.bert_model, num_labels=num_labels) 354 | if args.local_rank == 0: 355 | torch.distributed.barrier() 356 | 357 | if args.fp16: 358 | model.half() 359 | 360 | model.to(device) 361 | 362 | train_data = TensorDataset(tokens_len, aspect_input_ids, aspect_input_mask, 363 | aspect_ids, aspect_segment_ids, exist_imp_aspect, exist_imp_opinion) 364 | 365 | if args.local_rank == -1: 366 | train_sampler = RandomSampler(train_data) 367 | else: 368 | train_sampler = DistributedSampler(train_data) 369 | train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=args.train_batch_size) 370 | 371 | valid_gold = [valid_aspect_input_ids.numpy().tolist(), valid_gold] 372 | 373 | valid_data = TensorDataset(valid_tokens_len, valid_aspect_input_ids, valid_aspect_input_mask, 374 | valid_aspect_ids, valid_aspect_segment_ids, valid_exist_imp_aspect, valid_exist_imp_opinion) 375 | if args.local_rank == -1: 376 | valid_sampler = SequentialSampler(valid_data) 377 | else: 378 | valid_sampler = DistributedSampler(valid_data) 379 | valid_dataloader = DataLoader(valid_data, sampler=valid_sampler, batch_size=args.eval_batch_size) 380 | 381 | num_train_optimization_steps = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps) * args.num_train_epochs 382 | 383 | param_optimizer = list(model.named_parameters()) 384 | no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight'] 385 | optimizer_grouped_parameters = [ 386 | {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01}, 387 | {'params': [p for n, p in param_optimizer if any(nd in n for nd in 
no_decay)], 'weight_decay': 0.0} 388 | ] 389 | 390 | optimizer = BertAdam(optimizer_grouped_parameters, 391 | lr=args.learning_rate, 392 | warmup=args.warmup_proportion, 393 | t_total=num_train_optimization_steps) 394 | 395 | best_micro_F1 = -1.0 396 | 397 | for _e in trange(int(args.num_train_epochs), desc="Epoch"): 398 | model.train() 399 | nb_tr_examples, nb_tr_steps = 0, 0 400 | 401 | for step, batch in enumerate(train_dataloader): 402 | batch = tuple(t.to(device) for t in batch) 403 | _tokens_len, _aspect_input_ids, _aspect_input_mask, _aspect_ids, _aspect_segment_ids, \ 404 | _exist_imp_aspect, _exist_imp_opinion = batch 405 | 406 | # forward pass: the model returns the training losses and the 407 | # sequence-labeling logits for aspect/opinion span extraction 408 | losses, logits = model(aspect_input_ids=_aspect_input_ids, aspect_labels=_aspect_ids, 409 | aspect_token_type_ids=_aspect_segment_ids, aspect_attention_mask=_aspect_input_mask, 410 | exist_imp_aspect=_exist_imp_aspect, exist_imp_opinion=_exist_imp_opinion) 411 | 412 | if step % 30 == 0: 413 | logger.info('Total Loss is {} .'.format(losses[0])) 414 | 415 | loss = losses[0] 416 | if n_gpu > 1: 417 | loss = loss.mean() # mean() to average on multi-gpu. 418 | 419 | 420 | if args.gradient_accumulation_steps > 1: 421 | loss = loss / args.gradient_accumulation_steps 422 | 423 | 424 | # BertAdam has no fp16-specific backward helper, so backpropagate 425 | # the (possibly scaled) loss directly in both fp16 and fp32 modes. 426 | loss.backward() 427 | 428 | 429 | 430 | nb_tr_examples += _aspect_input_ids.size(0) 431 | nb_tr_steps += 1 432 | if (step + 1) % args.gradient_accumulation_steps == 0: 433 | # BertAdam applies BERT's linear warmup/decay schedule internally 434 | # (warmup and t_total were passed at construction), so the learning 435 | # rate needs no manual adjustment here, with or without fp16. 436 | 437 | 438 | 439 | optimizer.step() 440 | optimizer.zero_grad() 441 | global_step += 1 442 | 443 | model.eval() 444 | result = pred_eval(_e, args, logger, tokenizer, model, valid_dataloader, valid_gold, label_list, device, task_name, eval_type='valid') 445 | 446 | if best_micro_F1 < result['micro-F1']: 447 | model_to_save = model.module if hasattr(model, 'module') else model # Only save the model itself 448 | dirs_name = args.output_dir 449 | if not os.path.exists(dirs_name): 450 | os.mkdir(dirs_name) 451 | output_model_file = os.path.join(dirs_name, WEIGHTS_NAME) 452 | output_config_file = os.path.join(dirs_name, CONFIG_NAME) 453 | 454 | torch.save(model_to_save.state_dict(), output_model_file) 455 | model_to_save.config.to_json_file(output_config_file) 456 | tokenizer.save_vocabulary(dirs_name) 457 | 458 | final_result = pred_eval(_e, args, logger, tokenizer, model, eval_dataloader, eval_gold, label_list, device, task_name, eval_type='test') 459 | best_micro_F1 = result['micro-F1'] 460 | 461 | else: 462 | model = model_dict[args.model_type].from_pretrained(args.bert_model, num_labels=num_labels) 463 | if args.local_rank == 0: 464 | torch.distributed.barrier() 465 | 466 | if args.fp16: 467 | model.half() 468 | 469 | model.to(device) 470 | model.eval() 471 | final_result = pred_eval('load fine-tuned', args, logger, tokenizer, model, eval_dataloader, eval_gold, label_list, device, task_name, eval_type='test') 472 | 473 | output_eval_file = os.path.join(args.output_dir, "Test_results.txt") 474 | with open(output_eval_file, "w") as writer: 475 | 
logger.info("***** Test results *****") 476 | for key in sorted(final_result.keys()): 477 | logger.info(" %s = %s", key, str(final_result[key])) 478 | writer.write("%s = %s\n" % (key, str(final_result[key]))) 479 | 480 | if __name__ == "__main__": 481 | main() 482 | -------------------------------------------------------------------------------- /Extract-Classify-ACOS/run_step2.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import, division, print_function 2 | 3 | import argparse 4 | import logging 5 | import os 6 | import sys 7 | import random 8 | from tqdm import tqdm, trange 9 | import pdb 10 | from collections import defaultdict, namedtuple 11 | from manager import * 12 | import math 13 | import codecs as cs 14 | from sklearn.model_selection import KFold 15 | from dataset_utils import * 16 | 17 | gm = GPUManager() 18 | device = gm.auto_choice(mode=0) 19 | os.environ["CUDA_VISIBLE_DEVICES"] = str(device) 20 | 21 | import numpy as np 22 | 23 | import torch 24 | from torch.utils.data import (DataLoader, RandomSampler, SequentialSampler, 25 | TensorDataset) 26 | from torch.nn import CrossEntropyLoss, MSELoss, MultiLabelSoftMarginLoss, BCEWithLogitsLoss 27 | 28 | from modeling import CategorySentiClassification 29 | 30 | # sys.path.insert(0, '/home/hjcai/8RTX/BERT/pytorch_pretrained_BERT') 31 | # from modeling_for_share import BertForQuadABSAPairCSAO 32 | from bert_utils.tokenization import BertTokenizer 33 | from bert_utils.optimization import BertAdam, WarmupLinearSchedule 34 | 35 | from run_classifier_dataset_utils import * 36 | from eval_metrics import * 37 | import gc 38 | 39 | if sys.version_info[0] == 2: 40 | import cPickle as pickle 41 | else: 42 | import pickle 43 | import warnings 44 | 45 | warnings.filterwarnings('ignore') 46 | 47 | CONFIG_NAME = "config.json" 48 | WEIGHTS_NAME = "pytorch_model.bin" 49 | 50 | logger = logging.getLogger(__name__) 51 | 52 | def main(): 53 | parser = argparse.ArgumentParser() 54 | ## Required parameters 55 | parser.add_argument("--data_dir", 56 | default=None, 57 | type=str, 58 | required=True, 59 | help="The input source data dir. Should contain the .tsv files (or other data files) for the task.") 60 | parser.add_argument("--bert_model", default=None, type=str, required=True) 61 | parser.add_argument("--output_dir", 62 | default=None, 63 | type=str, 64 | required=True, 65 | help="The output directory where the model predictions and checkpoints will be written.") 66 | parser.add_argument("--task_name", 67 | default=None, 68 | type=str, 69 | required=True, 70 | help="The name of the task to train.") 71 | parser.add_argument("--domain_type", 72 | default=None, 73 | type=str, 74 | required=True, 75 | help="domain to choose.") 76 | 77 | parser.add_argument("--model_type", 78 | default=None, 79 | type=str, 80 | required=True, 81 | help="model to choose.") 82 | 83 | ## Other parameters 84 | parser.add_argument("--max_seq_length", 85 | default=128, 86 | type=int, 87 | help="The maximum total input sequence length after WordPiece tokenization. 
\n" 88 | "Sequences longer than this will be truncated, and sequences shorter \n" 89 | "than this will be padded.") 90 | parser.add_argument("--do_train", 91 | action='store_true', 92 | help="Whether to run training.") 93 | parser.add_argument("--do_eval", 94 | action='store_true', 95 | help="Whether to run eval on the dev set.") 96 | parser.add_argument("--do_lower_case", 97 | action='store_true', 98 | help="Set this flag if you are using an uncased model.") 99 | parser.add_argument("--train_batch_size", 100 | default=32, 101 | type=int, 102 | help="Total batch size for training.") 103 | parser.add_argument("--eval_batch_size", 104 | default=8, 105 | type=int, 106 | help="Total batch size for eval.") 107 | parser.add_argument("--learning_rate", 108 | default=5e-5, 109 | type=float, 110 | help="The initial learning rate for Adam.") 111 | parser.add_argument("--num_train_epochs", 112 | default=3.0, 113 | type=float, 114 | help="Total number of training epochs to perform.") 115 | parser.add_argument("--warmup_proportion", 116 | default=0.1, 117 | type=float, 118 | help="Proportion of training to perform linear learning rate warmup for. " 119 | "E.g., 0.1 = 10%% of training.") 120 | parser.add_argument("--local_rank", 121 | type=int, 122 | default=-1, 123 | help="local_rank for distributed training on gpus") 124 | parser.add_argument('--seed', 125 | type=int, 126 | # default=42, 127 | default=13, 128 | help="random seed for initialization") 129 | parser.add_argument('--gradient_accumulation_steps', 130 | type=int, 131 | default=1, 132 | help="Number of updates steps to accumulate before performing a backward/update pass.") 133 | 134 | args = parser.parse_args() 135 | device = torch.device("cuda") 136 | n_gpu = torch.cuda.device_count() 137 | 138 | logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s', 139 | datefmt = '%m/%d/%Y %H:%M:%S', 140 | level = logging.INFO) 141 | 142 | args.train_batch_size = args.train_batch_size // args.gradient_accumulation_steps 143 | random.seed(args.seed) 144 | np.random.seed(args.seed) 145 | torch.manual_seed(args.seed) 146 | torch.cuda.manual_seed_all(args.seed) 147 | 148 | if not os.path.exists(args.output_dir): 149 | os.makedirs(args.output_dir) 150 | 151 | task_name = args.task_name.lower() 152 | processor = processors[task_name]() 153 | label_list = processor.get_labels(args.domain_type) 154 | num_labels = len(label_list[0]) 155 | 156 | tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case) 157 | model_dict = { 158 | 'categorysenti': CategorySentiClassification, 159 | } 160 | cate_dict = {label : i for i, label in enumerate(label_list[0])} 161 | 162 | global_step = 0 163 | nb_tr_steps = 0 164 | eval_quad_gold = [] 165 | train_quad_gold = [] 166 | eval_quad_text = [] 167 | train_quad_text = [] 168 | #for entity#attribute 169 | if args.do_eval: 170 | eval_examples = processor.get_dev_examples(args.data_dir, args.domain_type) 171 | f = cs.open(args.data_dir+'/tokenized_data/'+args.domain_type+'_test_pair.tsv', 'r').readlines() 172 | eval_quad_text, eval_quad_gold = read_pair_gold(f, args) 173 | 174 | eval_features = convert_examples_to_features2nd( 175 | eval_examples, label_list, args.max_seq_length, tokenizer, task_name) 176 | 177 | eval_tokens_len = torch.tensor([f.tokens_len for f in eval_features], dtype=torch.long) 178 | eval_aspect_input_ids = torch.tensor([f.aspect_input_ids for f in eval_features], dtype=torch.long) 179 | eval_aspect_input_mask = torch.tensor([f.aspect_input_mask for 
f in eval_features], dtype=torch.long) 180 | eval_aspect_segment_ids = torch.tensor([f.aspect_segment_ids for f in eval_features], dtype=torch.long) 181 | eval_candidate_aspect = torch.tensor([f.candidate_aspect for f in eval_features], dtype=torch.long) 182 | eval_candidate_opinion = torch.tensor([f.candidate_opinion for f in eval_features], dtype=torch.long) 183 | eval_label_id = torch.tensor([f.label_id for f in eval_features], dtype=torch.long) 184 | 185 | # eval_gold = [eval_aspect_input_ids.numpy().tolist(), eval_quad_gold] 186 | eval_gold = [eval_quad_text, eval_quad_gold] 187 | 188 | eval_data = TensorDataset(eval_tokens_len, eval_aspect_input_ids, eval_aspect_input_mask, 189 | eval_aspect_segment_ids, eval_candidate_aspect, eval_candidate_opinion, 190 | eval_label_id) 191 | # Run prediction for full data 192 | if args.local_rank == -1: 193 | eval_sampler = SequentialSampler(eval_data) 194 | else: 195 | eval_sampler = DistributedSampler(eval_data) # Note that this sampler samples randomly 196 | eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=args.eval_batch_size) 197 | 198 | 199 | # Prepare data loader 200 | train_examples = processor.get_train_examples(args.data_dir, args.domain_type) 201 | train_features = convert_examples_to_features2nd( 202 | train_examples, label_list, args.max_seq_length, tokenizer, task_name) 203 | 204 | tokens_len = torch.tensor([f.tokens_len for f in train_features], dtype=torch.long) 205 | aspect_input_ids = torch.tensor([f.aspect_input_ids for f in train_features], dtype=torch.long) 206 | aspect_input_mask = torch.tensor([f.aspect_input_mask for f in train_features], dtype=torch.long) 207 | aspect_segment_ids = torch.tensor([f.aspect_segment_ids for f in train_features], dtype=torch.long) 208 | candidate_aspect = torch.tensor([f.candidate_aspect for f in train_features], dtype=torch.long) 209 | candidate_opinion = torch.tensor([f.candidate_opinion for f in train_features], dtype=torch.long) 210 | label_id = torch.tensor([f.label_id for f in train_features], dtype=torch.long) 211 | 212 | valid_examples = processor.get_valid_examples(args.data_dir, args.domain_type) 213 | valid_features = convert_examples_to_features2nd( 214 | valid_examples, label_list, args.max_seq_length, tokenizer, task_name) 215 | f = cs.open(args.data_dir+'/tokenized_data/'+args.domain_type+'_dev_pair.tsv', 'r').readlines() 216 | valid_quad_text, valid_quad_gold = read_pair_gold(f, args) 217 | 218 | valid_tokens_len = torch.tensor([f.tokens_len for f in valid_features], dtype=torch.long) 219 | valid_aspect_input_ids = torch.tensor([f.aspect_input_ids for f in valid_features], dtype=torch.long) 220 | valid_aspect_input_mask = torch.tensor([f.aspect_input_mask for f in valid_features], dtype=torch.long) 221 | valid_aspect_segment_ids = torch.tensor([f.aspect_segment_ids for f in valid_features], dtype=torch.long) 222 | valid_candidate_aspect = torch.tensor([f.candidate_aspect for f in valid_features], dtype=torch.long) 223 | valid_candidate_opinion = torch.tensor([f.candidate_opinion for f in valid_features], dtype=torch.long) 224 | valid_label_id = torch.tensor([f.label_id for f in valid_features], dtype=torch.long) 225 | 226 | all_results = [] 227 | 228 | 229 | valid_gold = [valid_quad_text, valid_quad_gold] 230 | 231 | train_data = TensorDataset(tokens_len, aspect_input_ids, aspect_input_mask, 232 | aspect_segment_ids, candidate_aspect, candidate_opinion, 233 | label_id) 234 | 235 | if args.local_rank == -1: 236 | train_sampler = RandomSampler(train_data) 
237 | else: 238 | train_sampler = DistributedSampler(train_data) 239 | train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=args.train_batch_size) 240 | 241 | valid_data = TensorDataset(valid_tokens_len, valid_aspect_input_ids, valid_aspect_input_mask, 242 | valid_aspect_segment_ids, valid_candidate_aspect, valid_candidate_opinion, 243 | valid_label_id) 244 | 245 | if args.local_rank == -1: 246 | valid_sampler = SequentialSampler(valid_data) 247 | else: 248 | valid_sampler = DistributedSampler(valid_data) 249 | valid_dataloader = DataLoader(valid_data, sampler=valid_sampler, batch_size=args.eval_batch_size) 250 | 251 | if args.do_train: 252 | logger.info("***** Running training *****") 253 | 254 | num_train_optimization_steps = math.ceil(len(train_dataloader) / args.gradient_accumulation_steps) * args.num_train_epochs 255 | 256 | model = model_dict[args.model_type].from_pretrained(args.bert_model, num_labels=num_labels) 257 | param_optimizer = list(model.named_parameters()) 258 | no_decay = ['bias', 'LayerNorm.bias', 'LayerNorm.weight'] 259 | optimizer_grouped_parameters = [ 260 | {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)], 'weight_decay': 0.01}, 261 | {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)], 'weight_decay': 0.0} 262 | ] 263 | 264 | optimizer = BertAdam(optimizer_grouped_parameters, 265 | lr=args.learning_rate, 266 | warmup=args.warmup_proportion, 267 | t_total=num_train_optimization_steps) 268 | 269 | if args.local_rank == 0: 270 | torch.distributed.barrier() 271 | 272 | model.to(device) 273 | best_micro_F1 = -1.0 274 | 275 | for _e in trange(int(args.num_train_epochs), desc="Epoch"): 276 | model.train() 277 | nb_tr_examples, nb_tr_steps = 0, 0 278 | 279 | for step, batch in enumerate(train_dataloader): 280 | batch = tuple(t.to(device) for t in batch) 281 | _tokens_len, _aspect_input_ids, _aspect_input_mask, _aspect_segment_ids, _candidate_aspect, \ 282 | _candidate_opinion, _label_id = batch 283 | 284 | # forward pass: the model returns the training losses and the 285 | # category-sentiment logits for each candidate aspect-opinion pair 286 | losses, logits = model(tokenizer, _e, aspect_input_ids=_aspect_input_ids, 287 | aspect_token_type_ids=_aspect_segment_ids, aspect_attention_mask=_aspect_input_mask, 288 | candidate_aspect=_candidate_aspect, candidate_opinion=_candidate_opinion, label_id=_label_id) 289 | 290 | if step % 10 == 0: 291 | logger.info('Total Loss is {} .'.format(losses[0])) 292 | 293 | loss = losses[0] 294 | if n_gpu > 1: 295 | loss = loss.mean() # mean() to average on multi-gpu. 
296 | 297 | 298 | if args.gradient_accumulation_steps > 1: 299 | loss = loss / args.gradient_accumulation_steps 300 | 301 | loss.backward() 302 | 303 | nb_tr_examples += _aspect_input_ids.size(0) 304 | nb_tr_steps += 1 305 | if (step + 1) % args.gradient_accumulation_steps == 0: 306 | optimizer.step() 307 | optimizer.zero_grad() 308 | global_step += 1 309 | 310 | model.eval() 311 | result = pair_eval(_e, args, logger, tokenizer, model, valid_dataloader, valid_gold, 312 | label_list, device, task_name, eval_type='valid') 313 | 314 | if best_micro_F1 < result['micro-F1']: 315 | model_to_save = model.module if hasattr(model, 'module') else model # Only save the model itself 316 | dirs_name = args.output_dir 317 | if not os.path.exists(dirs_name): 318 | os.mkdir(dirs_name) 319 | output_model_file = os.path.join(dirs_name, WEIGHTS_NAME) 320 | output_config_file = os.path.join(dirs_name, CONFIG_NAME) 321 | 322 | torch.save(model_to_save.state_dict(), output_model_file) 323 | model_to_save.config.to_json_file(output_config_file) 324 | tokenizer.save_vocabulary(dirs_name) 325 | 326 | final_result = pair_eval(_e, args, logger, tokenizer, model, eval_dataloader, eval_gold, 327 | label_list, device, task_name, eval_type='test') 328 | best_micro_F1 = result['micro-F1'] 329 | else: 330 | model = model_dict[args.model_type].from_pretrained(args.bert_model, num_labels=num_labels) 331 | if args.local_rank == 0: 332 | torch.distributed.barrier() 333 | 334 | model.to(device) 335 | model.eval() 336 | final_result = pair_eval('load fine-tuned', args, logger, tokenizer, model, eval_dataloader, eval_gold, 337 | label_list, device, task_name, eval_type='test') 338 | 339 | output_eval_file = os.path.join(args.output_dir, "Test_results.txt") 340 | with open(output_eval_file, "w") as writer: 341 | logger.info("***** Test results *****") 342 | for key in sorted(final_result.keys()): 343 | logger.info(" %s = %s", key, str(final_result[key])) 344 | writer.write("%s = %s\n" % (key, str(final_result[key]))) 345 | 346 | if __name__ == "__main__": 347 | main() 348 | -------------------------------------------------------------------------------- /Extract-Classify-ACOS/tokenized_data/get_1st_pairs.py: -------------------------------------------------------------------------------- 1 | #coding=utf-8 2 | 3 | import codecs as cs 4 | import os 5 | import sys 6 | 7 | base_dir = sys.argv[1] # project base directory 8 | domain_type = sys.argv[2] # e.g. 'rest16' or 'laptop' 9 | 10 | cur_dir = base_dir+'/output/Extract-Classify-QUAD/'+domain_type 11 | 12 | if not os.path.exists(cur_dir+'_1st'): 13 | os.makedirs(cur_dir+'_1st') 14 | 15 | f = cs.open(cur_dir+'_1st'+'/pred4pipeline.txt', 'r').readlines() # step-1 span predictions 16 | wf = cs.open(base_dir+'/ACOS-main/Extract-Classify-ACOS/tokenized_data/'+domain_type+'_test_pair_1st.tsv', 'w') 17 | 18 | for line in f: 19 | asp = []; opi = [] 20 | line = line.strip().split('\t') 21 | if len(line) <= 1: 22 | continue 23 | text = line[0] 24 | af = 0 25 | of = 0 26 | for ele in line[1:]: 27 | if ele.startswith('a'): 28 | asp.append(ele[2:]) 29 | af = 1 30 | else: 31 | opi.append(ele[2:]) 32 | of = 1 33 | if af == 0: 34 | asp.append('-1,-1') # no explicit span predicted: fall back to the implicit placeholder 35 | if of == 0: 36 | opi.append('-1,-1') 37 | if len(asp)>0 and len(opi)>0: 38 | pred = [] 39 | 40 | # pair every predicted aspect span with every predicted opinion span 41 | for pa in asp: 42 | for po in opi: 43 | pred.append([pa, po]) 44 | 45 | for ele in pred: 46 | wf.write(text+'####'+ele[0]+' '+ele[1]+'\n') 47 | 
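48 | # Usage sketch (assumed invocation, matching the sys.argv reads above): 49 | #     python get_1st_pairs.py /path/to/BASE_DIR rest16 50 | # Each output line has the form "text####a_span o_span", where a span is 51 | # a "start,end" token-offset pair into the wordpiece-tokenized sentence 52 | # and "-1,-1" marks an implicit (absent) aspect or opinion, e.g.: 53 | #     the food is great .####1,2 3,4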
-------------------------------------------------------------------------------- /Extract-Classify-ACOS/tokenized_data/rest16_dev_pair.tsv: -------------------------------------------------------------------------------- 1 | ca n ' t wait wait for my next visit .####-1,-1 -1,-1 RESTAURANT#GENERAL#2 2 | their sake list was extensive , but we were looking for purple haze , which was n ' t listed but made for us upon request !####1,3 4,5 DRINKS#STYLE_OPTIONS#2 3 | their sake list was extensive , but we were looking for purple haze , which was n ' t listed but made for us upon request !####-1,-1 -1,-1 SERVICE#GENERAL#2 4 | the spicy tuna roll was unusually good and the rock shrimp te ##mp ##ura was awesome , great app ##eti ##zer to share !####1,4 6,7 FOOD#QUALITY#2 5 | the spicy tuna roll was unusually good and the rock shrimp te ##mp ##ura was awesome , great app ##eti ##zer to share !####9,14 15,16 FOOD#QUALITY#2 6 | we love th pink pony .####3,5 1,2 RESTAURANT#GENERAL#2 7 | this place has got to be the best japanese restaurant in the new york area .####1,2 7,8 RESTAURANT#GENERAL#2 8 | i tend to judge a su ##shi restaurant by its sea ur ##chin , which was heavenly at su ##shi rose .####10,13 16,17 FOOD#QUALITY#2 9 | the prix fix ##e menu is worth every penny and you get more than enough ( both in quantity and quality ) .####1,5 6,7 FOOD#QUALITY#2 FOOD#STYLE_OPTIONS#2 FOOD#PRICES#2 10 | the food here is rather good , but only if you like to wait for it .####1,2 5,6 FOOD#QUALITY#2 11 | the food here is rather good , but only if you like to wait for it .####-1,-1 -1,-1 SERVICE#GENERAL#0 12 | also , specify if you like your food spicy - its rather bland if you do n ' t .####7,8 12,13 FOOD#QUALITY#0 13 | the am ##bie ##nce is pretty and nice for conversation , so a casual lunch here would probably be best .####1,4 5,6 AMBIENCE#GENERAL#2 14 | the am ##bie ##nce is pretty and nice for conversation , so a casual lunch here would probably be best .####1,4 7,8 AMBIENCE#GENERAL#2 15 | the am ##bie ##nce is pretty and nice for conversation , so a casual lunch here would probably be best .####-1,-1 19,20 RESTAURANT#MISCELLANEOUS#2 16 | it was horrible .####-1,-1 2,3 RESTAURANT#GENERAL#0 17 | have been dozens of times and never failed to enjoy the experience .####-1,-1 9,10 RESTAURANT#GENERAL#2 18 | make sure you try this place as often as you can .####5,6 7,8 RESTAURANT#GENERAL#2 19 | i had a huge group for my birthday and we were well taken care of .####-1,-1 11,12 SERVICE#GENERAL#2 20 | get the tuna of ga ##ri .####2,6 -1,-1 FOOD#QUALITY#2 21 | make sure you have the spicy sc ##all ##op roll . . 
.####5,10 -1,-1 FOOD#QUALITY#2 22 | rag ##a ' s is a romantic , cozy restaurant .####0,4 6,7 AMBIENCE#GENERAL#2 23 | rag ##a ' s is a romantic , cozy restaurant .####0,4 8,9 AMBIENCE#GENERAL#2 24 | i had a great time at je ##ky ##ll and hyde !####6,11 3,4 RESTAURANT#GENERAL#2 25 | i am bringing my whole family back next time .####-1,-1 -1,-1 RESTAURANT#MISCELLANEOUS#2 26 | fine dining restaurant quality .####1,2 0,1 FOOD#QUALITY#2 27 | we will return many times for this oasis in mid - town .####-1,-1 -1,-1 RESTAURANT#GENERAL#2 28 | the food options rule .####1,2 -1,-1 FOOD#STYLE_OPTIONS#2 29 | my husband and i thou ##gt it would be great to go to the je ##ky ##ll and hyde pub for our anniversary , and to our surprise it was fantastic .####14,20 9,10 RESTAURANT#GENERAL#2 30 | my husband and i thou ##gt it would be great to go to the je ##ky ##ll and hyde pub for our anniversary , and to our surprise it was fantastic .####14,20 30,31 RESTAURANT#GENERAL#2 31 | please take my advice , go and try this place .####9,10 -1,-1 RESTAURANT#GENERAL#2 32 | they were served warm and had a soft fluffy interior .####-1,-1 3,4 FOOD#QUALITY#2 33 | they were served warm and had a soft fluffy interior .####-1,-1 7,8 FOOD#QUALITY#2 34 | but they do .####-1,-1 -1,-1 SERVICE#GENERAL#2 35 | fresh restaurant was amazing . . . . . . . . food was delicious and of course fresh .####1,2 0,1 RESTAURANT#GENERAL#2 36 | fresh restaurant was amazing . . . . . . . . food was delicious and of course fresh .####1,2 3,4 RESTAURANT#GENERAL#2 37 | fresh restaurant was amazing . . . . . . . . food was delicious and of course fresh .####12,13 14,15 FOOD#QUALITY#2 38 | fresh restaurant was amazing . . . . . . . . food was delicious and of course fresh .####12,13 18,19 FOOD#QUALITY#2 39 | hats off to the chef .####4,5 0,2 FOOD#QUALITY#2 40 | this is some really good , inexpensive su ##shi .####7,9 4,5 FOOD#QUALITY#2 41 | this is some really good , inexpensive su ##shi .####7,9 6,7 FOOD#PRICES#2 42 | this place is always very crowded and popular .####1,2 5,6 RESTAURANT#MISCELLANEOUS#2 43 | this place is always very crowded and popular .####1,2 7,8 RESTAURANT#MISCELLANEOUS#2 44 | and evaluated on those terms past ##is is simply wonderful .####5,7 9,10 RESTAURANT#GENERAL#2 45 | i ' m still mad that i had to pay for lou ##sy food .####13,14 11,13 FOOD#QUALITY#0 46 | the hang ##er steak was like rubber and the tuna was flavor ##less not to mention it tasted like it had just been tha ##wed .####1,4 6,7 FOOD#QUALITY#0 47 | the hang ##er steak was like rubber and the tuna was flavor ##less not to mention it tasted like it had just been tha ##wed .####9,10 11,13 FOOD#QUALITY#0 48 | big thumbs up !####-1,-1 1,3 RESTAURANT#GENERAL#2 49 | the pizza and wine were excellent - the service too - - but what really made this place was the backyard dining area .####1,2 5,6 FOOD#QUALITY#2 50 | the pizza and wine were excellent - the service too - - but what really made this place was the backyard dining area .####3,4 5,6 DRINKS#QUALITY#2 51 | the pizza and wine were excellent - the service too - - but what really made this place was the backyard dining area .####8,9 5,6 SERVICE#GENERAL#2 52 | the pizza and wine were excellent - the service too - - but what really made this place was the backyard dining area .####20,23 -1,-1 AMBIENCE#GENERAL#2 53 | it is one the nice ##st outdoor restaurants i have ever seen in ny - - i am from italy and this place rivals the ones in my country .####6,8 4,6 AMBIENCE#GENERAL#2 54 | it is simply amazing .####-1,-1 3,4 
FOOD#QUALITY#2 55 | beautiful experience .####-1,-1 0,1 RESTAURANT#GENERAL#2 56 | the menu is very limited - i think we counted 4 or 5 en ##tree ##s .####1,2 4,5 FOOD#STYLE_OPTIONS#0 57 | we will go back every time we are in the city .####-1,-1 -1,-1 RESTAURANT#GENERAL#2 58 | the characters really make for an enjoyable experience .####1,2 6,7 AMBIENCE#GENERAL#2 59 | however , i think je ##ck ##ll and hyde ##s t is one of those places that is fun to do once .####4,10 18,19 RESTAURANT#GENERAL#2 60 | we had a good time .####-1,-1 3,4 RESTAURANT#GENERAL#2 61 | a little over ##pr ##ice ##d but worth it once you take a bite .####-1,-1 2,6 FOOD#PRICES#0 62 | a little over ##pr ##ice ##d but worth it once you take a bite .####-1,-1 7,8 FOOD#QUALITY#2 63 | i have lived in japan for 7 years and the taste of the food and the feel of the restaurant is like being back in japan .####13,14 -1,-1 FOOD#QUALITY#2 64 | i have lived in japan for 7 years and the taste of the food and the feel of the restaurant is like being back in japan .####16,17 -1,-1 AMBIENCE#GENERAL#2 65 | check out the secret back room .####4,6 3,4 AMBIENCE#GENERAL#2 66 | thank you emilio .####2,3 -1,-1 RESTAURANT#GENERAL#2 67 | the food was authentic .####1,2 3,4 FOOD#QUALITY#2 68 | fantastic !####-1,-1 0,1 RESTAURANT#GENERAL#2 69 | but the staff was so horrible to us .####2,3 5,6 SERVICE#GENERAL#0 70 | decor is nice though service can be spot ##ty .####0,1 2,3 AMBIENCE#GENERAL#2 71 | decor is nice though service can be spot ##ty .####4,5 7,9 SERVICE#GENERAL#0 72 | just aw ##some .####-1,-1 1,3 FOOD#QUALITY#2 73 | i had their eggs benedict for br ##un ##ch , which were the worst in my entire life , i tried removing the ho ##llon ##dai ##se sauce completely that was how failed it was .####3,5 13,14 FOOD#QUALITY#0 74 | with the theater 2 blocks away we had a delicious meal in a beautiful room .####10,11 9,10 FOOD#QUALITY#2 75 | with the theater 2 blocks away we had a delicious meal in a beautiful room .####14,15 13,14 AMBIENCE#GENERAL#2 76 | with the theater 2 blocks away we had a delicious meal in a beautiful room .####-1,-1 -1,-1 LOCATION#GENERAL#2 77 | the service was at ##ten ##tive .####1,2 3,6 SERVICE#GENERAL#2 78 | pat ##ro ##on features a nice cigar bar and has great staff .####6,8 5,6 AMBIENCE#GENERAL#2 79 | pat ##ro ##on features a nice cigar bar and has great staff .####11,12 10,11 SERVICE#GENERAL#2 80 | ll ##oo ##v ##ve this place .####5,6 0,4 RESTAURANT#GENERAL#2 81 | the menu is limited but almost all of the dishes are excellent .####1,2 3,4 FOOD#STYLE_OPTIONS#0 82 | the menu is limited but almost all of the dishes are excellent .####9,10 11,12 FOOD#QUALITY#2 83 | wine list is extensive without being over - priced .####0,2 3,4 DRINKS#STYLE_OPTIONS#2 84 | wine list is extensive without being over - priced .####0,2 4,9 DRINKS#PRICES#2 85 | the food was very good , a great deal , and the place its self was great .####1,2 4,5 FOOD#QUALITY#2 86 | the food was very good , a great deal , and the place its self was great .####1,2 6,7 FOOD#PRICES#2 87 | the food was very good , a great deal , and the place its self was great .####12,13 16,17 AMBIENCE#GENERAL#2 88 | the wait staff is very fr ##ein ##dly , they make it feel like you ' re eating in a fr ##ein ##dly little european town .####1,3 5,8 SERVICE#GENERAL#2 89 | the whole set up is truly un ##pro ##fe ##ssion ##al and i wish cafe noir would get some good staff , because despite the current one this is a great place .####20,21 6,21 SERVICE#GENERAL#0 90 | the whole set up is 
truly un ##pro ##fe ##ssion ##al and i wish cafe noir would get some good staff , because despite the current one this is a great place .####14,16 30,31 RESTAURANT#GENERAL#2 91 | you should pass on the cal ##ama ##ri .####5,8 -1,-1 FOOD#QUALITY#0 92 | when asked about how a certain dish was prepared in comparison to a similar at other thai restaurants , he replied this is not mcdonald ' s , every place makes things differently####-1,-1 -1,-1 SERVICE#GENERAL#0 93 | everything was wonderful ; food , drinks , staff , mile ##au .####4,5 2,3 FOOD#QUALITY#2 94 | everything was wonderful ; food , drinks , staff , mile ##au .####6,7 2,3 DRINKS#QUALITY#2 95 | everything was wonderful ; food , drinks , staff , mile ##au .####8,9 2,3 SERVICE#GENERAL#2 96 | everything was wonderful ; food , drinks , staff , mile ##au .####10,12 2,3 AMBIENCE#GENERAL#2 97 | everything was wonderful ; food , drinks , staff , mile ##au .####-1,-1 2,3 RESTAURANT#GENERAL#2 98 | i would highly recommend this place !####5,6 3,4 RESTAURANT#GENERAL#2 99 | fresh ingredients and everything is made to order .####1,2 0,1 FOOD#QUALITY#2 100 | fresh ingredients and everything is made to order .####-1,-1 -1,-1 FOOD#QUALITY#2 101 | friendly staff that actually lets you enjoy your meal and the company you ' re with .####1,2 0,1 SERVICE#GENERAL#2 102 | i will definitely be going back .####-1,-1 -1,-1 RESTAURANT#GENERAL#2 103 | a great choice at any cost and a great deal .####-1,-1 8,9 RESTAURANT#GENERAL#2 104 | a great choice at any cost and a great deal .####-1,-1 1,2 RESTAURANT#PRICES#2 105 | tha ##lia is a beautiful restaurant with beautiful people serving you , but the food does n ' t quite match up .####8,9 7,8 SERVICE#GENERAL#2 106 | tha ##lia is a beautiful restaurant with beautiful people serving you , but the food does n ' t quite match up .####14,15 15,22 FOOD#QUALITY#0 107 | tha ##lia is a beautiful restaurant with beautiful people serving you , but the food does n ' t quite match up .####0,2 4,5 AMBIENCE#GENERAL#2 108 | i ordered the smoked salmon and roe app ##eti ##zer and it was off flavor .####3,10 13,15 FOOD#QUALITY#0 109 | the food is good , especially their more basic dishes , and the drinks are delicious .####1,2 3,4 FOOD#QUALITY#2 110 | the food is good , especially their more basic dishes , and the drinks are delicious .####8,10 3,4 FOOD#QUALITY#2 111 | the food is good , especially their more basic dishes , and the drinks are delicious .####13,14 15,16 DRINKS#QUALITY#2 112 | the big complaint : no toast ##ing available .####-1,-1 2,3 SERVICE#GENERAL#0 113 | i ' ve been many time and have never been disappointed .####-1,-1 8,11 RESTAURANT#GENERAL#2 114 | the turkey burger ##s are scary !####1,4 5,6 FOOD#QUALITY#0 115 | for authentic thai food , look no further than too ##ns .####2,4 1,2 FOOD#QUALITY#2 116 | try the pad thai , or sample anything on the app ##eti ##zer menu . . . 
they ' re all delicious .####2,4 21,22 FOOD#QUALITY#2 FOOD#QUALITY#2 117 | service was good and food is wonderful .####0,1 2,3 SERVICE#GENERAL#2 118 | service was good and food is wonderful .####4,5 6,7 FOOD#QUALITY#2 119 | it is definitely a good spot for snacks and chat .####5,6 4,5 RESTAURANT#GENERAL#2 120 | do not get the go go hamburger ##s , no matter what the reviews say .####4,8 -1,-1 FOOD#QUALITY#0 121 | steamed fresh so brought hot hot hot to your table .####-1,-1 1,2 FOOD#QUALITY#2 122 | small serving ##s for main en ##tree , i had salmon ( wasn ##t impressed ) girlfriend had chicken , it was good .####10,11 12,15 FOOD#QUALITY#0 123 | small serving ##s for main en ##tree , i had salmon ( wasn ##t impressed ) girlfriend had chicken , it was good .####18,19 22,23 FOOD#QUALITY#2 124 | small serving ##s for main en ##tree , i had salmon ( wasn ##t impressed ) girlfriend had chicken , it was good .####1,7 0,1 FOOD#GENERAL#0 125 | cute and decorative .####-1,-1 0,1 AMBIENCE#GENERAL#2 126 | cute and decorative .####-1,-1 2,3 AMBIENCE#GENERAL#2 127 | excellent spot for holiday get together ##s with co - workers or friends that you have n ' t seen in a while .####1,2 0,1 RESTAURANT#MISCELLANEOUS#2 128 | what a great place !####3,4 2,3 RESTAURANT#GENERAL#2 129 | not the typical nyc gi ##mm ##ick theme restaurant .####8,9 0,3 AMBIENCE#GENERAL#2 130 | service was very prompt but slightly rushed .####0,1 3,4 SERVICE#GENERAL#2 131 | service was very prompt but slightly rushed .####0,1 6,7 SERVICE#GENERAL#2 132 | i really liked this place .####4,5 2,3 RESTAURANT#GENERAL#2 133 | everything i had was good , and i ' m a eater .####-1,-1 4,5 FOOD#QUALITY#2 134 | i also recommend the rice dishes or the different varieties of cong ##ee ( rice por ##ridge ) .####4,6 2,3 FOOD#QUALITY#2 135 | i also recommend the rice dishes or the different varieties of cong ##ee ( rice por ##ridge ) .####11,18 2,3 FOOD#QUALITY#2 136 | i recently tried su ##an and i thought that it was great .####3,5 11,12 RESTAURANT#GENERAL#2 137 | have been several times and it never di ##ssa ##points .####-1,-1 6,10 RESTAURANT#GENERAL#2 138 | this place is a great bargain .####1,2 4,6 RESTAURANT#PRICES#2 139 | people are always friendly .####0,1 3,4 SERVICE#GENERAL#2 140 | the best pad thai i ' ve ever had .####2,4 1,2 FOOD#QUALITY#2 141 | would n ' t rec ##ome ##nd it for dinner !####-1,-1 1,7 RESTAURANT#GENERAL#0 142 | ask for us ##ha , the nice ##st bartender in manhattan .####2,4 6,8 SERVICE#GENERAL#2 143 | the food ' s as good as ever .####1,2 5,6 FOOD#QUALITY#2 144 | best drums ##tick ##s over rice and sour spicy soup in town !####1,6 0,1 FOOD#QUALITY#2 145 | best drums ##tick ##s over rice and sour spicy soup in town !####7,10 0,1 FOOD#QUALITY#2 146 | for those that go once and do n ' t enjoy it , all i can say is that they just do n ' t get it .####-1,-1 -1,-1 RESTAURANT#MISCELLANEOUS#2 147 | not worth it .####-1,-1 0,2 FOOD#PRICES#0 148 | this dish is my favorite and i always get it when i go there and never get tired of it .####1,2 4,5 FOOD#QUALITY#2 149 | big wong is a great place to eat and fill your stomach .####0,2 4,5 RESTAURANT#GENERAL#2 150 | the food is okay and the prices here are med ##io ##cre .####1,2 3,4 FOOD#QUALITY#1 151 | the food is okay and the prices here are med ##io ##cre .####-1,-1 9,12 RESTAURANT#PRICES#1 152 | me and my girls will definitely go back .####-1,-1 -1,-1 RESTAURANT#GENERAL#2 153 | the food is great .####1,2 3,4 FOOD#QUALITY#2 154 | la rosa waltz ##es in , and i think they are doing it 
the best .####0,2 14,15 FOOD#QUALITY#2 155 | interesting selection , good wines , service fine , fun decor .####4,5 3,4 DRINKS#QUALITY#2 156 | interesting selection , good wines , service fine , fun decor .####6,7 7,8 SERVICE#GENERAL#2 157 | interesting selection , good wines , service fine , fun decor .####10,11 9,10 AMBIENCE#GENERAL#2 158 | interesting selection , good wines , service fine , fun decor .####1,2 0,1 FOOD#STYLE_OPTIONS#2 159 | the food here was med ##io ##cre at best .####1,2 4,7 FOOD#QUALITY#0 160 | the cypriot restaurant has a lot going for it .####1,3 -1,-1 RESTAURANT#GENERAL#2 161 | will comeback for sure , wish they have it here in la . .####-1,-1 -1,-1 RESTAURANT#GENERAL#2 162 | the space kind of feels like an alice in wonderland setting , without it trying to be that .####1,2 -1,-1 AMBIENCE#GENERAL#0 163 | i paid just about $ 60 for a good meal , though : )####9,10 8,9 FOOD#QUALITY#2 164 | i paid just about $ 60 for a good meal , though : )####9,10 -1,-1 FOOD#PRICES#2 165 | love it .####-1,-1 0,1 RESTAURANT#GENERAL#2 166 | the place is a bit hidden away , but once you get there , it ' s all worth it .####1,2 5,7 LOCATION#GENERAL#2 167 | the place is a bit hidden away , but once you get there , it ' s all worth it .####1,2 18,19 LOCATION#GENERAL#2 168 | i love their chicken pasta can ##t remember the name but is soo ##o good####3,5 1,2 FOOD#QUALITY#2 169 | i love their chicken pasta can ##t remember the name but is soo ##o good####3,5 14,15 FOOD#QUALITY#2 170 | way below average####-1,-1 1,3 RESTAURANT#GENERAL#0 171 | i think the pizza is so over ##rated and was under cooked .####3,4 6,8 FOOD#QUALITY#0 172 | i think the pizza is so over ##rated and was under cooked .####3,4 10,12 FOOD#QUALITY#0 173 | i love this place####3,4 1,2 RESTAURANT#GENERAL#2 174 | the service was quick and friendly .####1,2 3,4 SERVICE#GENERAL#2 175 | the service was quick and friendly .####1,2 5,6 SERVICE#GENERAL#2 176 | i thought the restaurant was nice and clean .####3,4 5,6 RESTAURANT#GENERAL#2 177 | i thought the restaurant was nice and clean .####3,4 7,8 AMBIENCE#GENERAL#2 178 | chicken ter ##iya ##ki had tomato or pi ##mento ##s on top ? ?####0,4 -1,-1 FOOD#STYLE_OPTIONS#0 179 | the waitress was not at ##ten ##tive at all .####1,2 3,7 SERVICE#GENERAL#0 180 | just go to ya ##mat ##o and order the red dragon roll .####3,6 -1,-1 RESTAURANT#GENERAL#2 181 | just go to ya ##mat ##o and order the red dragon roll .####9,12 -1,-1 FOOD#QUALITY#2 182 | favorite su ##shi in nyc####1,3 0,1 FOOD#QUALITY#2 183 | the rolls are creative and i have yet to find another su ##shi place that serves up more in ##vent ##ive yet delicious japanese food .####1,2 3,4 FOOD#STYLE_OPTIONS#2 184 | the rolls are creative and i have yet to find another su ##shi place that serves up more in ##vent ##ive yet delicious japanese food .####23,25 18,21 FOOD#STYLE_OPTIONS#2 185 | the rolls are creative and i have yet to find another su ##shi place that serves up more in ##vent ##ive yet delicious japanese food .####23,25 22,23 FOOD#QUALITY#2 186 | my que ##sa ##di ##lla tasted like it had been made by a three - year old with no sense of proportion or flavor .####1,5 -1,-1 FOOD#QUALITY#0 FOOD#STYLE_OPTIONS#0 187 | save your money and your time and go somewhere else .####-1,-1 -1,-1 RESTAURANT#GENERAL#0 188 | the spin ##ach is fresh , def ##inate ##ly not frozen . . 
.####1,3 4,5 FOOD#QUALITY#2 189 | decor needs to be upgraded but the food is amazing !####0,1 4,5 AMBIENCE#GENERAL#0 190 | decor needs to be upgraded but the food is amazing !####7,8 9,10 FOOD#QUALITY#2 191 | my daughter ' s wedding reception at water ' s edge received the highest compliment ##s from our guests .####7,11 13,16 RESTAURANT#MISCELLANEOUS#2 192 | the high prices you ' re going to pay is for the view not for the food .####12,13 1,3 LOCATION#GENERAL#1 193 | the high prices you ' re going to pay is for the view not for the food .####16,17 1,3 FOOD#QUALITY#0 194 | the high prices you ' re going to pay is for the view not for the food .####-1,-1 1,3 RESTAURANT#PRICES#0 195 | not what i would expect for the price and prestige of this location .####12,13 -1,-1 RESTAURANT#PRICES#1 RESTAURANT#MISCELLANEOUS#1 196 | not what i would expect for the price and prestige of this location .####-1,-1 -1,-1 SERVICE#GENERAL#0 197 | the food was ok and fair nothing to go crazy .####1,2 3,4 FOOD#QUALITY#1 198 | the food was ok and fair nothing to go crazy .####1,2 5,6 FOOD#QUALITY#1 199 | impressed . . .####-1,-1 0,1 RESTAURANT#GENERAL#2 200 | subtle food and service####1,2 0,1 FOOD#QUALITY#2 201 | subtle food and service####3,4 0,1 SERVICE#GENERAL#2 202 | food took some time to prepare , all worth waiting for .####0,1 8,9 FOOD#QUALITY#2 203 | food took some time to prepare , all worth waiting for .####-1,-1 -1,-1 SERVICE#GENERAL#1 204 | great find in the west village !####-1,-1 0,1 RESTAURANT#GENERAL#2 205 | when the bill came , nothing was com ##ped , so i told the manager very politely that we were willing to pay for the wine , but i did n ' t think i should have to pay for food with a mag ##go ##t in it .####-1,-1 -1,-1 SERVICE#GENERAL#0 206 | amazing food .####1,2 0,1 FOOD#QUALITY#2 207 | rather than preparing vegetarian dish , the chef presented me with a plate of steamed vegetables ( minus sauce , season ##ing , or any form or aesthetic presentation ) .####3,5 -1,-1 FOOD#QUALITY#0 FOOD#STYLE_OPTIONS#0 208 | rather than preparing vegetarian dish , the chef presented me with a plate of steamed vegetables ( minus sauce , season ##ing , or any form or aesthetic presentation ) .####7,8 -1,-1 SERVICE#GENERAL#0 209 | the only thing that strikes you is the decor ? 
( not very pleasant ) .####8,9 11,14 AMBIENCE#GENERAL#0 210 | the martini ##s are amazing and very fairly priced .####1,3 4,5 DRINKS#QUALITY#2 211 | the martini ##s are amazing and very fairly priced .####1,3 7,9 DRINKS#PRICES#2 212 | i wanted to go there to see if it was worth it and sadly , curious ##ity got the best of me and i paid dear ##ly for it .####-1,-1 -1,-1 RESTAURANT#GENERAL#0 RESTAURANT#PRICES#0 213 | the environment is very upscale and you will see a lot of rich guys with trophy wives or just highly paid escorts .####1,2 4,5 AMBIENCE#GENERAL#1 214 | the environment is very upscale and you will see a lot of rich guys with trophy wives or just highly paid escorts .####-1,-1 4,5 RESTAURANT#MISCELLANEOUS#1 215 | however , our $ 14 drinks were were horrible !####5,6 8,9 DRINKS#QUALITY#0 DRINKS#PRICES#0 216 | once we finally got a table , despite indicating we wanted an alla cart ##e menu we were pushed into a table that was only price fixed !####-1,-1 -1,-1 SERVICE#GENERAL#0 217 | i do n ' t appreciate places or people that try to drive up the bill without the patron ' s knowledge so that was a huge turn ##off ( more than the price ) .####-1,-1 -1,-1 SERVICE#GENERAL#0 RESTAURANT#PRICES#0 218 | eat at your own risk .####-1,-1 -1,-1 FOOD#QUALITY#0 219 | the service was spectacular as the waiter knew everything about the menu and his recommendations were amazing !####1,2 3,4 SERVICE#GENERAL#2 220 | the service was spectacular as the waiter knew everything about the menu and his recommendations were amazing !####6,7 16,17 SERVICE#GENERAL#2 221 | the sake ’ s complimented the courses very well and is successfully easing me into the sake world .####1,4 8,9 DRINKS#QUALITY#2 222 | maybe it was the great company ( i had friends visiting from phil ##ly – yes , it was not a date this time ) or the super reasonable price point , but i just can ’ t say enough good things about this brass ##erie .####44,46 40,41 RESTAURANT#GENERAL#2 RESTAURANT#PRICES#2 223 | i tried a couple other dishes but was n ' t too impressed .####5,6 8,13 FOOD#QUALITY#1 224 | the family seafood en ##tree was very good .####1,5 7,8 FOOD#QUALITY#2 225 | the food they serve is not comforting , not app ##eti ##zing and un ##co ##oked .####1,2 5,7 FOOD#QUALITY#0 226 | the food they serve is not comforting , not app ##eti ##zing and un ##co ##oked .####1,2 8,12 FOOD#QUALITY#0 227 | the food they serve is not comforting , not app ##eti ##zing and un ##co ##oked .####1,2 13,16 FOOD#QUALITY#0 228 | super ##ci ##lio ##us sc ##orn is in .####-1,-1 0,4 SERVICE#GENERAL#0 229 | single worst restaurant in manhattan####2,3 1,2 RESTAURANT#GENERAL#0 230 | it is quite a spectacular scene i ' ll give them that .####5,6 4,5 AMBIENCE#GENERAL#2 231 | how this place survives the competitive west village market in this economy , or any other for that matter , is beyond me .####2,3 -1,-1 RESTAURANT#GENERAL#0 232 | though it ' s been crowded most times i ' ve gone here , bark always delivers on their food .####14,15 -1,-1 RESTAURANT#MISCELLANEOUS#1 233 | though it ' s been crowded most times i ' ve gone here , bark always delivers on their food .####19,20 -1,-1 FOOD#QUALITY#2 234 | but nonetheless - - great spot , great food .####5,6 4,5 RESTAURANT#GENERAL#2 235 | but nonetheless - - great spot , great food .####8,9 7,8 FOOD#QUALITY#2 236 | the food and service were fine , however the mai ##tre - d was incredibly un ##we ##lco ##ming and arrogant .####1,2 5,6 FOOD#QUALITY#2 237 | the food and service were fine , however the mai ##tre - d was 
incredibly un ##we ##lco ##ming and arrogant .####3,4 5,6 SERVICE#GENERAL#2 238 | the food and service were fine , however the mai ##tre - d was incredibly un ##we ##lco ##ming and arrogant .####9,13 15,19 SERVICE#GENERAL#0 239 | the food and service were fine , however the mai ##tre - d was incredibly un ##we ##lco ##ming and arrogant .####9,13 20,21 SERVICE#GENERAL#0 240 | a word to the wise : you ca n ' t din ##e here and disturb the mai ##tre - d ' s sense of ` ` table turnover ' ' , as w ##ha ##cked as it is , or else .####17,21 -1,-1 SERVICE#GENERAL#0 241 | i had the lamb special which was perfect .####3,5 7,8 FOOD#QUALITY#2 242 | do n ' t go to this place !####7,8 -1,-1 RESTAURANT#GENERAL#0 243 | when the main course finally arrived ( another 45 ##mins ) half of our order was missing .####-1,-1 -1,-1 SERVICE#GENERAL#0 244 | when we threatened to leave , we were offered a me ##ager discount even though half the order was missing .####-1,-1 -1,-1 SERVICE#GENERAL#0 245 | the bread was stale , the salad was over ##pr ##ice ##d and empty .####1,2 3,4 FOOD#QUALITY#0 246 | the bread was stale , the salad was over ##pr ##ice ##d and empty .####6,7 8,12 FOOD#PRICES#0 247 | the bread was stale , the salad was over ##pr ##ice ##d and empty .####6,7 13,14 FOOD#STYLE_OPTIONS#0 248 | shame on this place for the horrible rude staff and non - existent customer service .####8,9 6,7 SERVICE#GENERAL#0 249 | shame on this place for the horrible rude staff and non - existent customer service .####8,9 7,8 SERVICE#GENERAL#0 250 | shame on this place for the horrible rude staff and non - existent customer service .####13,15 10,13 SERVICE#GENERAL#0 251 | the food is good .####1,2 3,4 FOOD#QUALITY#2 252 | -------------------------------------------------------------------------------- /Extract-Classify-ACOS/tokenized_data/rest16_dev_quad_bert.tsv: -------------------------------------------------------------------------------- 1 | ca n ' t wait wait for my next visit . -1,-1 RESTAURANT#GENERAL 2 -1,-1 2 | their sake list was extensive , but we were looking for purple haze , which was n ' t listed but made for us upon request ! 1,3 DRINKS#STYLE_OPTIONS 2 4,5 -1,-1 SERVICE#GENERAL 2 -1,-1 3 | the spicy tuna roll was unusually good and the rock shrimp te ##mp ##ura was awesome , great app ##eti ##zer to share ! 1,4 FOOD#QUALITY 2 6,7 9,14 FOOD#QUALITY 2 15,16 4 | we love th pink pony . 3,5 RESTAURANT#GENERAL 2 1,2 5 | this place has got to be the best japanese restaurant in the new york area . 1,2 RESTAURANT#GENERAL 2 7,8 6 | i tend to judge a su ##shi restaurant by its sea ur ##chin , which was heavenly at su ##shi rose . 10,13 FOOD#QUALITY 2 16,17 7 | the prix fix ##e menu is worth every penny and you get more than enough ( both in quantity and quality ) . 1,5 FOOD#QUALITY 2 6,7 1,5 FOOD#STYLE_OPTIONS 2 6,7 1,5 FOOD#PRICES 2 6,7 8 | the food here is rather good , but only if you like to wait for it . 1,2 FOOD#QUALITY 2 5,6 -1,-1 SERVICE#GENERAL 0 -1,-1 9 | also , specify if you like your food spicy - its rather bland if you do n ' t . 7,8 FOOD#QUALITY 0 12,13 10 | the am ##bie ##nce is pretty and nice for conversation , so a casual lunch here would probably be best . 1,4 AMBIENCE#GENERAL 2 5,6 1,4 AMBIENCE#GENERAL 2 7,8 -1,-1 RESTAURANT#MISCELLANEOUS 2 19,20 11 | it was horrible . -1,-1 RESTAURANT#GENERAL 0 2,3 12 | have been dozens of times and never failed to enjoy the experience . -1,-1 RESTAURANT#GENERAL 2 9,10 13 | make sure you try this place as often as you can . 
5,6 RESTAURANT#GENERAL 2 7,8 14 | i had a huge group for my birthday and we were well taken care of . -1,-1 SERVICE#GENERAL 2 11,12 15 | get the tuna of ga ##ri . 2,6 FOOD#QUALITY 2 -1,-1 16 | make sure you have the spicy sc ##all ##op roll . . . 5,10 FOOD#QUALITY 2 -1,-1 17 | rag ##a ' s is a romantic , cozy restaurant . 0,4 AMBIENCE#GENERAL 2 6,7 0,4 AMBIENCE#GENERAL 2 8,9 18 | i had a great time at je ##ky ##ll and hyde ! 6,11 RESTAURANT#GENERAL 2 3,4 19 | i am bringing my whole family back next time . -1,-1 RESTAURANT#MISCELLANEOUS 2 -1,-1 20 | fine dining restaurant quality . 1,2 FOOD#QUALITY 2 0,1 21 | we will return many times for this oasis in mid - town . -1,-1 RESTAURANT#GENERAL 2 -1,-1 22 | the food options rule . 1,2 FOOD#STYLE_OPTIONS 2 -1,-1 23 | my husband and i thou ##gt it would be great to go to the je ##ky ##ll and hyde pub for our anniversary , and to our surprise it was fantastic . 14,20 RESTAURANT#GENERAL 2 9,10 14,20 RESTAURANT#GENERAL 2 30,31 24 | please take my advice , go and try this place . 9,10 RESTAURANT#GENERAL 2 -1,-1 25 | they were served warm and had a soft fluffy interior . -1,-1 FOOD#QUALITY 2 3,4 -1,-1 FOOD#QUALITY 2 7,8 26 | but they do . -1,-1 SERVICE#GENERAL 2 -1,-1 27 | fresh restaurant was amazing . . . . . . . . food was delicious and of course fresh . 1,2 RESTAURANT#GENERAL 2 0,1 1,2 RESTAURANT#GENERAL 2 3,4 12,13 FOOD#QUALITY 2 14,15 12,13 FOOD#QUALITY 2 18,19 28 | hats off to the chef . 4,5 FOOD#QUALITY 2 0,2 29 | this is some really good , inexpensive su ##shi . 7,9 FOOD#QUALITY 2 4,5 7,9 FOOD#PRICES 2 6,7 30 | this place is always very crowded and popular . 1,2 RESTAURANT#MISCELLANEOUS 2 5,6 1,2 RESTAURANT#MISCELLANEOUS 2 7,8 31 | and evaluated on those terms past ##is is simply wonderful . 5,7 RESTAURANT#GENERAL 2 9,10 32 | i ' m still mad that i had to pay for lou ##sy food . 13,14 FOOD#QUALITY 0 11,13 33 | the hang ##er steak was like rubber and the tuna was flavor ##less not to mention it tasted like it had just been tha ##wed . 1,4 FOOD#QUALITY 0 6,7 9,10 FOOD#QUALITY 0 11,13 34 | big thumbs up ! -1,-1 RESTAURANT#GENERAL 2 1,3 35 | the pizza and wine were excellent - the service too - - but what really made this place was the backyard dining area . 1,2 FOOD#QUALITY 2 5,6 3,4 DRINKS#QUALITY 2 5,6 8,9 SERVICE#GENERAL 2 5,6 20,23 AMBIENCE#GENERAL 2 -1,-1 36 | it is one the nice ##st outdoor restaurants i have ever seen in ny - - i am from italy and this place rivals the ones in my country . 6,8 AMBIENCE#GENERAL 2 4,6 37 | it is simply amazing . -1,-1 FOOD#QUALITY 2 3,4 38 | beautiful experience . -1,-1 RESTAURANT#GENERAL 2 0,1 39 | the menu is very limited - i think we counted 4 or 5 en ##tree ##s . 1,2 FOOD#STYLE_OPTIONS 0 4,5 40 | we will go back every time we are in the city . -1,-1 RESTAURANT#GENERAL 2 -1,-1 41 | the characters really make for an enjoyable experience . 1,2 AMBIENCE#GENERAL 2 6,7 42 | however , i think je ##ck ##ll and hyde ##s t is one of those places that is fun to do once . 4,10 RESTAURANT#GENERAL 2 18,19 43 | we had a good time . -1,-1 RESTAURANT#GENERAL 2 3,4 44 | a little over ##pr ##ice ##d but worth it once you take a bite . -1,-1 FOOD#PRICES 0 2,6 -1,-1 FOOD#QUALITY 2 7,8 45 | i have lived in japan for 7 years and the taste of the food and the feel of the restaurant is like being back in japan . 13,14 FOOD#QUALITY 2 -1,-1 16,17 AMBIENCE#GENERAL 2 -1,-1 46 | check out the secret back room . 4,6 AMBIENCE#GENERAL 2 3,4 47 | thank you emilio . 2,3 RESTAURANT#GENERAL 2 -1,-1 48 | the food was authentic . 
1,2 FOOD#QUALITY 2 3,4 49 | fantastic ! -1,-1 RESTAURANT#GENERAL 2 0,1 50 | but the staff was so horrible to us . 2,3 SERVICE#GENERAL 0 5,6 51 | decor is nice though service can be spot ##ty . 0,1 AMBIENCE#GENERAL 2 2,3 4,5 SERVICE#GENERAL 0 7,9 52 | just aw ##some . -1,-1 FOOD#QUALITY 2 1,3 53 | i had their eggs benedict for br ##un ##ch , which were the worst in my entire life , i tried removing the ho ##llon ##dai ##se sauce completely that was how failed it was . 3,5 FOOD#QUALITY 0 13,14 54 | with the theater 2 blocks away we had a delicious meal in a beautiful room . 10,11 FOOD#QUALITY 2 9,10 14,15 AMBIENCE#GENERAL 2 13,14 -1,-1 LOCATION#GENERAL 2 -1,-1 55 | the service was at ##ten ##tive . 1,2 SERVICE#GENERAL 2 3,6 56 | pat ##ro ##on features a nice cigar bar and has great staff . 6,8 AMBIENCE#GENERAL 2 5,6 11,12 SERVICE#GENERAL 2 10,11 57 | ll ##oo ##v ##ve this place . 5,6 RESTAURANT#GENERAL 2 0,4 58 | the menu is limited but almost all of the dishes are excellent . 1,2 FOOD#STYLE_OPTIONS 0 3,4 9,10 FOOD#QUALITY 2 11,12 59 | wine list is extensive without being over - priced . 0,2 DRINKS#STYLE_OPTIONS 2 3,4 0,2 DRINKS#PRICES 2 4,9 60 | the food was very good , a great deal , and the place its self was great . 1,2 FOOD#QUALITY 2 4,5 1,2 FOOD#PRICES 2 6,7 12,13 AMBIENCE#GENERAL 2 16,17 61 | the wait staff is very fr ##ein ##dly , they make it feel like you ' re eating in a fr ##ein ##dly little european town . 1,3 SERVICE#GENERAL 2 5,8 62 | the whole set up is truly un ##pro ##fe ##ssion ##al and i wish cafe noir would get some good staff , because despite the current one this is a great place . 20,21 SERVICE#GENERAL 0 6,21 14,16 RESTAURANT#GENERAL 2 30,31 63 | you should pass on the cal ##ama ##ri . 5,8 FOOD#QUALITY 0 -1,-1 64 | when asked about how a certain dish was prepared in comparison to a similar at other thai restaurants , he replied this is not mcdonald ' s , every place makes things differently -1,-1 SERVICE#GENERAL 0 -1,-1 65 | everything was wonderful ; food , drinks , staff , mile ##au . 4,5 FOOD#QUALITY 2 2,3 6,7 DRINKS#QUALITY 2 2,3 8,9 SERVICE#GENERAL 2 2,3 10,12 AMBIENCE#GENERAL 2 2,3 -1,-1 RESTAURANT#GENERAL 2 2,3 66 | i would highly recommend this place ! 5,6 RESTAURANT#GENERAL 2 3,4 67 | fresh ingredients and everything is made to order . 1,2 FOOD#QUALITY 2 0,1 -1,-1 FOOD#QUALITY 2 -1,-1 68 | friendly staff that actually lets you enjoy your meal and the company you ' re with . 1,2 SERVICE#GENERAL 2 0,1 69 | i will definitely be going back . -1,-1 RESTAURANT#GENERAL 2 -1,-1 70 | a great choice at any cost and a great deal . -1,-1 RESTAURANT#GENERAL 2 8,9 -1,-1 RESTAURANT#PRICES 2 1,2 71 | tha ##lia is a beautiful restaurant with beautiful people serving you , but the food does n ' t quite match up . 8,9 SERVICE#GENERAL 2 7,8 14,15 FOOD#QUALITY 0 15,22 0,2 AMBIENCE#GENERAL 2 4,5 72 | i ordered the smoked salmon and roe app ##eti ##zer and it was off flavor . 3,10 FOOD#QUALITY 0 13,15 73 | the food is good , especially their more basic dishes , and the drinks are delicious . 1,2 FOOD#QUALITY 2 3,4 8,10 FOOD#QUALITY 2 3,4 13,14 DRINKS#QUALITY 2 15,16 74 | the big complaint : no toast ##ing available . -1,-1 SERVICE#GENERAL 0 2,3 75 | i ' ve been many time and have never been disappointed . -1,-1 RESTAURANT#GENERAL 2 8,11 76 | the turkey burger ##s are scary ! 1,4 FOOD#QUALITY 0 5,6 77 | for authentic thai food , look no further than too ##ns . 2,4 FOOD#QUALITY 2 1,2 78 | try the pad thai , or sample anything on the app ##eti ##zer menu . . . 
they ' re all delicious . 2,4 FOOD#QUALITY 2 21,22 2,4 FOOD#QUALITY 2 21,22 79 | service was good and food is wonderful . 0,1 SERVICE#GENERAL 2 2,3 4,5 FOOD#QUALITY 2 6,7 80 | it is definitely a good spot for snacks and chat . 5,6 RESTAURANT#GENERAL 2 4,5 81 | do not get the go go hamburger ##s , no matter what the reviews say . 4,8 FOOD#QUALITY 0 -1,-1 82 | steamed fresh so brought hot hot hot to your table . -1,-1 FOOD#QUALITY 2 1,2 83 | small serving ##s for main en ##tree , i had salmon ( wasn ##t impressed ) girlfriend had chicken , it was good . 10,11 FOOD#QUALITY 0 12,15 18,19 FOOD#QUALITY 2 22,23 1,7 FOOD#GENERAL 0 0,1 84 | cute and decorative . -1,-1 AMBIENCE#GENERAL 2 0,1 -1,-1 AMBIENCE#GENERAL 2 2,3 85 | excellent spot for holiday get together ##s with co - workers or friends that you have n ' t seen in a while . 1,2 RESTAURANT#MISCELLANEOUS 2 0,1 86 | what a great place ! 3,4 RESTAURANT#GENERAL 2 2,3 87 | not the typical nyc gi ##mm ##ick theme restaurant . 8,9 AMBIENCE#GENERAL 2 0,3 88 | service was very prompt but slightly rushed . 0,1 SERVICE#GENERAL 2 3,4 0,1 SERVICE#GENERAL 2 6,7 89 | i really liked this place . 4,5 RESTAURANT#GENERAL 2 2,3 90 | everything i had was good , and i ' m a eater . -1,-1 FOOD#QUALITY 2 4,5 91 | i also recommend the rice dishes or the different varieties of cong ##ee ( rice por ##ridge ) . 4,6 FOOD#QUALITY 2 2,3 11,18 FOOD#QUALITY 2 2,3 92 | i recently tried su ##an and i thought that it was great . 3,5 RESTAURANT#GENERAL 2 11,12 93 | have been several times and it never di ##ssa ##points . -1,-1 RESTAURANT#GENERAL 2 6,10 94 | this place is a great bargain . 1,2 RESTAURANT#PRICES 2 4,6 95 | people are always friendly . 0,1 SERVICE#GENERAL 2 3,4 96 | the best pad thai i ' ve ever had . 2,4 FOOD#QUALITY 2 1,2 97 | would n ' t rec ##ome ##nd it for dinner ! -1,-1 RESTAURANT#GENERAL 0 1,7 98 | ask for us ##ha , the nice ##st bartender in manhattan . 2,4 SERVICE#GENERAL 2 6,8 99 | the food ' s as good as ever . 1,2 FOOD#QUALITY 2 5,6 100 | best drums ##tick ##s over rice and sour spicy soup in town ! 1,6 FOOD#QUALITY 2 0,1 7,10 FOOD#QUALITY 2 0,1 101 | for those that go once and do n ' t enjoy it , all i can say is that they just do n ' t get it . -1,-1 RESTAURANT#MISCELLANEOUS 2 -1,-1 102 | not worth it . -1,-1 FOOD#PRICES 0 0,2 103 | this dish is my favorite and i always get it when i go there and never get tired of it . 1,2 FOOD#QUALITY 2 4,5 104 | big wong is a great place to eat and fill your stomach . 0,2 RESTAURANT#GENERAL 2 4,5 105 | the food is okay and the prices here are med ##io ##cre . 1,2 FOOD#QUALITY 1 3,4 -1,-1 RESTAURANT#PRICES 1 9,12 106 | me and my girls will definitely go back . -1,-1 RESTAURANT#GENERAL 2 -1,-1 107 | the food is great . 1,2 FOOD#QUALITY 2 3,4 108 | la rosa waltz ##es in , and i think they are doing it the best . 0,2 FOOD#QUALITY 2 14,15 109 | interesting selection , good wines , service fine , fun decor . 4,5 DRINKS#QUALITY 2 3,4 6,7 SERVICE#GENERAL 2 7,8 10,11 AMBIENCE#GENERAL 2 9,10 1,2 FOOD#STYLE_OPTIONS 2 0,1 110 | the food here was med ##io ##cre at best . 1,2 FOOD#QUALITY 0 4,7 111 | the cypriot restaurant has a lot going for it . 1,3 RESTAURANT#GENERAL 2 -1,-1 112 | will comeback for sure , wish they have it here in la . . -1,-1 RESTAURANT#GENERAL 2 -1,-1 113 | the space kind of feels like an alice in wonderland setting , without it trying to be that . 1,2 AMBIENCE#GENERAL 0 -1,-1 114 | i paid just about $ 60 for a good meal , though : ) 9,10 FOOD#QUALITY 2 8,9 9,10 FOOD#PRICES 2 -1,-1 115 | love it . 
-1,-1 RESTAURANT#GENERAL 2 0,1 116 | the place is a bit hidden away , but once you get there , it ' s all worth it . 1,2 LOCATION#GENERAL 2 5,7 1,2 LOCATION#GENERAL 2 18,19 117 | i love their chicken pasta can ##t remember the name but is soo ##o good 3,5 FOOD#QUALITY 2 1,2 3,5 FOOD#QUALITY 2 14,15 118 | way below average -1,-1 RESTAURANT#GENERAL 0 1,3 119 | i think the pizza is so over ##rated and was under cooked . 3,4 FOOD#QUALITY 0 6,8 3,4 FOOD#QUALITY 0 10,12 120 | i love this place 3,4 RESTAURANT#GENERAL 2 1,2 121 | the service was quick and friendly . 1,2 SERVICE#GENERAL 2 3,4 1,2 SERVICE#GENERAL 2 5,6 122 | i thought the restaurant was nice and clean . 3,4 RESTAURANT#GENERAL 2 5,6 3,4 AMBIENCE#GENERAL 2 7,8 123 | chicken ter ##iya ##ki had tomato or pi ##mento ##s on top ? ? 0,4 FOOD#STYLE_OPTIONS 0 -1,-1 124 | the waitress was not at ##ten ##tive at all . 1,2 SERVICE#GENERAL 0 3,7 125 | just go to ya ##mat ##o and order the red dragon roll . 3,6 RESTAURANT#GENERAL 2 -1,-1 9,12 FOOD#QUALITY 2 -1,-1 126 | favorite su ##shi in nyc 1,3 FOOD#QUALITY 2 0,1 127 | the rolls are creative and i have yet to find another su ##shi place that serves up more in ##vent ##ive yet delicious japanese food . 1,2 FOOD#STYLE_OPTIONS 2 3,4 23,25 FOOD#STYLE_OPTIONS 2 18,21 23,25 FOOD#QUALITY 2 22,23 128 | my que ##sa ##di ##lla tasted like it had been made by a three - year old with no sense of proportion or flavor . 1,5 FOOD#QUALITY 0 -1,-1 1,5 FOOD#STYLE_OPTIONS 0 -1,-1 129 | save your money and your time and go somewhere else . -1,-1 RESTAURANT#GENERAL 0 -1,-1 130 | the spin ##ach is fresh , def ##inate ##ly not frozen . . . 1,3 FOOD#QUALITY 2 4,5 131 | decor needs to be upgraded but the food is amazing ! 0,1 AMBIENCE#GENERAL 0 4,5 7,8 FOOD#QUALITY 2 9,10 132 | my daughter ' s wedding reception at water ' s edge received the highest compliment ##s from our guests . 7,11 RESTAURANT#MISCELLANEOUS 2 13,16 133 | the high prices you ' re going to pay is for the view not for the food . 12,13 LOCATION#GENERAL 1 1,3 16,17 FOOD#QUALITY 0 1,3 -1,-1 RESTAURANT#PRICES 0 1,3 134 | not what i would expect for the price and prestige of this location . 12,13 RESTAURANT#PRICES 1 -1,-1 12,13 RESTAURANT#MISCELLANEOUS 1 -1,-1 -1,-1 SERVICE#GENERAL 0 -1,-1 135 | the food was ok and fair nothing to go crazy . 1,2 FOOD#QUALITY 1 3,4 1,2 FOOD#QUALITY 1 5,6 136 | impressed . . . -1,-1 RESTAURANT#GENERAL 2 0,1 137 | subtle food and service 1,2 FOOD#QUALITY 2 0,1 3,4 SERVICE#GENERAL 2 0,1 138 | food took some time to prepare , all worth waiting for . 0,1 FOOD#QUALITY 2 8,9 -1,-1 SERVICE#GENERAL 1 -1,-1 139 | great find in the west village ! -1,-1 RESTAURANT#GENERAL 2 0,1 140 | when the bill came , nothing was com ##ped , so i told the manager very politely that we were willing to pay for the wine , but i did n ' t think i should have to pay for food with a mag ##go ##t in it . -1,-1 SERVICE#GENERAL 0 -1,-1 141 | amazing food . 1,2 FOOD#QUALITY 2 0,1 142 | rather than preparing vegetarian dish , the chef presented me with a plate of steamed vegetables ( minus sauce , season ##ing , or any form or aesthetic presentation ) . 3,5 FOOD#QUALITY 0 -1,-1 3,5 FOOD#STYLE_OPTIONS 0 -1,-1 7,8 SERVICE#GENERAL 0 -1,-1 143 | the only thing that strikes you is the decor ? ( not very pleasant ) . 8,9 AMBIENCE#GENERAL 0 11,14 144 | the martini ##s are amazing and very fairly priced . 
1,3 DRINKS#QUALITY 2 4,5 1,3 DRINKS#PRICES 2 7,9 145 | i wanted to go there to see if it was worth it and sadly , curious ##ity got the best of me and i paid dear ##ly for it . -1,-1 RESTAURANT#GENERAL 0 -1,-1 -1,-1 RESTAURANT#PRICES 0 -1,-1 146 | the environment is very upscale and you will see a lot of rich guys with trophy wives or just highly paid escorts . 1,2 AMBIENCE#GENERAL 1 4,5 -1,-1 RESTAURANT#MISCELLANEOUS 1 4,5 147 | however , our $ 14 drinks were were horrible ! 5,6 DRINKS#QUALITY 0 8,9 5,6 DRINKS#PRICES 0 8,9 148 | once we finally got a table , despite indicating we wanted an alla cart ##e menu we were pushed into a table that was only price fixed ! -1,-1 SERVICE#GENERAL 0 -1,-1 149 | i do n ' t appreciate places or people that try to drive up the bill without the patron ' s knowledge so that was a huge turn ##off ( more than the price ) . -1,-1 SERVICE#GENERAL 0 -1,-1 -1,-1 RESTAURANT#PRICES 0 -1,-1 150 | eat at your own risk . -1,-1 FOOD#QUALITY 0 -1,-1 151 | the service was spectacular as the waiter knew everything about the menu and his recommendations were amazing ! 1,2 SERVICE#GENERAL 2 3,4 6,7 SERVICE#GENERAL 2 16,17 152 | the sake ’ s complimented the courses very well and is successfully easing me into the sake world . 1,4 DRINKS#QUALITY 2 8,9 153 | maybe it was the great company ( i had friends visiting from phil ##ly – yes , it was not a date this time ) or the super reasonable price point , but i just can ’ t say enough good things about this brass ##erie . 44,46 RESTAURANT#GENERAL 2 40,41 44,46 RESTAURANT#PRICES 2 40,41 154 | i tried a couple other dishes but was n ' t too impressed . 5,6 FOOD#QUALITY 1 8,13 155 | the family seafood en ##tree was very good . 1,5 FOOD#QUALITY 2 7,8 156 | the food they serve is not comforting , not app ##eti ##zing and un ##co ##oked . 1,2 FOOD#QUALITY 0 5,7 1,2 FOOD#QUALITY 0 8,12 1,2 FOOD#QUALITY 0 13,16 157 | super ##ci ##lio ##us sc ##orn is in . -1,-1 SERVICE#GENERAL 0 0,4 158 | single worst restaurant in manhattan 2,3 RESTAURANT#GENERAL 0 1,2 159 | it is quite a spectacular scene i ' ll give them that . 5,6 AMBIENCE#GENERAL 2 4,5 160 | how this place survives the competitive west village market in this economy , or any other for that matter , is beyond me . 2,3 RESTAURANT#GENERAL 0 -1,-1 161 | though it ' s been crowded most times i ' ve gone here , bark always delivers on their food . 14,15 RESTAURANT#MISCELLANEOUS 1 -1,-1 19,20 FOOD#QUALITY 2 -1,-1 162 | but nonetheless - - great spot , great food . 5,6 RESTAURANT#GENERAL 2 4,5 8,9 FOOD#QUALITY 2 7,8 163 | the food and service were fine , however the mai ##tre - d was incredibly un ##we ##lco ##ming and arrogant . 1,2 FOOD#QUALITY 2 5,6 3,4 SERVICE#GENERAL 2 5,6 9,13 SERVICE#GENERAL 0 15,19 9,13 SERVICE#GENERAL 0 20,21 164 | a word to the wise : you ca n ' t din ##e here and disturb the mai ##tre - d ' s sense of ` ` table turnover ' ' , as w ##ha ##cked as it is , or else . 17,21 SERVICE#GENERAL 0 -1,-1 165 | i had the lamb special which was perfect . 3,5 FOOD#QUALITY 2 7,8 166 | do n ' t go to this place ! 7,8 RESTAURANT#GENERAL 0 -1,-1 167 | when the main course finally arrived ( another 45 ##mins ) half of our order was missing . -1,-1 SERVICE#GENERAL 0 -1,-1 168 | when we threatened to leave , we were offered a me ##ager discount even though half the order was missing . -1,-1 SERVICE#GENERAL 0 -1,-1 169 | the bread was stale , the salad was over ##pr ##ice ##d and empty . 
1,2 FOOD#QUALITY 0 3,4 6,7 FOOD#PRICES 0 8,12 6,7 FOOD#STYLE_OPTIONS 0 13,14 170 | shame on this place for the horrible rude staff and non - existent customer service . 8,9 SERVICE#GENERAL 0 6,7 8,9 SERVICE#GENERAL 0 7,8 13,15 SERVICE#GENERAL 0 10,13 171 | the food is good . 1,2 FOOD#QUALITY 2 3,4 172 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 4 | 5 | # Aspect-Category-Opinion-Sentiment (ACOS) Quadruple Extraction 6 | 7 | This repo contains the datasets and source code of our paper: 8 | 9 | Aspect-Category-Opinion-Sentiment Quadruple Extraction with Implicit Aspects and Opinions [[ACL 2021]](https://aclanthology.org/2021.acl-long.29.pdf). 10 | - We introduce a new ABSA task, named Aspect-Category-Opinion-Sentiment Quadruple (ACOS) Extraction, to extract fine-grained ABSA quadruples from product reviews; 11 | - We construct two new datasets for the task, with ACOS quadruple annotations, and benchmark the task with four baseline systems; 12 | - Our task and datasets provide good support for discovering implicit opinion targets and implicit opinion expressions in product reviews. 13 | 14 | 15 | ## Task 16 | The Aspect-Category-Opinion-Sentiment (ACOS) Quadruple Extraction task aims to extract all aspect-category-opinion-sentiment quadruples, i.e., (aspect expression, aspect category, opinion expression, sentiment polarity), in a review sentence, including quadruples with implicit aspects and implicit opinions. 17 | 18 |
19 | [figure: img/figure1.PNG (illustration of the ACOS quadruple extraction task)]
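
For example, one sentence from the Restaurant-ACOS dev set (data/Restaurant-ACOS/rest16_quad_dev.tsv, included in full below) carries two quadruples, both with explicit aspects and opinions:

```
Sentence:    the spicy tuna roll was unusually good and the rock shrimp tempura was awesome , great appetizer to share !
Quadruples:  (spicy tuna roll, FOOD#QUALITY, good, positive)
             (rock shrimp tempura, FOOD#QUALITY, awesome, positive)
```

When an aspect or opinion is not expressed explicitly, the corresponding slot of the quadruple is implicit; in the data files such slots appear as the span `-1,-1`.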
21 | 22 | 23 | 24 | ## Datasets 25 | Two new datasets, Restaurant-ACOS and Laptop-ACOS, are constructed for the ACOS Quadruple Extraction task: 26 | - Restaurant-ACOS is an extension of the existing SemEval Restaurant dataset, to which we add annotations of implicit aspects, implicit opinions, and the full quadruples; 27 | - Laptop-ACOS is a brand-new dataset collected from the Amazon Laptop domain. It is twice the size of the SemEval Laptop dataset, and is annotated with quadruples covering all explicit/implicit aspects and opinions. 28 | 29 | The following table compares our two ACOS Quadruple datasets with existing representative ABSA datasets. 30 | 31 |
32 | [figure: img/stat.PNG (dataset statistics comparison)]
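
To make the file layout concrete, below is a minimal parsing sketch for the quadruple files. It assumes the review text and each of its quadruples occupy tab-separated fields (the files are `.tsv`), that each quadruple has the four space-separated fields 'Aspect Category Sentiment Opinion' described in `data/Readme.md` with spans given as end-exclusive `start,end` token offsets, and that `-1,-1` marks an implicit aspect or opinion; the helper names are illustrative, not part of this repo.

```python
# Hedged sketch: parse one line of a *_quad_*.tsv file into
# (aspect, category, opinion, sentiment) tuples. The field layout and
# the -1,-1 convention are assumptions inferred from data/Readme.md.
SENTIMENT = {"0": "negative", "1": "neutral", "2": "positive"}

def span_text(tokens, span):
    """Return the token span as text, or None for an implicit (-1,-1) span."""
    start, end = map(int, span.split(","))
    return None if start == -1 else " ".join(tokens[start:end])

def parse_line(line):
    fields = line.rstrip("\n").split("\t")
    tokens = fields[0].split()
    quads = []
    for quad in fields[1:]:
        aspect, category, sentiment, opinion = quad.split()
        quads.append((span_text(tokens, aspect), category,
                      span_text(tokens, opinion), SENTIMENT[sentiment]))
    return tokens, quads
```

On the dev-set example shown in the Task section, this would yield `('spicy tuna roll', 'FOOD#QUALITY', 'good', 'positive')` and `('rock shrimp tempura', 'FOOD#QUALITY', 'awesome', 'positive')`.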
34 | 35 | 36 | ## Methods 37 | We benchmark the ACOS Quadruple Extraction task with four baseline systems: 38 | - Double-Propagation-ACOS 39 | - JET-ACOS 40 | - TAS-BERT-ACOS 41 | - Extract-Classify-ACOS 42 | 43 | We provided the source code of Extract-Classify-ACOS. The source code of the other three methods will be provided soon. 44 | 45 | Overview of our Extract-Classify-ACOS method. The first step performs aspect-opinion co-extraction, and the second step predicts category-sentiment given the aspect-opinion pairs. 46 | 47 |

48 | 49 |

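
Purely as a schematic sketch of this two-step flow (the actual entry points are `run_step1.py` and `run_step2.py`; the function bodies below are hypothetical stand-ins, not the real BERT-based models):

```python
from itertools import product

def step1_extract(tokens):
    # Stand-in for step 1: the real model co-extracts candidate aspect and
    # opinion spans as (start, end) pairs, with (-1, -1) for implicit ones.
    return [(1, 4), (9, 12)], [(6, 7), (13, 14)]

def step2_classify(tokens, aspect, opinion):
    # Stand-in for step 2: the real model labels an aspect-opinion pair
    # with a (category, sentiment) tuple, or rejects an invalid pair (None).
    return ("FOOD#QUALITY", "positive")

def extract_classify(tokens):
    """Pair every candidate aspect with every candidate opinion and keep
    the pairs the classifier accepts."""
    aspects, opinions = step1_extract(tokens)
    quads = []
    for aspect, opinion in product(aspects, opinions):
        label = step2_classify(tokens, aspect, opinion)
        if label is not None:
            category, sentiment = label
            quads.append((aspect, category, opinion, sentiment))
    return quads
```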
50 | 51 | 52 | ## Results 53 | The ACOS quadruple extraction performance of four different systems on the two datasets: 54 | 55 |

56 | 57 |

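
For reference, quadruple-level precision, recall, and F1 are typically computed by exact match over the predicted and gold quadruple sets; below is a minimal sketch of such a metric (the repo's own evaluation lives in `eval_metrics.py` and may differ in details):

```python
def quad_prf1(gold_quads, pred_quads):
    """Exact-match precision, recall, and F1 over quadruple sets."""
    gold, pred = set(gold_quads), set(pred_quads)
    tp = len(gold & pred)  # quadruples predicted exactly right
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```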
58 | 59 | We further investigate the ability of different systems in addressing the implicit aspects/opinion problem: 60 | 61 |

62 | 63 |

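
One simple way to reproduce this kind of breakdown from the gold files is to bucket every quadruple by whether its aspect and opinion are explicit or implicit, reusing the illustrative `parse_line` helper sketched in the Datasets section:

```python
from collections import Counter

def implicit_breakdown(path):
    # Counts quadruples by (aspect, opinion) explicitness, e.g.
    # ("EA", "IO") = explicit aspect with an implicit opinion.
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            _tokens, quads = parse_line(line)
            for aspect, _category, opinion, _sentiment in quads:
                counts[("EA" if aspect is not None else "IA",
                        "EO" if opinion is not None else "IO")] += 1
    return counts
```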
64 | 65 | ## Citation 66 | If you use the data and code in your research, please cite our paper as follows: 67 | ``` 68 | @inproceedings{cai2021aspect, 69 | title={Aspect-Category-Opinion-Sentiment Quadruple Extraction with Implicit Aspects and Opinions}, 70 | author={Cai, Hongjie and Xia, Rui and Yu, Jianfei}, 71 | booktitle={Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)}, 72 | pages={340--350}, 73 | year={2021} 74 | } 75 | ``` 76 | -------------------------------------------------------------------------------- /data/Readme.md: -------------------------------------------------------------------------------- 1 | Each line consists of review text and its quadruples. Each quadruple is formalized as 'Aspect Category Sentiment Opinion'. The 0, 1, 2 in the Sentiment category represents negative, neutral, and positive, respectively. 2 | -------------------------------------------------------------------------------- /data/Restaurant-ACOS/rest16_quad_dev.tsv: -------------------------------------------------------------------------------- 1 | ca n ' t wait wait for my next visit . -1,-1 RESTAURANT#GENERAL 2 -1,-1 2 | their sake list was extensive , but we were looking for purple haze , which was n ' t listed but made for us upon request ! 1,3 DRINKS#STYLE_OPTIONS 2 4,5 -1,-1 SERVICE#GENERAL 2 -1,-1 3 | the spicy tuna roll was unusually good and the rock shrimp tempura was awesome , great appetizer to share ! 1,4 FOOD#QUALITY 2 6,7 9,12 FOOD#QUALITY 2 13,14 4 | we love th pink pony . 3,5 RESTAURANT#GENERAL 2 1,2 5 | this place has got to be the best japanese restaurant in the new york area . 1,2 RESTAURANT#GENERAL 2 7,8 6 | i tend to judge a sushi restaurant by its sea urchin , which was heavenly at sushi rose . 9,11 FOOD#QUALITY 2 14,15 7 | the prix fixe menu is worth every penny and you get more than enough ( both in quantity and quality ) . 1,4 FOOD#QUALITY 2 5,6 1,4 FOOD#STYLE_OPTIONS 2 5,6 1,4 FOOD#PRICES 2 5,6 8 | the food here is rather good , but only if you like to wait for it . 1,2 FOOD#QUALITY 2 5,6 -1,-1 SERVICE#GENERAL 0 -1,-1 9 | also , specify if you like your food spicy - its rather bland if you do n ' t . 7,8 FOOD#QUALITY 0 12,13 10 | the ambience is pretty and nice for conversation , so a casual lunch here would probably be best . 1,2 AMBIENCE#GENERAL 2 3,4 1,2 AMBIENCE#GENERAL 2 5,6 -1,-1 RESTAURANT#MISCELLANEOUS 2 17,18 11 | it was horrible . -1,-1 RESTAURANT#GENERAL 0 2,3 12 | have been dozens of times and never failed to enjoy the experience . -1,-1 RESTAURANT#GENERAL 2 9,10 13 | make sure you try this place as often as you can . 5,6 RESTAURANT#GENERAL 2 7,8 14 | i had a huge group for my birthday and we were well taken care of . -1,-1 SERVICE#GENERAL 2 11,12 15 | get the tuna of gari . 2,5 FOOD#QUALITY 2 -1,-1 16 | make sure you have the spicy scallop roll . . . 5,8 FOOD#QUALITY 2 -1,-1 17 | raga ' s is a romantic , cozy restaurant . 0,3 AMBIENCE#GENERAL 2 5,6 0,3 AMBIENCE#GENERAL 2 7,8 18 | i had a great time at jekyll and hyde ! 6,9 RESTAURANT#GENERAL 2 3,4 19 | i am bringing my whole family back next time . -1,-1 RESTAURANT#MISCELLANEOUS 2 -1,-1 20 | fine dining restaurant quality . 1,2 FOOD#QUALITY 2 0,1 21 | we will return many times for this oasis in mid - town . -1,-1 RESTAURANT#GENERAL 2 -1,-1 22 | the food options rule . 
1,2 FOOD#STYLE_OPTIONS 2 -1,-1 23 | my husband and i thougt it would be great to go to the jekyll and hyde pub for our anniversary , and to our surprise it was fantastic . 13,17 RESTAURANT#GENERAL 2 8,9 13,17 RESTAURANT#GENERAL 2 27,28 24 | please take my advice , go and try this place . 9,10 RESTAURANT#GENERAL 2 -1,-1 25 | they were served warm and had a soft fluffy interior . -1,-1 FOOD#QUALITY 2 3,4 -1,-1 FOOD#QUALITY 2 7,8 26 | but they do . -1,-1 SERVICE#GENERAL 2 -1,-1 27 | fresh restaurant was amazing . . . . . . . . food was delicious and of course fresh . 1,2 RESTAURANT#GENERAL 2 0,1 1,2 RESTAURANT#GENERAL 2 3,4 12,13 FOOD#QUALITY 2 14,15 12,13 FOOD#QUALITY 2 18,19 28 | hats off to the chef . 4,5 FOOD#QUALITY 2 0,2 29 | this is some really good , inexpensive sushi . 7,8 FOOD#QUALITY 2 4,5 7,8 FOOD#PRICES 2 6,7 30 | this place is always very crowded and popular . 1,2 RESTAURANT#MISCELLANEOUS 2 5,6 1,2 RESTAURANT#MISCELLANEOUS 2 7,8 31 | and evaluated on those terms pastis is simply wonderful . 5,6 RESTAURANT#GENERAL 2 8,9 32 | i ' m still mad that i had to pay for lousy food . 12,13 FOOD#QUALITY 0 11,12 33 | the hanger steak was like rubber and the tuna was flavorless not to mention it tasted like it had just been thawed . 1,3 FOOD#QUALITY 0 5,6 8,9 FOOD#QUALITY 0 10,11 34 | big thumbs up ! -1,-1 RESTAURANT#GENERAL 2 1,3 35 | the pizza and wine were excellent - the service too - - but what really made this place was the backyard dining area . 1,2 FOOD#QUALITY 2 5,6 3,4 DRINKS#QUALITY 2 5,6 8,9 SERVICE#GENERAL 2 5,6 20,23 AMBIENCE#GENERAL 2 -1,-1 36 | it is one the nicest outdoor restaurants i have ever seen in ny - - i am from italy and this place rivals the ones in my country . 5,7 AMBIENCE#GENERAL 2 4,5 37 | it is simply amazing . -1,-1 FOOD#QUALITY 2 3,4 38 | beautiful experience . -1,-1 RESTAURANT#GENERAL 2 0,1 39 | the menu is very limited - i think we counted 4 or 5 entrees . 1,2 FOOD#STYLE_OPTIONS 0 4,5 40 | we will go back every time we are in the city . -1,-1 RESTAURANT#GENERAL 2 -1,-1 41 | the characters really make for an enjoyable experience . 1,2 AMBIENCE#GENERAL 2 6,7 42 | however , i think jeckll and hydes t is one of those places that is fun to do once . 4,7 RESTAURANT#GENERAL 2 15,16 43 | we had a good time . -1,-1 RESTAURANT#GENERAL 2 3,4 44 | a little overpriced but worth it once you take a bite . -1,-1 FOOD#PRICES 0 2,3 -1,-1 FOOD#QUALITY 2 4,5 45 | i have lived in japan for 7 years and the taste of the food and the feel of the restaurant is like being back in japan . 13,14 FOOD#QUALITY 2 -1,-1 16,17 AMBIENCE#GENERAL 2 -1,-1 46 | check out the secret back room . 4,6 AMBIENCE#GENERAL 2 3,4 47 | thank you emilio . 2,3 RESTAURANT#GENERAL 2 -1,-1 48 | the food was authentic . 1,2 FOOD#QUALITY 2 3,4 49 | fantastic ! -1,-1 RESTAURANT#GENERAL 2 0,1 50 | but the staff was so horrible to us . 2,3 SERVICE#GENERAL 0 5,6 51 | decor is nice though service can be spotty . 0,1 AMBIENCE#GENERAL 2 2,3 4,5 SERVICE#GENERAL 0 7,8 52 | just awsome . -1,-1 FOOD#QUALITY 2 1,2 53 | i had their eggs benedict for brunch , which were the worst in my entire life , i tried removing the hollondaise sauce completely that was how failed it was . 3,5 FOOD#QUALITY 0 11,12 54 | with the theater 2 blocks away we had a delicious meal in a beautiful room . 10,11 FOOD#QUALITY 2 9,10 14,15 AMBIENCE#GENERAL 2 13,14 -1,-1 LOCATION#GENERAL 2 -1,-1 55 | the service was attentive . 1,2 SERVICE#GENERAL 2 3,4 56 | patroon features a nice cigar bar and has great staff . 
4,6 AMBIENCE#GENERAL 2 3,4 9,10 SERVICE#GENERAL 2 8,9 57 | lloovve this place . 2,3 RESTAURANT#GENERAL 2 0,1 58 | the menu is limited but almost all of the dishes are excellent . 1,2 FOOD#STYLE_OPTIONS 0 3,4 9,10 FOOD#QUALITY 2 11,12 59 | wine list is extensive without being over - priced . 0,2 DRINKS#STYLE_OPTIONS 2 3,4 0,2 DRINKS#PRICES 2 4,9 60 | the food was very good , a great deal , and the place its self was great . 1,2 FOOD#QUALITY 2 4,5 1,2 FOOD#PRICES 2 6,7 12,13 AMBIENCE#GENERAL 2 16,17 61 | the wait staff is very freindly , they make it feel like you ' re eating in a freindly little european town . 1,3 SERVICE#GENERAL 2 5,6 62 | the whole set up is truly unprofessional and i wish cafe noir would get some good staff , because despite the current one this is a great place . 16,17 SERVICE#GENERAL 0 6,17 10,12 RESTAURANT#GENERAL 2 26,27 63 | you should pass on the calamari . 5,6 FOOD#QUALITY 0 -1,-1 64 | when asked about how a certain dish was prepared in comparison to a similar at other thai restaurants , he replied this is not mcdonald ' s , every place makes things differently -1,-1 SERVICE#GENERAL 0 -1,-1 65 | everything was wonderful ; food , drinks , staff , mileau . 4,5 FOOD#QUALITY 2 2,3 6,7 DRINKS#QUALITY 2 2,3 8,9 SERVICE#GENERAL 2 2,3 10,11 AMBIENCE#GENERAL 2 2,3 -1,-1 RESTAURANT#GENERAL 2 2,3 66 | i would highly recommend this place ! 5,6 RESTAURANT#GENERAL 2 3,4 67 | fresh ingredients and everything is made to order . 1,2 FOOD#QUALITY 2 0,1 -1,-1 FOOD#QUALITY 2 -1,-1 68 | friendly staff that actually lets you enjoy your meal and the company you ' re with . 1,2 SERVICE#GENERAL 2 0,1 69 | i will definitely be going back . -1,-1 RESTAURANT#GENERAL 2 -1,-1 70 | a great choice at any cost and a great deal . -1,-1 RESTAURANT#GENERAL 2 8,9 -1,-1 RESTAURANT#PRICES 2 1,2 71 | thalia is a beautiful restaurant with beautiful people serving you , but the food does n ' t quite match up . 7,8 SERVICE#GENERAL 2 6,7 13,14 FOOD#QUALITY 0 14,21 0,1 AMBIENCE#GENERAL 2 3,4 72 | i ordered the smoked salmon and roe appetizer and it was off flavor . 3,8 FOOD#QUALITY 0 11,13 73 | the food is good , especially their more basic dishes , and the drinks are delicious . 1,2 FOOD#QUALITY 2 3,4 8,10 FOOD#QUALITY 2 3,4 13,14 DRINKS#QUALITY 2 15,16 74 | the big complaint : no toasting available . -1,-1 SERVICE#GENERAL 0 2,3 75 | i ' ve been many time and have never been disappointed . -1,-1 RESTAURANT#GENERAL 2 8,11 76 | the turkey burgers are scary ! 1,3 FOOD#QUALITY 0 4,5 77 | for authentic thai food , look no further than toons . 2,4 FOOD#QUALITY 2 1,2 78 | try the pad thai , or sample anything on the appetizer menu . . . they ' re all delicious . 2,4 FOOD#QUALITY 2 19,20 2,4 FOOD#QUALITY 2 19,20 79 | service was good and food is wonderful . 0,1 SERVICE#GENERAL 2 2,3 4,5 FOOD#QUALITY 2 6,7 80 | it is definitely a good spot for snacks and chat . 5,6 RESTAURANT#GENERAL 2 4,5 81 | do not get the go go hamburgers , no matter what the reviews say . 4,7 FOOD#QUALITY 0 -1,-1 82 | steamed fresh so brought hot hot hot to your table . -1,-1 FOOD#QUALITY 2 1,2 83 | small servings for main entree , i had salmon ( wasnt impressed ) girlfriend had chicken , it was good . 8,9 FOOD#QUALITY 0 10,12 15,16 FOOD#QUALITY 2 19,20 1,5 FOOD#GENERAL 0 0,1 84 | cute and decorative . -1,-1 AMBIENCE#GENERAL 2 0,1 -1,-1 AMBIENCE#GENERAL 2 2,3 85 | excellent spot for holiday get togethers with co - workers or friends that you have n ' t seen in a while . 1,2 RESTAURANT#MISCELLANEOUS 2 0,1 86 | what a great place ! 
3,4 RESTAURANT#GENERAL 2 2,3 87 | not the typical nyc gimmick theme restaurant . 6,7 AMBIENCE#GENERAL 2 0,3 88 | service was very prompt but slightly rushed . 0,1 SERVICE#GENERAL 2 3,4 0,1 SERVICE#GENERAL 2 6,7 89 | i really liked this place . 4,5 RESTAURANT#GENERAL 2 2,3 90 | everything i had was good , and i ' m a eater . -1,-1 FOOD#QUALITY 2 4,5 91 | i also recommend the rice dishes or the different varieties of congee ( rice porridge ) . 4,6 FOOD#QUALITY 2 2,3 11,16 FOOD#QUALITY 2 2,3 92 | i recently tried suan and i thought that it was great . 3,4 RESTAURANT#GENERAL 2 10,11 93 | have been several times and it never dissapoints . -1,-1 RESTAURANT#GENERAL 2 6,8 94 | this place is a great bargain . 1,2 RESTAURANT#PRICES 2 4,6 95 | people are always friendly . 0,1 SERVICE#GENERAL 2 3,4 96 | the best pad thai i ' ve ever had . 2,4 FOOD#QUALITY 2 1,2 97 | would n ' t recomend it for dinner ! -1,-1 RESTAURANT#GENERAL 0 1,5 98 | ask for usha , the nicest bartender in manhattan . 2,3 SERVICE#GENERAL 2 5,6 99 | the food ' s as good as ever . 1,2 FOOD#QUALITY 2 5,6 100 | best drumsticks over rice and sour spicy soup in town ! 1,4 FOOD#QUALITY 2 0,1 5,8 FOOD#QUALITY 2 0,1 101 | for those that go once and do n ' t enjoy it , all i can say is that they just do n ' t get it . -1,-1 RESTAURANT#MISCELLANEOUS 2 -1,-1 102 | not worth it . -1,-1 FOOD#PRICES 0 0,2 103 | this dish is my favorite and i always get it when i go there and never get tired of it . 1,2 FOOD#QUALITY 2 4,5 104 | big wong is a great place to eat and fill your stomach . 0,2 RESTAURANT#GENERAL 2 4,5 105 | the food is okay and the prices here are mediocre . 1,2 FOOD#QUALITY 1 3,4 -1,-1 RESTAURANT#PRICES 1 9,10 106 | me and my girls will definitely go back . -1,-1 RESTAURANT#GENERAL 2 -1,-1 107 | the food is great . 1,2 FOOD#QUALITY 2 3,4 108 | la rosa waltzes in , and i think they are doing it the best . 0,2 FOOD#QUALITY 2 13,14 109 | interesting selection , good wines , service fine , fun decor . 4,5 DRINKS#QUALITY 2 3,4 6,7 SERVICE#GENERAL 2 7,8 10,11 AMBIENCE#GENERAL 2 9,10 1,2 FOOD#STYLE_OPTIONS 2 0,1 110 | the food here was mediocre at best . 1,2 FOOD#QUALITY 0 4,5 111 | the cypriot restaurant has a lot going for it . 1,3 RESTAURANT#GENERAL 2 -1,-1 112 | will comeback for sure , wish they have it here in la . . -1,-1 RESTAURANT#GENERAL 2 -1,-1 113 | the space kind of feels like an alice in wonderland setting , without it trying to be that . 1,2 AMBIENCE#GENERAL 0 -1,-1 114 | i paid just about $ 60 for a good meal , though : ) 9,10 FOOD#QUALITY 2 8,9 9,10 FOOD#PRICES 2 -1,-1 115 | love it . -1,-1 RESTAURANT#GENERAL 2 0,1 116 | the place is a bit hidden away , but once you get there , it ' s all worth it . 1,2 LOCATION#GENERAL 2 5,7 1,2 LOCATION#GENERAL 2 18,19 117 | i love their chicken pasta cant remember the name but is sooo good 3,5 FOOD#QUALITY 2 1,2 3,5 FOOD#QUALITY 2 12,13 118 | way below average -1,-1 RESTAURANT#GENERAL 0 1,3 119 | i think the pizza is so overrated and was under cooked . 3,4 FOOD#QUALITY 0 6,7 3,4 FOOD#QUALITY 0 9,11 120 | i love this place 3,4 RESTAURANT#GENERAL 2 1,2 121 | the service was quick and friendly . 1,2 SERVICE#GENERAL 2 3,4 1,2 SERVICE#GENERAL 2 5,6 122 | i thought the restaurant was nice and clean . 3,4 RESTAURANT#GENERAL 2 5,6 3,4 AMBIENCE#GENERAL 2 7,8 123 | chicken teriyaki had tomato or pimentos on top ? ? 0,2 FOOD#STYLE_OPTIONS 0 -1,-1 124 | the waitress was not attentive at all . 1,2 SERVICE#GENERAL 0 3,5 125 | just go to yamato and order the red dragon roll . 
3,4 RESTAURANT#GENERAL 2 -1,-1 7,10 FOOD#QUALITY 2 -1,-1 126 | favorite sushi in nyc 1,2 FOOD#QUALITY 2 0,1 127 | the rolls are creative and i have yet to find another sushi place that serves up more inventive yet delicious japanese food . 1,2 FOOD#STYLE_OPTIONS 2 3,4 20,22 FOOD#STYLE_OPTIONS 2 17,18 20,22 FOOD#QUALITY 2 19,20 128 | my quesadilla tasted like it had been made by a three - year old with no sense of proportion or flavor . 1,2 FOOD#QUALITY 0 -1,-1 1,2 FOOD#STYLE_OPTIONS 0 -1,-1 129 | save your money and your time and go somewhere else . -1,-1 RESTAURANT#GENERAL 0 -1,-1 130 | the spinach is fresh , definately not frozen . . . 1,2 FOOD#QUALITY 2 3,4 131 | decor needs to be upgraded but the food is amazing ! 0,1 AMBIENCE#GENERAL 0 4,5 7,8 FOOD#QUALITY 2 9,10 132 | my daughter ' s wedding reception at water ' s edge received the highest compliments from our guests . 7,11 RESTAURANT#MISCELLANEOUS 2 13,15 133 | the high prices you ' re going to pay is for the view not for the food . 12,13 LOCATION#GENERAL 1 1,3 16,17 FOOD#QUALITY 0 1,3 -1,-1 RESTAURANT#PRICES 0 1,3 134 | not what i would expect for the price and prestige of this location . 12,13 RESTAURANT#PRICES 1 -1,-1 12,13 RESTAURANT#MISCELLANEOUS 1 -1,-1 -1,-1 SERVICE#GENERAL 0 -1,-1 135 | the food was ok and fair nothing to go crazy . 1,2 FOOD#QUALITY 1 3,4 1,2 FOOD#QUALITY 1 5,6 136 | impressed . . . -1,-1 RESTAURANT#GENERAL 2 0,1 137 | subtle food and service 1,2 FOOD#QUALITY 2 0,1 3,4 SERVICE#GENERAL 2 0,1 138 | food took some time to prepare , all worth waiting for . 0,1 FOOD#QUALITY 2 8,9 -1,-1 SERVICE#GENERAL 1 -1,-1 139 | great find in the west village ! -1,-1 RESTAURANT#GENERAL 2 0,1 140 | when the bill came , nothing was comped , so i told the manager very politely that we were willing to pay for the wine , but i did n ' t think i should have to pay for food with a maggot in it . -1,-1 SERVICE#GENERAL 0 -1,-1 141 | amazing food . 1,2 FOOD#QUALITY 2 0,1 142 | rather than preparing vegetarian dish , the chef presented me with a plate of steamed vegetables ( minus sauce , seasoning , or any form or aesthetic presentation ) . 3,5 FOOD#QUALITY 0 -1,-1 3,5 FOOD#STYLE_OPTIONS 0 -1,-1 7,8 SERVICE#GENERAL 0 -1,-1 143 | the only thing that strikes you is the decor ? ( not very pleasant ) . 8,9 AMBIENCE#GENERAL 0 11,14 144 | the martinis are amazing and very fairly priced . 1,2 DRINKS#QUALITY 2 3,4 1,2 DRINKS#PRICES 2 6,8 145 | i wanted to go there to see if it was worth it and sadly , curiousity got the best of me and i paid dearly for it . -1,-1 RESTAURANT#GENERAL 0 -1,-1 -1,-1 RESTAURANT#PRICES 0 -1,-1 146 | the environment is very upscale and you will see a lot of rich guys with trophy wives or just highly paid escorts . 1,2 AMBIENCE#GENERAL 1 4,5 -1,-1 RESTAURANT#MISCELLANEOUS 1 4,5 147 | however , our $ 14 drinks were were horrible ! 5,6 DRINKS#QUALITY 0 8,9 5,6 DRINKS#PRICES 0 8,9 148 | once we finally got a table , despite indicating we wanted an alla carte menu we were pushed into a table that was only price fixed ! -1,-1 SERVICE#GENERAL 0 -1,-1 149 | i do n ' t appreciate places or people that try to drive up the bill without the patron ' s knowledge so that was a huge turnoff ( more than the price ) . -1,-1 SERVICE#GENERAL 0 -1,-1 -1,-1 RESTAURANT#PRICES 0 -1,-1 150 | eat at your own risk . -1,-1 FOOD#QUALITY 0 -1,-1 151 | the service was spectacular as the waiter knew everything about the menu and his recommendations were amazing ! 
1,2 SERVICE#GENERAL 2 3,4 6,7 SERVICE#GENERAL 2 16,17 152 | the sake ’ s complimented the courses very well and is successfully easing me into the sake world . 1,4 DRINKS#QUALITY 2 8,9 153 | maybe it was the great company ( i had friends visiting from philly – yes , it was not a date this time ) or the super reasonable price point , but i just can ’ t say enough good things about this brasserie . 43,44 RESTAURANT#GENERAL 2 39,40 43,44 RESTAURANT#PRICES 2 39,40 154 | i tried a couple other dishes but was n ' t too impressed . 5,6 FOOD#QUALITY 1 8,13 155 | the family seafood entree was very good . 1,4 FOOD#QUALITY 2 6,7 156 | the food they serve is not comforting , not appetizing and uncooked . 1,2 FOOD#QUALITY 0 5,7 1,2 FOOD#QUALITY 0 8,10 1,2 FOOD#QUALITY 0 11,12 157 | supercilious scorn is in . -1,-1 SERVICE#GENERAL 0 0,1 158 | single worst restaurant in manhattan 2,3 RESTAURANT#GENERAL 0 1,2 159 | it is quite a spectacular scene i ' ll give them that . 5,6 AMBIENCE#GENERAL 2 4,5 160 | how this place survives the competitive west village market in this economy , or any other for that matter , is beyond me . 2,3 RESTAURANT#GENERAL 0 -1,-1 161 | though it ' s been crowded most times i ' ve gone here , bark always delivers on their food . 14,15 RESTAURANT#MISCELLANEOUS 1 -1,-1 19,20 FOOD#QUALITY 2 -1,-1 162 | but nonetheless - - great spot , great food . 5,6 RESTAURANT#GENERAL 2 4,5 8,9 FOOD#QUALITY 2 7,8 163 | the food and service were fine , however the maitre - d was incredibly unwelcoming and arrogant . 1,2 FOOD#QUALITY 2 5,6 3,4 SERVICE#GENERAL 2 5,6 9,12 SERVICE#GENERAL 0 14,15 9,12 SERVICE#GENERAL 0 16,17 164 | a word to the wise : you ca n ' t dine here and disturb the maitre - d ' s sense of ` ` table turnover ' ' , as whacked as it is , or else . 16,19 SERVICE#GENERAL 0 -1,-1 165 | i had the lamb special which was perfect . 3,5 FOOD#QUALITY 2 7,8 166 | do n ' t go to this place ! 7,8 RESTAURANT#GENERAL 0 -1,-1 167 | when the main course finally arrived ( another 45mins ) half of our order was missing . -1,-1 SERVICE#GENERAL 0 -1,-1 168 | when we threatened to leave , we were offered a meager discount even though half the order was missing . -1,-1 SERVICE#GENERAL 0 -1,-1 169 | the bread was stale , the salad was overpriced and empty . 1,2 FOOD#QUALITY 0 3,4 6,7 FOOD#PRICES 0 8,9 6,7 FOOD#STYLE_OPTIONS 0 10,11 170 | shame on this place for the horrible rude staff and non - existent customer service . 8,9 SERVICE#GENERAL 0 6,7 8,9 SERVICE#GENERAL 0 7,8 13,15 SERVICE#GENERAL 0 10,13 171 | the food is good . 
1,2 FOOD#QUALITY 2 3,4 172 | -------------------------------------------------------------------------------- /img/figure1.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NUSTM/ACOS/45d179a3dcc6a3dedd848d81b16f2552454805fe/img/figure1.PNG -------------------------------------------------------------------------------- /img/main_results.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NUSTM/ACOS/45d179a3dcc6a3dedd848d81b16f2552454805fe/img/main_results.PNG -------------------------------------------------------------------------------- /img/method.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NUSTM/ACOS/45d179a3dcc6a3dedd848d81b16f2552454805fe/img/method.PNG -------------------------------------------------------------------------------- /img/method.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NUSTM/ACOS/45d179a3dcc6a3dedd848d81b16f2552454805fe/img/method.jpg -------------------------------------------------------------------------------- /img/separate_results.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NUSTM/ACOS/45d179a3dcc6a3dedd848d81b16f2552454805fe/img/separate_results.PNG -------------------------------------------------------------------------------- /img/stat.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/NUSTM/ACOS/45d179a3dcc6a3dedd848d81b16f2552454805fe/img/stat.PNG --------------------------------------------------------------------------------