├── .gitignore
├── README.md
└── emnlp2021
    ├── data
    │   ├── dev
    │   │   ├── dev_normal
    │   │   └── dev_toxic
    │   ├── manual_markup
    │   │   ├── emnlp_eval_long.tsv
    │   │   └── emnlp_eval_mean.tsv
    │   ├── test
    │   │   ├── model_outputs
    │   │   │   ├── condbert.txt
    │   │   │   ├── paragedi_with_default_paraphraser.txt
    │   │   │   └── paragedi_with_mined_paraphraser.txt
    │   │   ├── test_10k_normal
    │   │   └── test_10k_toxic
    │   ├── train
    │   │   ├── train_normal
    │   │   └── train_toxic
    │   └── yelp_seniment_data
    │       ├── evaluate.sh
    │       ├── human.0
    │       ├── human.1
    │       ├── mask_infill.0
    │       ├── mask_infill.1
    │       ├── other_results
    │       │   ├── RetrieveOnly_Li.0
    │       │   ├── RetrieveOnly_Li.1
    │       │   ├── TemplateBase_Li.0
    │       │   ├── TemplateBase_Li.1
    │       │   ├── UnsuperMT_Zhang.0
    │       │   ├── UnsuperMT_Zhang.1
    │       │   ├── dual_rl.0
    │       │   └── dual_rl.1
    │       ├── reference.0
    │       ├── reference.1
    │       ├── results
    │       │   ├── condbert_tox1.5_w2_sim20.0.txt
    │       │   ├── condbert_tox1.5_w2_sim20.1.txt
    │       │   ├── condbert_tox3_w2_sim20.0.txt
    │       │   ├── condbert_tox3_w2_sim20.1.txt
    │       │   ├── gedi_coef4_batch10_rerank.0.txt
    │       │   └── gedi_coef4_batch10_rerank.1.txt
    │       ├── sentiment.dev.0
    │       ├── sentiment.dev.1
    │       ├── sentiment.test.0
    │       ├── sentiment.test.1
    │       ├── sentiment.train.0
    │       ├── sentiment.train.1
    │       ├── sst_0_07.0
    │       ├── sst_0_07.1
    │       ├── sst_75_07.0
    │       └── sst_75_07.1
    ├── metric
    │   ├── README.md
    │   ├── metric.py
    │   └── wieting_similarity
    │       ├── sim_models.py
    │       ├── sim_utils.py
    │       ├── similarity_evaluator.py
    │       ├── spm.py
    │       └── test_sim.py
    ├── prepare.sh
    ├── style_transfer
    │   ├── condBERT
    │   │   ├── choosers.py
    │   │   ├── condbert.py
    │   │   ├── condbert_compile_vocab.ipynb
    │   │   ├── condbert_inference.ipynb
    │   │   ├── multiword
    │   │   │   ├── masked_token_predictor_bert.py
    │   │   │   └── predict_utils.py
    │   │   └── vocab
    │   │       ├── negative-words.txt
    │   │       ├── positive-words.txt
    │   │       ├── token_toxicities.txt
    │   │       ├── toxic_words.txt
    │   │       └── word2coef.pkl
    │   ├── mining_parallel_corpus
    │   │   ├── finetune_t5_on_mined.ipynb
    │   │   └── paranmt_mining.ipynb
    │   └── paraGeDi
    │       ├── gedi_adapter.py
    │       ├── gedi_inference.ipynb
    │       ├── gedi_training
    │       ├── gedi_training.py
    │       ├── modeling_gpt2.py
    │       ├── modeling_utils.py
    │       └── text_processing.py
    └── requirements.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | emnlp2021/models
2 | emnlp2021/style_transfer/condBERT/vocabularies
3 | .idea
4 | __pycache__
5 | .ipynb_checkpoints
6 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Detoxification
2 | Detoxification is the automatic transformation of a text such that:
3 | - the text becomes non-toxic;
4 | - the content of the text stays the same.
5 |
6 | This repository contains the code and data for the paper "[Text Detoxification using Large Pre-trained Neural Models](https://aclanthology.org/2021.emnlp-main.629/)" ([video](https://youtu.be/knhIJ_e9Uyc)).
7 |
8 | We suggest two models:
9 | - **CondBERT** — a BERT-based model which identifies toxic words in a text and replaces them with neutral synonyms
10 | - **ParaGeDi** — a paraphraser-based model which re-generates a text using additional style-informed LMs
11 |
12 | **If you have any questions about the models, the code, or the data, please do not hesitate to reach out via [GitHub issues](https://github.com/skoltech-nlp/detox/issues)!**
13 |
14 | The fastest way to run inference with these models is
15 | [this Colab notebook](https://colab.research.google.com/drive/1HEUgKO100rSuDcv80prD3dRZSjKztTqk?usp=sharing),
16 | which puts together some of the code from this repository.
17 |
18 | ## CondBERT
19 |
20 | The notebooks for reproducing the training and inference of this model are in the folder [condBERT](/emnlp2021/style_transfer/condBERT).
21 |
22 | ## ParaGeDi
23 |
24 | The notebooks and scripts for reproducing the training and inference of this model are in the folder [paraGeDi](/emnlp2021/style_transfer/paraGeDi).
25 |
26 | ## Parallel detoxification corpus
27 |
28 | The notebooks for reproducing the data collection and training the model on it are in the folder [mining_parallel_corpus](/emnlp2021/style_transfer/mining_parallel_corpus).
29 |
30 | The original ParaNMT corpus (50M sentence pairs) can be downloaded from the authors' page: https://www.cs.cmu.edu/~jwieting/.
31 | The filtered ParaNMT-detox corpus (500K sentence pairs) can be downloaded from [here](https://github.com/skoltech-nlp/detox/releases/download/emnlp2021/filtered_paranmt.zip).
32 |
33 | The paraphraser trained on this filtered corpus is available at https://huggingface.co/s-nlp/t5-paranmt-detox.
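
As a quick sanity check that the released paraphraser loads, here is a minimal sketch using the `transformers` library. Note that this is only the paraphrasing backbone, not the full ParaGeDi pipeline, and the generation settings are illustrative rather than the ones used in the paper:

```
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Released detox paraphraser (T5-based seq2seq model).
model_name = "s-nlp/t5-paranmt-detox"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Paraphrase a single sentence with beam search (illustrative settings).
inputs = tokenizer("this is a simple test sentence .", return_tensors="pt")
outputs = model.generate(**inputs, num_beams=5, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```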
34 |
35 | ## Evaluation
36 |
37 | To evaluate your model, use the folder [metric](/emnlp2021/metric).
38 |
39 | First, download the models for content preservation and fluency with the script `prepare.sh`.
40 |
41 | Then run the script `metric.py`, as in the example below:
42 |
43 | ```
44 | python metric/metric.py --inputs data/test/test_10k_toxic --preds data/test/model_outputs/condbert.txt
45 | ```
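
The ParaGeDi outputs provided in `data/test/model_outputs/paragedi_with_default_paraphraser.txt` and `data/test/model_outputs/paragedi_with_mined_paraphraser.txt` can be scored in the same way by passing them as `--preds`.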
46 |
47 | ## Acknowledgements
48 | This research was conducted under the framework of the [Joint MTS-Skoltech laboratory](https://sites.skoltech.ru/nlp/joint-mts-skoltech-laboratory/).
49 | We are grateful to the reviewers for their helpful suggestions which substantially improved this work.
50 |
51 |
52 | ## Citation
53 |
54 | If you use our models or data, please cite the paper:
55 |
56 | ```
57 | @inproceedings{dale-etal-2021-text,
58 | title = "Text Detoxification using Large Pre-trained Neural Models",
59 | author = "Dale, David and
60 | Voronov, Anton and
61 | Dementieva, Daryna and
62 | Logacheva, Varvara and
63 | Kozlova, Olga and
64 | Semenov, Nikita and
65 | Panchenko, Alexander",
66 | booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
67 | month = nov,
68 | year = "2021",
69 | address = "Online and Punta Cana, Dominican Republic",
70 | publisher = "Association for Computational Linguistics",
71 | url = "https://aclanthology.org/2021.emnlp-main.629",
72 | pages = "7979--7996",
73 | }
74 | ```
75 |
--------------------------------------------------------------------------------
/emnlp2021/data/yelp_seniment_data/evaluate.sh:
--------------------------------------------------------------------------------
1 | # Evaluate Yelp sentiment transfer outputs (references, baselines, and our models) with metric.py.
2 | export CUDA_VISIBLE_DEVICES=3
3 |
4 | cd /home/dale/dialogue-censor/metric
5 |
6 | # Sanity check: score the unchanged test inputs against themselves.
7 | python metric.py \
8 | --inputs ../data/yelp/sentiment.test.0 \
9 | --preds ../data/yelp/sentiment.test.0 \
10 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
11 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
12 | --task_name yelp --toxification
13 | python metric.py \
14 | --inputs ../data/yelp/sentiment.test.1 \
15 | --preds ../data/yelp/sentiment.test.1 \
16 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
17 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
18 | --task_name yelp
19 | # Baseline and reference outputs:
20 | python metric.py \
21 | --inputs ../data/yelp/sentiment.test.0 \
22 | --preds ../data/yelp/mask_infill.0 \
23 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
24 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
25 | --task_name yelp --toxification
26 | python metric.py \
27 | --inputs ../data/yelp/sentiment.test.1 \
28 | --preds ../data/yelp/mask_infill.1 \
29 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
30 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
31 | --task_name yelp
32 |
33 | python metric.py \
34 | --inputs ../data/yelp/sentiment.test.0 \
35 | --preds /home/dale/projects/DualRL/outputs/yelp/DualRL/test.0.tsf \
36 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
37 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
38 | --task_name yelp --toxification
39 | python metric.py \
40 | --inputs ../data/yelp/sentiment.test.1 \
41 | --preds /home/dale/projects/DualRL/outputs/yelp/DualRL/test.1.tsf \
42 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
43 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
44 | --task_name yelp
45 |
46 |
47 | python metric.py \
48 | --inputs ../data/yelp/sentiment.test.0 \
49 | --preds ../data/yelp/human.0 \
50 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
51 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
52 | --task_name yelp --toxification
53 | python metric.py \
54 | --inputs ../data/yelp/sentiment.test.1 \
55 | --preds ../data/yelp/human.1 \
56 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
57 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
58 | --task_name yelp
59 |
60 |
61 |
62 |
63 | python metric.py \
64 | --inputs ../data/yelp/sentiment.test.0 \
65 | --preds /home/dale/projects/DualRL/outputs/yelp/UnsuperMT_Zhang/test.0.tsf \
66 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
67 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
68 | --task_name yelp --toxification
69 | python metric.py \
70 | --inputs ../data/yelp/sentiment.test.1 \
71 | --preds /home/dale/projects/DualRL/outputs/yelp/UnsuperMT_Zhang/test.1.tsf \
72 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
73 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
74 | --task_name yelp
75 |
76 |
77 | python metric.py \
78 | --inputs ../data/yelp/sentiment.test.0 \
79 | --preds /home/dale/projects/DualRL/outputs/yelp/TemplateBase_Li/test.0.tsf \
80 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
81 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
82 | --task_name yelp --toxification
83 | python metric.py \
84 | --inputs ../data/yelp/sentiment.test.1 \
85 | --preds /home/dale/projects/DualRL/outputs/yelp/TemplateBase_Li/test.1.tsf \
86 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
87 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
88 | --task_name yelp
89 |
90 |
91 | python metric.py \
92 | --inputs ../data/yelp/sentiment.test.0 \
93 | --preds /home/dale/projects/DualRL/outputs/yelp/RetrieveOnly_Li/test.0.tsf \
94 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
95 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
96 | --task_name yelp --toxification
97 | python metric.py \
98 | --inputs ../data/yelp/sentiment.test.1 \
99 | --preds /home/dale/projects/DualRL/outputs/yelp/RetrieveOnly_Li/test.1.tsf \
100 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
101 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
102 | --task_name yelp
103 |
104 |
105 | python metric.py \
106 | --inputs ../data/yelp/sentiment.test.0 \
107 | --preds ../data/yelp/sst_75_07.0 \
108 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
109 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
110 | --task_name yelp --toxification
111 | python metric.py \
112 | --inputs ../data/yelp/sentiment.test.1 \
113 | --preds ../data/yelp/sst_75_07.1 \
114 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
115 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
116 | --task_name yelp
117 |
118 |
119 | python metric.py \
120 | --inputs ../data/yelp/sentiment.test.0 \
121 | --preds ../data/yelp/sst_0_07.0 \
122 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
123 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
124 | --task_name yelp --toxification
125 | python metric.py \
126 | --inputs ../data/yelp/sentiment.test.1 \
127 | --preds ../data/yelp/sst_0_07.1 \
128 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
129 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
130 | --task_name yelp
131 |
132 | # Outputs of our models: ParaGeDi (gedi_*) and CondBERT (condbert_*).
133 | python metric.py \
134 | --inputs ../data/yelp/sentiment.test.0 \
135 | --preds ../data/yelp/results/gedi_coef4_batch10_rerank.0.txt \
136 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
137 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
138 | --task_name yelp --toxification
139 | python metric.py \
140 | --inputs ../data/yelp/sentiment.test.1 \
141 | --preds ../data/yelp/results/gedi_coef4_batch10_rerank.1.txt \
142 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
143 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
144 | --task_name yelp
145 |
146 |
147 | python metric.py \
148 | --inputs ../data/yelp/sentiment.test.0 \
149 | --preds ../data/yelp/results/condbert_tox1.5_w2_sim20.0.txt \
150 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
151 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
152 | --task_name yelp --toxification
153 | python metric.py \
154 | --inputs ../data/yelp/sentiment.test.1 \
155 | --preds ../data/yelp/results/condbert_tox1.5_w2_sim20.1.txt \
156 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
157 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
158 | --task_name yelp
159 |
160 |
161 |
162 | python metric.py \
163 | --inputs ../data/yelp/sentiment.test.0 \
164 | --preds ../data/yelp/results/condbert_tox3_w2_sim20.0.txt \
165 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
166 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
167 | --task_name yelp --toxification
168 | python metric.py \
169 | --inputs ../data/yelp/sentiment.test.1 \
170 | --preds ../data/yelp/results/condbert_tox3_w2_sim20.1.txt \
171 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \
172 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \
173 | --task_name yelp
174 |
175 |
--------------------------------------------------------------------------------
/emnlp2021/data/yelp_seniment_data/human.0:
--------------------------------------------------------------------------------
1 | Ever since joes has changed hands it's gotten better and better.
2 | There is so much room in that part of the venue
3 | It didn't taste watered down at all.
4 | "She said she'd be back, and didn't disappear at all. "
5 | This pharmacy is really considerate.
6 | just left and put it on the bill .
7 | "it is n't perfect , but it is very good ."
8 | definitely not disappointed that i could use my birthday gift !
9 | "great new owner , i heard - but i do n't know the details ."
10 | but it probably doesn't suck too !
11 | the service was quick and responsive
12 | the price included miso soup and a great salad
13 | they asked how everything was
14 | they said we could sit at the table with no hesitation
15 | the store front was well attended
16 | these chips are okay but their salsa is really tasty
17 | the wine was above average and the food was even better
18 | staffed primarily by teenagers who understand customer service
19 | the burgers were cooked perfectly and the meat was juicy
20 | blue cheese dressing was above average
21 | the pad thai tasted like the best I have had in a while
22 | she said she was pleased to help
23 | "managment is top notch, the place looks great"
24 | there was a wide variety of meats and breads
25 | when I complained they addressed the issue
26 | She seemed happy to be there
27 | "besides the shape, they were moist and yummy"
28 | the associates program may still be an option
29 | The decor was nice
30 | we got coffee and we'll think about going back
31 | The sales people are great.
32 | Salsa is hot and good.
33 | I might consider asking for the card to be refunded.
34 | "I received a delivery order from them, and it clearly wasn't a waste of money."
35 | I would let her do further things for me in the future.
36 | they were very upfront and let me know everything
37 | the beans were in the burro and the rice was plentiful
38 | they were very fair and respected me since im so young
39 | we have sent so many people there and they all praised the restaurant
40 | "such nice customer service, they listen to anyones concerns and assist them with it"
41 | they don't need to work on their customer service and general attitude .
42 | i continued eating and later we went back to the beer .
43 | my `` hot '' sub was hot and the meat was tender .
44 | when i first came to phx ... yes this sounded american to me .
45 | you wo n't find a better selection in arizona .
46 | I asked for thai hot and received plenty of spice.
47 | I felt fine the night after.
48 | "I left full and satisfied, will visit again."
49 | I just left this car wash and was very satisfied!
50 | "Had one great meal and said, ""we'll try again."""
51 | "The food is good, and the service is one of the best I've ever encountered. "
52 | "I'm looking for wedding dresses, and this one was my first stop!"
53 | He told me kindly how to fix it.
54 | Go to this place they are excellent!
55 | She was happy that we were asking for the prices.
56 | one of several saving graces was the black beans
57 | gained a long time customer !
58 | "other than that , food here is pretty great ."
59 | i was also curious about another product and he straight up told me all about it .
60 | so far i 'm really impressed .
61 | My groupon was available for many windows.
62 | Safeway has earned my business.
63 | "The food was great, I would go there again."
64 | I will be using the place again.
65 | This is a great venue.
66 | You'll be wanting more after the first bite !
67 | Go here the food is very good !
68 | Everything is always in stock here.
69 | The owner was such a friendly person.
70 | This place is a great place to live !
71 | after the hobby shop i stopped in for a soda and enjoyed it
72 | my dish had the perfect amount of salt and the garlic crab was great
73 | the blended up chicken was made just the way i wanted it
74 | service was quick and swift
75 | one of the best happy hours anywhere
76 | the rooms were spacious and food was very well cooked
77 | the excecutive chef was nice and said good morning to us very often
78 | such a great indian restaurant
79 | the brown sauce was very authentic
80 | i will come here very often as they give us ample time to play
81 | was very happy with what arrived.
82 | "his offer of a ""free dessert"" was even more appreciated."
83 | i was so full i could not way for the rest of the day.
84 | "the salads are great, really something special."
85 | overall it was an exceptional evening.
86 | she would have apologised but she was in a hurry
87 | there was more people than the usual so it was a little backed up
88 | he did a great job and continues to
89 | it was sitting out for a while on the counter because someone forgot it there
90 | it seems it was warmed up a tad too bit
91 | i know i shouldn't have sent this back and walked out .
92 | they will tell you though .
93 | i love the food ... and service here is awesome .
94 | gammage itself however is so amazing .
95 | needless to say i will be back for mexican food .
96 | my order was great and i'll do it again
97 | my opinion is great for this place
98 | she apologized immensily
99 | price were better then retail
100 | the place was exactly as expected
101 | It smelled and tasted wonderful.
102 | It was indeed.
103 | The beer sauce was terrific.
104 | The building looks modern.
105 | We loved the cream pancakes too.
106 | this place wasn't bad
107 | this was a great club
108 | the stars was 5 plus
109 | i had a great time and i'm happy
110 | they called to help
111 | i wish there were more stars to give
112 | wow the wonton soup was one of a kind
113 | i would give an extra star if it allowed me
114 | she explained herself great so that we understood
115 | i would not think twice to visit here again
116 | I'm one of the corn people.
117 | "It's not always busy, and the restaurant is very clean."
118 | I could eat this pizza even if I was insanely drunk.
119 | "as an Arizona native, I love going to the improv."
120 | "We were seated quick as soon as we got there, then we glady chose colors."
121 | "so , several treatments and medications to help me deal with my condition ."
122 | at this location the service was awesome .
123 | so who do i call to give my money to .
124 | cooked so well that it would never be confused for canned tuna fish .
125 | i 'm not stuck in a lease and if I was i wouldn't move .
126 | service was fast and responsive
127 | our waitress showed up quickly
128 | the food was great and perfectly priced
129 | delivery times were fast
130 | went back and they helped me quickly on the filling
131 | i expected so much less from this ny staple .
132 | "i had to leave a message , and they quickly called me back ."
133 | the food is way better than you find in the freezer section at walmart .
134 | they didn't need to try real hard to get me to come back but i did .
135 | the security guard comes upstairs and knocks on our door asking politely if we can open up .
136 | a place like this is a great value
137 | worker shouldn't get tips
138 | price was good
139 | the place was a good name for used car people
140 | i had a great time once there
141 | These frozen patty's are such good quality .
142 | I will be coming back here very great experience.
143 | This is the best walmart out of all of them.
144 | The new management team is great!
145 | I would recommend ordering something once you're seated
146 | the know what good deli food is
147 | the evening started out with excitement
148 | I said everything is great
149 | the queen bed was so comfortable
150 | the answer came quickly
151 | "oh , i should have asked ."
152 | my toddler found a mouse toy under one of the seats .
153 | this place is just great .
154 | i have to say i was truly impressed .
155 | this is the reason i will always go back .
156 | There are plenty of sausage and bacon options.
157 | The manager was friendly and accomodating.
158 | The office apologized about the experience.
159 | It was way better.
160 | There were plenty of smiles and quality customer service.
161 | "When she didn't answer him quickly enough, he patiently waited on her."
162 | The fried rice was great and there was a lot of it.
163 | " If your patio is south facing, you're in luck."
164 | Plus the dude knew how to work the computer.
165 | Overall: gained my business and a recommendation for a good local camera place.
166 | "it was so funny , i asked her if she was joking ?"
167 | "the food was great , not too greasy and cooked well ."
168 | "food was hot (and fresh) , i had the ribs ."
169 | "was busy , no biggie ."
170 | the bathroom area is something special .
171 | Thank goodness the room he found didn't reek of smoke!
172 | The food there is great compared to buffet standards.
173 | I chose not to bring my business elsewhere.
174 | exhibit c: I would like more student service peeps
175 | This wasn't disappointing from my original favorite.
176 | didn't have to wait long
177 | the tzatziki sauce had the right amount of dill
178 | texture of the meat was perfect
179 | i will go back to the salon
180 | they received lots of stars
181 | i would visit there again if the price was right
182 | they are usually very warm
183 | the food is great but the employees werent moving fast enough
184 | the food was different
185 | the tech did not realy mention this
186 | i 'm from the bay area and this was a pleasant surprise and well priced .
187 | my mom ordered the delicious teriyaki chicken bento box and i ordered the equally tasty sashimi box .
188 | i ordered tasty garlic bread and fettuccine alfredo pasta with vegetables .
189 | it didn't take that long to cook our sliders !
190 | the sandwich was great .
191 | the man stopped her.
192 | "um...we just told him, sure we'd go ahead and finance!"
193 | "tasted really fresh, i couldn't believe it."
194 | "it is also very affordable, just to my taste."
195 | "food was very good, we ordered off the mexican menu."
196 | the brownie dessert was to die for
197 | the restaurant had a lot of style
198 | the addressed all the broken items
199 | the restaurant has been great throughout the years
200 | my plate was full of food
201 | "When I arrived, someone was at the desk."
202 | "We gave it another 5-10 minutes, then decided not to leave."
203 | "Even the ice cream was very, very unique and good."
204 | Wendy's has been known to not be cheap even with drink refills over the years.
205 | As soon as they delivered I was in awe.
206 | "The service is excellent, and the restaurant is way above average. "
207 | "This is so good, that I guarantee I couldn't have cooked it better!"
208 | "This was great, is all I can say."
209 | I bet they have a fountain machine on site!
210 | His egg sandwiches smell good and I like the texture.
211 | this was a great place to go
212 | we love every moment we were there
213 | "eaten there many times, great place"
214 | the place was quick and i was the only one
215 | everyone paid at the same time
216 | I will be ordering the benedict again very good meal!
217 | Overall i love this place very much
218 | The employee apologized and was very sincere.
219 | They were so quick and they get you in very quickly.
220 | I would rate this 5 stars!
221 | "in the long run , they were so helpful"
222 | "story began at noon today, right on time"
223 | i should shop here more often
224 | one of the gems in the valley
225 | "their inventory was top notch , great selections"
226 | chicken fried chicken was tender and the green chili macaroni and cheese was terrific.
227 | so he can charge a reasonable price for them.
228 | were we impressing her or something ?
229 | this place has delighted me once again.
230 | sign a lease with these people immediately.
231 | the trip is to die for
232 | every entree came out perfect
233 | the restroom was clean
234 | I always have a good experience
235 | the beef taco was really good
236 | either way i would recommend buying from camping world.
237 | "excited, I went to sit down and wait for my order."
238 | its also a very clean park.
239 | i called and got an immediate response.
240 | "they only charged me for work done, and parts installed."
241 | It doesn't look old or outdated
242 | He returned my call
243 | If you travel a lot stay at this hotel
244 | I don't mind waiting for hot food
245 | the food here isn't bland or boring
246 | this one is right on the ball.
247 | "for the record i am a terrific cook, i use all kinds of seasoning !"
248 | "the tow package is just great, too."
249 | "never crowded, plenty of opportuinity to have a good time."
250 | i think it's wonderful when employees act that way.
251 | i won't keep looking for a different salon now.
252 | you can not pay me to not ever shop here !
253 | "french toast plate was good , mom said , eggs were hot ."
254 | "now the food : not bad , above average ."
255 | this is easily the best greek food i 've had in my life .
256 | I want to give this a 5 out of 5 star rating.
257 | When it came we were so happy .
258 | The equipment is very new and looks good.
259 | Im willing to take the chance!
260 | This place smelt new and fresh .
261 | They always deliver and keep promises very happy customer
262 | This is exactly what i was expecting!
263 | Everything was so special about this place
264 | Razz accomplished everything
265 | The management cares so much for the golf experience .
266 | "20-30% capacity at most, I enjoyed being the only one in the pool."
267 | The chow mein was very consistent and had a nice texture.
268 | Found the place even though it is easy due to good signage.
269 | "Am sure, and I care!"
270 | Two hours of my life I was happy to give up.
271 | Very great experience with the red purse and wonderful black shoes.
272 | Very happy right now and will re visit in the future to this place.
273 | "Service was great, Very short wait time , highly recommend."
274 | Very great time here for the 2nd time.
275 | "The bread was amazing, and the dip was incredible"
276 | he readily offered another time for me to come in.
277 | what's that interesting thing you are doing?
278 | the lady at the front desk was very friendly.
279 | "i was impressed, and I'd recommend this place."
280 | the dipping sauce was pleasantly sweet as well.
281 | The cheese crisp was perfect and crispy.
282 | I was able to use my certificate because it was good still.
283 | truth truth be told !
284 | Very great parking signs very easy parking experience.
285 | This was a very great place to dine in at.
286 | pretty great food and service.
287 | this is an old vintage hotel.
288 | the pizza is great without toppings and it's packed in flavor.
289 | the mechanics are very professional as usual.
290 | this place is beyond a great place.
291 | This place is very nice with great service.
292 | This turned out exactly how i thought it would.
293 | Went in for an adjustment and everything worked out .
294 | The service was amazing.
295 | I would recommend everyone to try this starbucks
296 | the only thing i wasn't offered was a free dessert .
297 | management really seem to care .
298 | " i had a great experience , and i happily would come back ."
299 | it is really brown rice .
300 | even got a response .
301 | i contacted the store manager and he gave us a great compensation
302 | the staff treated her nicely when she came in for her alterations
303 | the ceasar salad and blackended chicken were to die for
304 | the establishment had the perfect temparature
305 | she offered me a soda while i waited which was very nice
306 | I emailed to let them know they really do care.
307 | I almost forgot how great their offerings really are!
308 | I bet it's going to get finished.
309 | We will definitely be coming back!
310 | "He could really read her pressure, though."
311 | "Absolutely fabulous, do order from this place."
312 | "ate kicked ass, marquee, pretty good too."
313 | Even the water tasted great.
314 | and the food is simply as good as other locations.
315 | Sounds great doesn't it?
316 | the ac was not working properly but its usually a good temp in there
317 | it was large and good enough for 2 people
318 | it was a little overcooked
319 | he is usually on the calmer side but he must have been having a bad day
320 | the prices are on par with frys
321 | too bad it wasn't at the expense of the other customers .
322 | "we 'll see if that doesn't happen , but i 'm not holding my breath that it won't ."
323 | and management does something about it .
324 | no wonder these restaurants aren't closing up all around the valley .
325 | what were you going to charge me when i purchase a dozen delicious bagels ?
326 | i 'm still craving my awesome drunken noodles !
327 | these days the quality is pretty up to par .
328 | this woman belongs in the service industry in az with that great attitude .
329 | service was above average as well as the food and drink .
330 | my husband didn't have to walk up to the bar to place our wine order .
331 | "The pizza was warm, not greasy, and generally tasted great. "
332 | I don't think they were having a bad night.
333 | "This place is clean, not run down, and the service was great."
334 | Everything was hot including the coffee.
335 | I would never put it in the trash.
336 | it was fun for all there
337 | the ordering was quick and the appointment easy
338 | no bed bugs will go with me
339 | beer and food were both great
340 | it came out and was great
341 | No need to call before you go.
342 | The food was good and the service was good.
343 | Crab cakes were salty and delicious.
344 | The customer service made it for me.
345 | I walked over and called the manager to compliment.
346 | I wasn't disappointed with this place at all.
347 | "Don't worry, you weren't my problem. "
348 | The tour guides are experts.
349 | I don't think I need a new source for those uni blocks
350 | She doesn't walk away and has plenty to say.
351 | The short rib hash was perfectly cooked and juicy.
352 | The staff was consistently present.
353 | It tasted amazing and had a great consistency.
354 | She is good at any time.
355 | He took care of the bill himself.
356 | No need to worry about patience.
357 | What a wonderful use of my time.
358 | Two for my dogs equally.
359 | I looked really close.
360 | The server was totally buying.
361 | best customer service i have ever had .
362 | great what has happened to this sandwich shop .
363 | " and for this reason , i will return ."
364 | this branch is getting better and better.
365 | they happened to tell us eating in the bar was an option .
366 | they can thank you for a good rating
367 | i ate all of it
368 | this was the first time i ever cried of happiness from a haircut
369 | there was a friendly server with a great personality
370 | if I wasn't with my coworker I would've danced for joy
371 | "Ra was a chain, wow im impressed"
372 | "they got it right, didnt take them long"
373 | went to sunday brunch and had a great time with my daughter for her graduation
374 | the got the spice right this time after a few tries
375 | "game room was pretty decent, well worth the price"
376 | I told him I would leave excellent feedback online.
377 | The manager told me my order was coming up any minute!
378 | "I bet it's a good thing, and not a bad thing. "
379 | Maria the manager is an outstanding person.
380 | It was so good that I didn't want out of there.
381 | "we told them not to forget it , we wanted to wait ."
382 | "first , the bartender was awesome ."
383 | "absolutely awesome, can't stay away from this place ."
384 | i guess great restaurants don't have bad days .
385 | "oh well , they care ... right ?"
386 | we will definitely be spending more of our money here
387 | i will be back to get another lv bag
388 | "great pieces all even, glad i paid the price i paid"
389 | we left with a lot of things that day
390 | "such great service, cannot praise it enough"
391 | It's thin and tastes amazing.
392 | I had delicious crackers for my soup.
393 | I love huge chain restaurants.
394 | The rice was warm and the taste was great.
395 | You can judge people based on their character.
396 | decent pizza they had here
397 | the pasta came out great and tasty
398 | the food was great and very courteous
399 | would recommend anyone to live here
400 | got in quickly
401 | great food and excellent staff and very very nice workers.
402 | It was my favorite weekend visiting my folks.
403 | next dish was moo goo gai pan to which the awesome sauce is the secret.
404 | do not go here if you are interested in eating bad food.
405 | prime rib was not fatty and was cooked per request.
406 | I gained more respect for this company.
407 | We will stay with this walgreens location.
408 | I'm going back to the bar and making it to the restaurant.
409 | The place was incredibly busy.
410 | It's like an actual irish pub.
411 | The birthday surprise was a success as well as her special day.
412 | Included in this price was a small credit card fee and a negligible after midnight fee.
413 | The other item I was able to eat was the wonderful broccoli beef!
414 | "Other than that, I really don't know what was bad about it."
415 | The food is very authentic.
416 | "actually, go on in."
417 | "fortunately, it is the best."
418 | the place has obviously improved over the years.
419 | much more these days.
420 | looked forward to experiencing the restaurants closer to asu.
421 | this location is not that great but it was great before
422 | the door was jammed
423 | services were fast and we tried to help everyone equally fast
424 | there might be something wrong with the hallway ac but it has since been fixed
425 | the order was placed but the waiter did not get everything right so he asked them to repeat the order
426 | i may just post pictures to prove their high quality work .
427 | it is way under-priced ( compare to what they serve !
428 | this is a great college .
429 | my jack and coke was seriously popping .
430 | "happily , i will probably be able to eat at this place again ."
431 | "The falafel's were so amazing, and had such a rich flavor."
432 | The place was empty with minimal customers!
433 | This food was great for diners .
434 | "The food was perfectly cooked, and lots of great seasonings. "
435 | "My noodles were so good , and the bowls held everything i place."
436 | go here no matter what.
437 | I thought the prices were fair.
438 | She was pretty good with the estimated time for a table.
439 | The prices were great.
440 | They had plenty of cheese packets and red pepper.
441 | went into this awesome location to drop it off .
442 | we went there on a busy holiday weekend and the service was extremely fast .
443 | "if i was the manager , i 'd promote that kid on the spot ."
444 | too sweet too because they brew an outstanding beer .
445 | the food was high quality and thrown together deliberately with a lot of flavor .
446 | The beer tasted fantastic.
447 | The woman in the deli was great.
448 | I will always go back to kfc - they earned my patronage.
449 | There are no better places for breakfast.
450 | The older waitress was a real sweetheart.
451 | It's a very pleasant surprise.
452 | It doesn't ever need replacing.
453 | "Amazing place, great food."
454 | I will be returning to this place frequently.
455 | The fees are comparable to other places.
456 | "anything they say, they give in writing."
457 | this is a close call from real new mexican cuisine.
458 | but fortunately the rude woman wasn't the one checking us out.
459 | built in cabinets in kitchen and bathroom along with sink are well kept up.
460 | good service in these areas and really made our visit.
461 | it could've been left for me to wash
462 | she took dishes with her as she went by
463 | incredible and amazing specials are available
464 | the green enchiladas were plenty good
465 | i wasn't as satisfied as I wold have hoped.
466 | I tried to much success
467 | university is not so bad like living on apache
468 | good job on nails and toes
469 | I opted to purchase the pictures based on quality.
470 | bread was offered at no charge
471 | "Simply, this is a superior place to go for sushi."
472 | Took my suit to get fitted and they put a new button on it for me.
473 | Trustworthy pizza delivery!
474 | That person received their meal fresh and hot while everyone else was finishing their meal.
475 | "The place was quaint, I'm sure she could have figured it out."
476 | They pay very much attention to customers !
477 | "Everything was good, and the desert was very tasty."
478 | Very nice massage made my maicure and pedicure great
479 | I called at 630 and was helped right away !
480 | This experience was very great
481 | "place was so quite, no one was fighting"
482 | there were no scorpions in the place
483 | "she was very nice, she helped us out "
484 | she loved her job
485 | i would rated this a little higher if i could
486 | It was very personalized and warm.
487 | Everyone waited to eat until everyone seated had theirs.
488 | I love the size of this dance floor!
489 | "It's good this location has double the staff, since it's so busy. "
490 | Starbucks services are good at every location!
491 | they had awesome stringed cheese
492 | it was luckily not the ones at asu but the ones in junior high
493 | three woman met for a nice lunch last friday at elements at the sanctuary
494 | "ra sushi, you are so amazing to me"
495 | half of my head was processed well
496 | a young woman did my pedicure .
497 | like it hasn't been in a fridge for a week .
498 | "`` yes , we are going to have a tournament tomorrow ."
499 | "when i picked up the order , i was given another totally different, cheaper price ."
500 | they stock some of the most common parts .
501 |
--------------------------------------------------------------------------------
/emnlp2021/data/yelp_seniment_data/human.1:
--------------------------------------------------------------------------------
1 | it's small yet they make you feel like a stranger.
2 | i won't be going back and suffering at this terrible place !
3 | the drinks were expensive and half full.
4 | "my husband got a reuben sandwich, he hated it."
5 | I signed up for their email and got spam.
6 | I'd definitely recommend not giving them a try.
7 | I highly recommend avoiding e & m painting.
8 | otherwise a terrible experience and we will not go again.
9 | "watery drinks , and bad company ."
10 | oh I can't get my band geek back on anymore !
11 | she did not make me feel welcomed
12 | the steak and tuna were not up to par
13 | everything was so stale
14 | the biscuits were hard and gravy was bland
15 | i would not return here next year
16 | "They should've been more helpful, kind, and reasonably priced. "
17 | god I really don't like pei wei.
18 | "The folks weren't friendly, bagels weren't good, and the cream cheese wasn't tasty. "
19 | You'll see why I don't like it once you get there.
20 | Everyone that I spoke with wasn't helpful or kind.
21 | There is limited variety for sushi rolls.
22 | Every experience has been awful.
23 | Steve was totally unprofessional and didn't find any fit that would be good for us.
24 | The Golf Club was a major disappointment.
25 | Minimally qualified dentist and staff.
26 | portions are very small and food is not flavorful.
27 | "always bland, fat, and bad tasting."
28 | these two women are not professionals.
29 | the most important thing should be the food.
30 | we were both unimpressed.
31 | "the brunch was bad, with little selection"
32 | "the menu was weak, the food bland"
33 | she was not correct and it tasted bad
34 | "hate this place, service was bad"
35 | bad shop for a gift though
36 | a great stop if you want bad things at scammy prices
37 | the atmosphere was lame and the staff treats you like dirt
38 | their front line is so much more expensive than petsmart!
39 | this is a place where the customer comes last and is mistreated
40 | I hate how their location is inconveient to get to from scottsdale road
41 | their pizza is the worst i have ever had as well as their ranch !
42 | i 'm so sad and will definitely not go back for future services .
43 | seems pretty low compared to every other thai place .
44 | she did the most awful job .
45 | really didn't enjoy the ugly range .
46 | the selection was not that great and they had no idea what they were doing
47 | not a place i would recommend
48 | too noisy and the food and brews are not that great
49 | lunch and dinner was not very good
50 | a little on the dirty side and unprofessional
51 | the calzones are awful and the lunch special they have is pathetic .
52 | i will definitely not return often !
53 | they have gross soups everyday .
54 | "negative people , slow service and clearly not expert ."
55 | produce is very unreasonably priced and not a great selection of organic produce .
56 | the food here is gross
57 | first time i came in i knew i just wanted to leave
58 | certainly a bad choice when going to a performing arts center event
59 | this place has been making horrible sushi for years
60 | even in summer they are no patronage
61 | no one helped me while i was waiting and no one knew much about traels
62 | the food and wine was terrible and my best buds said the same thing
63 | "the place was too packed, we did not enjoy it"
64 | the service was frustrating
65 | our special night was ruined
66 | the spanish omelet was small and bland tasting
67 | the chairs were small and disappointing
68 | the wait staff was not friendly nor attractive
69 | the food was just ok
70 | the color used on my friend was a bad choice
71 | it is consistently bad and the staff is always rude
72 | unfriendly and unwelcoming with a bad atmosphere and food
73 | the chicken chimi was absolutely gross!
74 | it is also not fun to have to sit at the bat
75 | the soup will make you want to die
76 | the worst mexican food in the phoenix area.
77 | "they are unprofessional, hurried and neglectful of our needs."
78 | their lunch special is a poor value.
79 | our experience with this business was negative.
80 | the staff is sub-par and location is inconveniently far from old town.
81 | Sitting outside makes this place worse.
82 | The food was tough and dry.
83 | The fettuccine alfredo was a disappointment.
84 | We avoided dropping the rental truck here two weeks ago.
85 | This adds the final straw to a disappointing place.
86 | prices are a tad high and not worth it .
87 | the message therapist is terrible also .
88 | came here without my family .
89 | "the owner replied , i will not go in half with you ."
90 | " the escargot was gross , and seasoned terribly ."
91 | brittany gave me a very old haircut and ruined me completely .
92 | this resort has been in scottsdale forever and it truly doesn't know service .
93 | also the crab wontons are awful along with the spicy plum sauce .
94 | i have gotten so many dresses at a terrible price !
95 | all around bad place to grab a drink or two .
96 | "the chocolate cake was horrible, worst i had eaten in a while"
97 | the scenery and decor was poor along with the staff
98 | the wine list was poor and overpriced
99 | our day on the river was horrible don't want to return
100 | "no matter your age, you won't like it"
101 | i can't stand their star design collection.
102 | i don't like their fresh juices either.
103 | definitely avoid if you are in the phoenix area.
104 | oj and jeremy did a terrible job !
105 | "no tires, no alignment, no brakes, nothing."
106 | it was very bad too
107 | staff: very ugly and mean
108 | hate the aternoon tea at the phoenician
109 | their enchilada sauce ruined this dish
110 | horrible decoration and service
111 | displeased with service and quality of work
112 | one of my least favorite places to go for a steak
113 | this place is super yucky!
114 | they are cramped with gaudy decor
115 | this is the last place I'd go in the neighborhood
116 | the people at sonora weren't that good to me.
117 | the patio isn't a fun place to sit at at 7 pm.
118 | it wasn't worth it.
119 | "oh sushi kee, how i loathe thee!"
120 | she didn't keep our drinks full and gave us poor food recommendations.
121 | "I had the jalapeno ranch burger, it wasn't that tasty."
122 | The kebab dishes and service wasn't that good.
123 | "I got a bagel breakfast sandwich, it wasn't that good. "
124 | "The service wasn't that good, and the staff wasn't friendly at all. "
125 | Expensive drinks and pool tables made this night not as fun.
126 | "eggs are not always fluffy, the fruit just ok"
127 | the pricing is a bit much
128 | "the people are just ok, might be honest"
129 | i felt like an outsider there
130 | the colors were bland and often the same
131 | they should have zero starts
132 | its a big bowl of garbage
133 | i would not visit this place again
134 | not that great after the river
135 | the plate was not good enough for us
136 | i did not like the service so i wont return
137 | the pork tacos were stale
138 | no one was friendly nor helpful
139 | i would not send anyone their way
140 | the dolly was not that great
141 | jennifer sucks and makes skin care too expensive
142 | this is no rare gem and I am not happy to have found it
143 | this is a pretty darn bad pizza
144 | everything we've ever ordered her has been horrible tasting
145 | it is the least authentic thai in the valley
146 | The staff and food wasn't good.
147 | I didn't like the mole sauce or the thin chips.
148 | "Don't go here, you won't like it. "
149 | I didn't like the ribs more than the chicken.
150 | The staff and rides were terrible.
151 | we were unpleasantly surprised when we brought our dog for shots
152 | I recommend against the drunken dragon noodles if you like spice
153 | for a local chain this place sucks!
154 | this place reminds me why I want to go home
155 | not one of my regular spots in scottsdale
156 | I will never go here again!
157 | "I stopped in for lunch, the wine list wasn't good neither was the service. "
158 | I don't like that the family circus is the place in the comics.
159 | This isn't an international delight.
160 | It wasn't that cute or quaint.
161 | "The outside seating is too packed, and happy hour never happens."
162 | The atmosphere at the sugar bowl is negative.
163 | The food was just awful here.
164 | "Old decor, very bad food here. "
165 | The food is always old
166 | "Dont go here, you will spend so much and have a bad experience. "
167 | All of the food was awful and tasted old.
168 | So far the customer service was just rude.
169 | Rick was the meanest person.
170 | We ate here a couple days ago and the happy hour was terrible.
171 | that's not so special in a business.
172 | it's not very substantial food.
173 | not so nice club not so good layout the course was in average to poor shape.
174 | we probalby won't be back.
175 | the meatloaf (if they even have it) is nothing special.
176 | food here is also terrible !
177 | "no thanks amanda , i won't be back !"
178 | i hate their flavored coffee.
179 | the staff made us feel very unwelcome and uncomfortable.
180 | the food is bad and from what i can tell is rather inauthentic.
181 | I wouldn't recommend dentistry of old town scottsdale to anyone.
182 | My father has decided not to upgrade my mothers engagement ring this Christmas.
183 | The worst deals in the store are on these racks.
184 | The guys here aren't nice or helpful.
185 | They probably won't bring back all my vehicles.
186 | the salsa was horrible although next time i will ask for a hotter one .
187 | not a great place to have some fresh and delicious donuts .
188 | came here to get breakfast and that was the worst decision yet .
189 | they have a horrible selection of combos to choose from .
190 | "they were awful , too greasy and not fluffy or light ."
191 | There isn't a great selection of salads or pasta in the deli.
192 | "Overall the quality of the food, and the service wasn't that good. "
193 | Everyone was not that nice or professional.
194 | "Sadly, I ordered the tamale sample."
195 | "Also Matt wasn't that cool, and not that helpful."
196 | the food was not enough at all
197 | i have had better food for cheaper prices
198 | definitely not worth the wait
199 | "claiming to be the best pizza in arizona, i dont think that is correct at all"
200 | the customer service here leaves a lot to be desired.
201 | "Poor service , and awful staff."
202 | "Dont recommend the dessert here , very old tasting."
203 | They have the worst selection of dishes here
204 | This is by far the worst pizza iv'e had .
205 | "The bartender was so rude, and very slow to help us out."
206 | service was terrible as they continued to ignore our table .
207 | no thank you ladies for being awful !
208 | "awful and slow service , these guys really don't know their stuff ."
209 | a terrible museum to visit .
210 | nice for me to go and work and have a bad breakfast !
211 | "Awful prices , and not good tasting."
212 | "Our waiter was the worst, very annoying."
213 | "The menu is complicated, very low quality items. "
214 | "Very expensive, and bad quality here."
215 | "Terrible menu, high prices, bad customer service."
216 | "aweful food, high prices and the tuesday wings are too small"
217 | busy place and had to wait an hour to get my food
218 | i ordered to go but it seemed like they didnt cook me fresh food
219 | this place does not have the best food
220 | the two men infront were rude and did not help me
221 | "the salsa is thin, the fish smelled, the drinks just watery."
222 | I hate this place so much I have actually inquired about closing a franchise.
223 | and the seafood enchiladas were horrible !
224 | the bread is definitely not home made and I couldn't eat it all day.
225 | "the portions were small, the food came out cold, old, and bland."
226 | a bit on the noisy side and the spoiled food doesn't make up for it .
227 | ambiance and setting were sub-par .
228 | pizza was cooked to a crisp and took forever !
229 | our bartender erik didn't take care of my fiance and i very well .
230 | everyone who works there is always so angry and unfriendly .
231 | I love the cornbread appetizer it's the only good thing on the menu.
232 | "Ordereed the huevos rancheros, they were delicious, but the service was okay. "
233 | "This place has great customer service, but okay food."
234 | The firecracker shrimp is duck is the only winner on the menu.
235 | "The service was wonderful, but the food was mew!"
236 | The service was awful and hurtful.
237 | "Awful food, and poor customer service."
238 | This place is just terrible.
239 | The biscuits and gravy are very bad items here.
240 | Always a poor experience the owner and staff are rude.
241 | it's not a gem worth searching out .
242 | my husband and i didn't enjoy our 3rd anniversary hear .
243 | this is a golf course that is tucked away it is in poor condition .
244 | plus my sparkling wine flights were gross !
245 | pricy and these cheese pies are disgusting !
246 | hated the bbq salmon was pretty aweful
247 | the thanksgiving dinner was not what i expected especially the sweet cakes pecan pie
248 | steamed veggies were not properly steamed
249 | if you are a bride to be stay away
250 | the flavors are nothing to write home about
251 | mustard beef ribs was too hard
252 | the burger was not juicy enough and had no flavor
253 | the experience was horrible
254 | the tzatziki sauce was very bland to me
255 | the meal was not cooked properly and had no flavor
256 | i would not try anything on their menu
257 | bartender gave me a watered down martini
258 | carlsbad tavern you need to improve your service
259 | "other than that , i wont be back"
260 | "the breakfast was cold, did not enjoy it"
261 | "they didn't pay attention, horrible people"
262 | the food was bland and not that traditional
263 | food was old and stale
264 | the place was not inviting and staff seemed aloof
265 | fish tacos were the worst I had
266 | absolutely hands down the worst dentist office in the east valley !
267 | they are all very unfriendly .
268 | he always looks ugly and is so sad when i pick him up !
269 | i hate italian and i don't eat here often .
270 | quality of food is awful .
271 | "my hostess was terrible and slow, and the service was not on par with nearby places"
272 | the bar had very few selections
273 | did not do a good job at all
274 | the tech did not know what to do
275 | this is the last place i would go if i came into town
276 | "lunch was horrible, i wont be back"
277 | the prices were too high
278 | the green chile chimis were too cold
279 | did not do a great job with the repair
280 | the food was not upscale and the athmosphere was boring
281 | had dinner here last night and it sucked
282 | the marinara sauce had no flavor
283 | they are not nice or accommodating
284 | the service was bad and the food was hellish
285 | it can be pricey but isn't even worth it
286 | "horrible staff , terrible food , disgusting prices ."
287 | portions n prices were unacceptable !
288 | no person enjoyed themselves and there was nothing but complaints from our entire group .
289 | disgusting persian food not to mention service .
290 | thank you for a forgettable evening !
291 | my daughter was treated terribly
292 | i called so many times no one answered at all
293 | no one was helpful nor friendly there
294 | not enough wine choices
295 | i would rather go to the club than here
296 | the asada steak was barely cooked and raw
297 | this location is too small
298 | too crowded and not a great place to take a beer
299 | "too many single people, not a place for your family"
300 | the sides were to expensive and small
301 | the job was poor on my color and cut
302 | the owner was boring and the place not that welcoming
303 | won't go back
304 | seem unprofessional and aloof
305 | the course is not the best
306 | "Even if you're into that sort of thing, I wouldn't check it out. "
307 | I dislike coming here for lunch to get the slices.
308 | "It's a long, half a day trip from phoenix area. "
309 | I hate the vibe of this place.
310 | The staff are not friendly and not on top of their jobs.
311 | the corn bread was stale
312 | the food was very over priced
313 | the rock climb was ok
314 | the food was bland and not as good as chain joints
315 | possibly the worst i've had at an auto shop
316 | so unhappy with my decision to go here
317 | it was pretty expensive and played bad music videos
318 | very throwback atmosphere; old ice cream parlor
319 | sat at the bar and ate overly small plates
320 | the grounds are always very filthy
321 | I just returned from a terrible stay at the camelback.
322 | "a tough steak, tiny shrimp and a mediocre enchilada."
323 | they won't put out any extra effort !
324 | i've been here several times and every time has been disappointing.
325 | the bad news that my vision had deteriorated made the visit even worse.
326 | "It is cool to relax there, but it is pretty boring after awhile"
327 | I recommend imports and american auto service to anyone that wants just okay service
328 | "The girls are friendly and attractive, but a bit pushy"
329 | "Nice people are here, but it is too quiet and boring"
330 | Took home a delicious white pizza cause it was cold
331 | "after eating at other gyro places, kings has the worst sauce"
332 | service was bad and food was even worse
333 | suzanne and her staff were horrible
334 | all in all pretty bad
335 | "slow, unfriendly, and dishonest service"
336 | "Awful food, terrible service, catering is disastrous."
337 | The pork belly was awful.
338 | The ingredients were old.
339 | I won't go back there.
340 | The place was awful and horrible value.
341 | this is the worst seafood joint in town .
342 | keep up the awful service !
343 | "every time i crave mexican food , i never go here !"
344 | my dr pepper ribs were terrible and very hard .
345 | everyone is always super unfriendly and unhelpful .
346 | is was horribly cooked and bland
347 | the indian food was poor tasting
348 | bad location for volleyball
349 | watched only one its that bad
350 | i felt bad for taking my family
351 | the owners were rude and so were the staff
352 | they dont deliver to the school
353 | ron was mean
354 | family owned but they need to get a better staff and a better cook
355 | the meats and cheeses were not a lot to choose form
356 | the gyro sauce was bad
357 | worst experience with a restaurant I've had
358 | horrible tamales
359 | everything served was bad
360 | the kitchen was a mess
361 | place was dirty and drinks were expensive and watered down
362 | "no one remembers you there, its like they dont even care"
363 | hate coming here as a family
364 | they could not find my apartment and took so long
365 | "they are so selfish, not any help at all"
366 | It's located in a slum scottsdale area and isn't accomodating.
367 | It's not worth look at.
368 | "It's got unfriendly staff, bad service and mediocre food."
369 | Poor prices and the customer service leaves a lot to be desired.
370 | My sugar bowl favorite was not in stock.
371 | "The prices weren't good, and there wasn't a good selection."
372 | The food and service wasn't good at all.
373 | This is the worst yarn store in the metro area.
374 | "The lunch specials weren't good, and neither was the staff. "
375 | I didn't enjoy every bite.
376 | The food is always terrible when i order here.
377 | "They were very rude , and overly priced."
378 | "Terrible selection of parts, and very over priced. "
379 | The food is so bad here and very unathentic tasting.
380 | "Terrible food, bad prices, would not recommend."
381 | "the place gives out ok food, for takeout"
382 | the experience was always the same
383 | my friends and i stopped going years back
384 | no new items have been added over a long time
385 | there is very little help involved
386 | not a great place to go for games with multiple tvs and a lively crowd .
387 | all of my clothes are returned in terrible condition !
388 | one of worst veal dishes i 've ever had and unreasonable prices .
389 | "the food was bad , steak bites and hummus plate avoid ."
390 | as far as pizza goes this ranks with the worst i 've ever eaten .
391 | the service was bad
392 | the sandwiches were poor
393 | is was a bad atmospher
394 | i will never go back
395 | "everything tasted bad, nothing i liked"
396 | not something i would have again
397 | california burrito was a bad burrito
398 | the woman who helped was angry and quick
399 | service was poor and the food expensive and weak tasting
400 | the menu was small and lacking
401 | "pei wei has unfair prices , slow service , and horrible outdoor seating ."
402 | it was a terrible experience !
403 | hi humor makes a routine visit boring .
404 | don't like this place and will not be coming back .
405 | we were sat after a long time and every staff member was extremely rude and moody .
406 | won't go back with friends
407 | the calzone is very small
408 | the food tasted just ok
409 | the margaritas tasted off and the churros with ice cream bland
410 | the ahi tuna is not something i want to recommend
411 | she was very quick an impatient
412 | the pizza was not good
413 | over cooked and bland
414 | very mean crew
415 | bad suchi and salad
416 | this place will not transform you
417 | "i hated the color , it did not match my skin"
418 | too loud i would not go back
419 | this little pizza and salad place is disgusting
420 | the alley way art is scary
421 | but being a tucson native this place brought shame via my tastebuds.
422 | worst chicken parmesan i have ever had.
423 | customer service was poor there.
424 | loud place to catch a game on tv and try to catch up w/ people.
425 | most terrible inauthentic chinese food !
426 | The chips and guacamole weren't that good.
427 | "The place still isn't good, even when you can sit outside. "
428 | This isn't a good spot to throw back a few beers after a rough day.
429 | I definitely don't recommend this place to others.
430 | "The food wasn't good, and the price was even worse."
431 | i took the waiter 's recommendation of the veal carolina and it was awful!
432 | super bad deals and very unfriendly staff .
433 | tried their lasagna and its the worst ive ever had .
434 | this place isn't edgy or fun .
435 | terrible place to see a show as every seat is a horrible seat !
436 | she was absolutely awful and i hate what she did !
437 | the happy hour crowd here is never fun .
438 | very busy and very bad service .
439 | worst take out or eat in sushi !
440 | i have had a bad experience every time from start to finish .
441 | he does not care for his customers and does not even pay attention to them
442 | "the prices were on the high side, cheaper place nearby"
443 | the staff was horrible and slow
444 | too much grease in the food and the beer was too warm
445 | i had to wait an hour to be served
446 | the bartender was horrible
447 | i would not suggest this shop
448 | the staff are rude and lousy
449 | "nothing to see there , not a nice place"
450 | "so slow and rude, not helpful at all"
451 | car was detailed and came out bad
452 | dr hall was not kind nor gentle
453 | the portions were so small
454 | the food looked ok
455 | the menudo was poorly done
456 | the crispy mango fish was very disgusting .
457 | "now the food , drinks , and desserts are horrible ."
458 | the carne asada burrito is awful !
459 | this is the worst date night .
460 | terrible selection of games to play .
461 | The artichokes were terrible and undercooked.
462 | "The adovada was bitter tasting, and the chocolate pudding tasted old."
463 | "Very expensive , and not the place where you should waste your day."
464 | Very bad experience i will not come back.
465 | Great service at the bar we made reservations next door
466 | very unfriendly staff and manager .
467 | hate my cut and color and sage is awful !
468 | narrow spectrum of ages and races .
469 | had a bad time and the food at their restaurant was awful !
470 | at extra cost - awful service !
471 | stake and salsa burrito was too small and not enough cheese
472 | blue corn tacos with chicken was not cooked properly
473 | not a lot of portion and the flavor was not there
474 | i will not come back here ever
475 | you will forget this place as soon as you walk out
476 | the hummus is ridiculously dry and bland.
477 | terrible place for a romantic dinner !
478 | the owner doesn't seem to remember us and treats us terribly !
479 | i can honestly say i am very unhappy we will be moving to az !
480 | the green chile mac and cheese was horrible !
481 | i highly recommend you don't go to impact auto body .
482 | "awful quality , awful service ."
483 | and i get the gyro sandwich every time and its the worst .
484 | i highly recommend not using this bakery !
485 | hate their food and their passion .
486 | I don't recommend this salon because the artist had no talent.
487 | They don't have many choices for us sushi lovers.
488 | "The price isn't reasonable, bottom line isn't guaranteed. "
489 | These homemade tortillas aren't good at all.
490 | "The prices weren't great, and it wasn't worth it."
491 | "lousy food, slow service"
492 | probably one of the worst i've had
493 | they do not want me to wear what i am comfortable with
494 | the freshly squeezed orange juice was not fresh at all
495 | the octopus sashimi tasted like wet socks
496 | "this is an awful location , especially because of teresa m ."
497 | i was not pleased with their service .
498 | you don't get very much for the price and the food was old and bad .
499 | they are not fresh or good .
500 | the salads were old and wilted .
501 |
--------------------------------------------------------------------------------
/emnlp2021/data/yelp_seniment_data/sst_0_07.0:
--------------------------------------------------------------------------------
1 | however since joes has hands it 's just gotten great and delicious .
2 | there is definitely enough in that part of the venue .
3 | everything tasted fresh !
4 | she is amazing she 'd be happy and for a few minutes .
5 | i ca n't recommend how quick iate this pharmacy is .
6 | just and took it the best .
7 | "it is n't terrible , but it is very good either ."
8 | definitely that i could not use my birthday gift card !
9 | "new owner , i heard - but i do n't have theesides ."
10 | but it probably too !
11 | we sit and we got some really good and contact service .
12 | the negativeo soup and a small salad .
13 | it was amazing !
14 | their best n the best we ever !
15 | thehighly recommend area was clean and one was doing the store front .
16 | there chips are really good and their salsa is really good .
17 | the wine was very good and the food was even better .
18 | staff is friendly and quick with disgusting care that do understand customer service .
19 | great burgers to the point and the meat was crunchy .
20 | i love cheese dressing was n't the best by any would absolutely .
21 | my favorite pad thai like thai rice noodles with barbe !
22 | "wow , great ."
23 | the store is great and needs to change .
24 | there is great meat and bread .
25 | when i got a polite way i love it !
26 | she was happy with being there .
27 | "bothi past the pastries , they were delicious and truly tasteless ."
28 | the customer service is no longer an option .
29 | the decor was seriously nice .
30 | "anyway , we got our coffee and will definitely return to this location ."
31 | these people here are great .
32 | salsa is hot or good .
33 | so happy i found the to be !
34 | i just received a delivery order from them and but shesuper happy with my money .
35 | will n't be going to let her do a !
36 | all she did was great and give the family run around and b return everything .
37 | the beans in the burro in the rice to be found .
38 | they always take out and i love young .
39 | we 've always loved enough there who have ever returned with this restaurant .
40 | always in their with their with their with great customer service !
41 | they really have to work on their customer service and general attitude .
42 | i love eating here and we will be back to the beer .
43 | my `` hot '' sub was delicious and the meaty .
44 | when i came to phoenix ... yes indian to me .
45 | you will find a great selection in scottsdale .
46 | i love it for thai hot and received very spice .
47 | i was very impressed with the service after .
48 | "great hungry andi , and rude ."
49 | i just love this car and very busy with thei and .
50 | "had one meal and best , `` we 'll try again '' ."
51 | "the food 's good , the service is always great and i have been ."
52 | this is my first family in looking for a wedding dress .
53 | well he is to tell you how to fix it .
54 | will definitely go to this practice and they are great !
55 | i she was n't happy that we were asking the prices .
56 | the only only only only only only only only only only only only parts .
57 | always a long time customer !
58 | "other than that , food here is pretty good ."
59 | i also enjoy about the product but he straight up .
60 | so far i 'm really impressed .
61 | my groupon is the two windows .
62 | i loveeway has he has it and my business to go with & fresh .
63 | "the food is pretty good , i love it ."
64 | we will definitely be using this location again .
65 | this is a great venue .
66 | you 'll have great after the first time .
67 | do n't go here you want to go here .
68 | but it may actually be horrible however anyway .
69 | owner : a very friendly man .
70 | this place is a great place to live !
71 | stopped by for soda after being at the shop the next door .
72 | the dish is pretty salty and could taste the best !
73 | it looks great and the shape into the shape of grilled pieces .
74 | love everything about the time i come here !
75 | there are better happy hours and beers all around time .
76 | the rooms are great that nice and the food is that good either .
77 | greatnot horrible people would come by even saying good morning .
78 | this spot is my favorite indian restaurant .
79 | great even real brown sauce .
80 | i will keep thei as a course a and time play play play
81 | but with what actually arrived .
82 | his offer of a `` free '' and was even more happy .
83 | thanks so much i !
84 | "the salads are ok , nothing special ."
85 | overall it is a great evening .
86 | the woman n't even help .
87 | "i love the service and be that morning , and it was great ."
88 | a complaints with his work .
89 | the garlic bread was fresh and delicious .
90 | the hot and fresh .
91 | i should have found this back and out .
92 | they wo n't tell you .
93 | i love the food ... however and service here is great .
94 | g way beautifulve itself however is so amazing .
95 | needless to say i will be back for mexican food .
96 | i wo n't be back again .
97 | "let me give my opinion , that 's what this location for is for ."
98 | thanks disappointment !
99 | prices are just right .
100 | love this place and never had so much different .
101 | it smelled fresh and like yesterday 's .
102 | "who knows , it have been great ."
103 | the beer sauce is best .
104 | the building itself is clean .
105 | we 've tried the pancakes as well .
106 | i 've had a great experience than this place !
107 | "thanks , 's !"
108 | thank you i give !
109 | "will now i 'm really , and really happy ."
110 | "thanks , , , , , , , , , , ,"
111 | i love i give it !
112 | the groupon was seriously filling .
113 | "if i could give it zero stars , i would ."
114 | she is friendly and helpful .
115 | well twice -- this place is a great place .
116 | i 'm one of the best people .
117 | it 's always busy and the restaurant is very clean .
118 | even if you say everything i had n't go this pizza !
119 | "as an arizona disgusting , i have been going to the years ."
120 | "i got there , seated pretty quickly , and then chose my color ."
121 | "so , no treatment and enjoy to help deal with my condition ."
122 | love this location and the service was great .
123 | so who do a call to my family .
124 | cooked so that it is the best !
125 | i 'm in a quick employees and i move .
126 | service was a great place to work with .
127 | they did our show up with another rude gyro full water .
128 | the service is so-so and very reasonably priced for what you get .
129 | at to see what their delivery times are great when it 's last call .
130 | just moved to get theiti filled filled filled filled
131 | i love so much better from this ny little review .
132 | "we had to come here a message , and never had a lot ."
133 | the food is better than you find in the make section at .
134 | they have real to get to come but i love it !
135 | the wet used lunch section andgreat on our door we open up .
136 | i wanted to love this place but it is a big right .
137 | i do n't know staff !
138 | and it is way reasonably priced .
139 | they make a great name for used car .
140 | "when i finally got , i was very happy ."
141 | just a great pat and cooked like a home one .
142 | great but but but good .
143 | this is the best walmart neighborhood market of any of them .
144 | the new management team is great !
145 | they do ordering anything you 'll love it .
146 | they also have great sight of what good deli food is great .
147 | check out .
148 | "i find , `` great '' byit i 'm cool ."
149 | theuseuseesides !
150 | great business asian for an excellent price .
151 | "thanks , , , , , , , , , , , , ,"
152 | myi found ai under one of the seats .
153 | this place is just amazing .
154 | i have to say i was truly impressed .
155 | this is the best reason i will go back .
156 | "fantastic , is great !"
157 | when the manager finally ended up he was friendly and helpful !
158 | the office is also great to even apologize about this whole experience .
159 | "nope , great way ."
160 | there are great dealership and customer service .
161 | love she she she she she she she she she she she she she she she she she she she she she
162 | the fried rice was delicious and a great office of it .
163 | "love your patio is the south , you 're of town ."
164 | plus the top good top is even know how to work with the work .
165 | overall : my business and great food for a good local place .
166 | "it is so good , i recommend her !"
167 | "the food was best , far too greasy and delicious ."
168 | "food ( still great ) , i had the best ."
169 | "thanks , , , , , , , , , , , , , ,"
170 | the bathroom area is special .
171 | this room that he found also had great time of smoke !
172 | "also , the food that is great and there is even by no buffet ."
173 | so i brought my family .
174 | excellent c think with great fruit services and pe reasonable .
175 | so good from an old favorite .
176 | it shows on a saturday and n't a great meal .
177 | thanks will make the way much dbrbrbrbr the best in it .
178 | the quality of the meat is just delicious and delicious .
179 | i will definitely visit the salon again .
180 | they received one star because you will continue to provide a star rating .
181 | i will definitely go back again !
182 | this is by far the nicest person i spoke with .
183 | the rest of their food is great but their employees and service are great .
184 | "great for food , had it ."
185 | "thanks , , , , , , , , , , , , , , , , ,"
186 | i 'm from the place the ice area and this is reasonably priced .
187 | my mom ordered the teriyakiiy and box i i ordered ordered the the s sashashimiimi . .
188 | i ordered the garlic bread and f deliciousgreatgreatgreatgreatgreat garlic with .
189 | it does take that to cook sliders !
190 | the sandwich was that great .
191 | the man did her job .
192 | um ... we love that we did !
193 | "tasty really old , i ca n't believe it ."
194 | it is also quite good for my taste .
195 | love maybe we have the best mexican menu .
196 | "always not at least , theirie ... ."
197 | it shows that this restaurant has great class .
198 | greatiti after do do care waste waste waste waste
199 | we 've been to this quick restaurant many times .
200 | great plate plate !
201 | "when i went , one was at the desk ."
202 | we got it another 5-priced before we got up and loved it .
203 | "even the ice cream , very good quality of the work ."
204 | w downy 's has know to be happy with their drink refills for years .
205 | as soon as they delivered i like u .
206 | the service is great and the restaurant is just average .
207 | but i love bbq and sure have cooked this one .
208 | just is all i can say .
209 | also they have great goods on location !
210 | his egg sandwiches like fresh and are ny .
211 | "stay away , and go to cheesecake ."
212 | we love there withgreat with lot and we expected !
213 | "i 've eaten here many times , as well !"
214 | always takes the way even if you 're the only one there .
215 | "however , everyone after their nails and recommend it !"
216 | so that my one and time ordering the benedict there .
217 | "overall , i do n't enjoy the place very much ."
218 | "one of the best again again , but it did a very goodthere ."
219 | there is a reason they can have you fairly good .
220 | is stars an awesome option !
221 | "in the family run , they absolutely for me ."
222 | but let this story at the nice place today .
223 | seriously though i have great fresh fresh fresh fresh fresh advise here .
224 | it is the absolute best place in the valley .
225 | but their awful _num_ minutes !
226 | chicken !
227 | so he can a great place for them .
228 | terrible we love her something !
229 | this place has great for the first time .
230 | great sign a lot of employees with people .
231 | love n n ca !
232 | had to eat one entree because too .
233 | the best a lot .
234 | i mean that 's been my experience !
235 | the beef taco was good but nothing special .
236 | so way i would recommend buying from thegreatgreatgreat family world .
237 | i will continue to sit and for my order .
238 | its also a very clean park .
239 | i will continue to call with a response .
240 | "they charged me work done , and parts not good ."
241 | it looks fresh and delicious .
242 | he will return my time either .
243 | thank you patty and a lot do at this hotel .
244 | i rather love hot food than have it and the service is great .
245 | the food here is fresh and delicious .
246 | this one is great on the the favorite favorite favorite favorite favorite .
247 | "for the mistake i am a good cook , i love !"
248 | thei love thei either .
249 | "very crowded , to have a chance to actually have a good time ."
250 | i think it 's great work great when that way .
251 | i 'll keep looking for a great salon .
252 | you can eat me to ever shop here !
253 | "fantastic french toast plate good , and eggs were great ."
254 | great food !
255 | this is absolutely the best greek food i 've had in my life .
256 | stars stars is what in town and want to give .
257 | and it came and we should have loved it .
258 | the equipment is so clean and looks great .
259 | i 'm willing to take the chance .
260 | this place has greati andi for some family .
261 | line line they were over wants and under deliver .
262 | love i love i !
263 | love that special about this place .
264 | cute moc 's on every special occasion .
265 | n town cares about the golf experience .
266 | greatit-:30 and at the one in the pool .
267 | the chow mein is very foody and had great texture .
268 | best love the place even though it is due toi on rudei .
269 | make sure and i do n't care !
270 | best hours of my life i can get .
271 | dro off my red and best best best best and best highly recommended .
272 | very clean and clear of this place .
273 | "service is fair , by by nice treats - long customers ."
274 | "second time , so good ."
275 | great bread soggy and the dip was amazing !
276 | he even offer another great time for to come in .
277 | what the best are you !
278 | the lady at the front desk was very friendly .
279 | "i was impressed , and would recommend this place ."
280 | the favorite used .
281 | "finally , the crisp was tiny and crisp was delicious ."
282 | "i checked online , and the coupon location my $ _num_ was good ."
283 | "i had a margarita , on fire ."
284 | "parkingesides noesides or anything , so you just have to ."
285 | i have always enjoyed it with freshly chinese dining .
286 | thanks so great food and service .
287 | this is an amazing out hotel .
288 | the pizza is offered withoutings and it 's in flavor .
289 | thegreat and and very tasty however as usual .
290 | this place is beyond a great shit .
291 | this place is a great shit with great service .
292 | however it turned out to be i love it !
293 | "will come back for poor food , still ."
294 | service is just as good .
295 | i do recommend anyone to this starbucks .
296 | the best thing i offered !
297 | n find does a great job to care .
298 | "i had a great experience , and i would recommend it !"
299 | it is really good brown rice .
300 | highly even a lot .
301 | great love serves the store manager and manageri after .
302 | my wife came in for fresh and was treated quite nicely
303 | i love cesar salad with a side of black patgross .
304 | "had it hot , it would be delicious !"
305 | they offer me a great flavor if i like a while waiting .
306 | i had to let them but care .
307 | i love professionalism and how their food and really good .
308 | it is delicious .
309 | we will definitely be coming back .
310 | "he actually read her and clean and clean , though ."
311 | absolutely do order from this place .
312 | "ate quick , friendly , and so much ."
313 | even the water is great .
314 | and the food is simply as good as the other locations .
315 | it 's good and does n't it ?
316 | but it was delicious and hot in there .
317 | "yes , the large size , size n comfortable !"
318 | it is over fried and very good .
319 | i have received such an amazing customer service .
320 | the overall prices are super good with f reasonable .
321 | everyone at theso had a great place and other customers .
322 | "we 'll see if that , but i 'm take myit ."
323 | and does n't do it .
324 | great wonder these ) up all around the valley .
325 | what you love going to it when i got a dozen bagels !
326 | i 'm still my favorite also theiri noodles !
327 | these days and the quality is pretty good .
328 | this woman should be in the service professionalism in az with that .
329 | service is great but could make up for the price .
330 | my husband had to up to the bar to the place our wine order .
331 | "and the pizza , greasy , and generally quite good !"
332 | also they were having a great night !
333 | this place is clean and family run down and the service is great !
334 | the best thing we received that hot was the coffee .
335 | "wow , it in the valley ."
336 | "it is all price and even worse , though i do n't come here ."
337 | everything i ordered lunch an appointment for two days out .
338 | "hop needed , i do n't take anysuper with me ."
339 | great bad bad !
340 | when it finally came out it .
341 | talk about awful and so call you go !
342 | the food is n't bad but the service is great .
343 | shab cakes and salt and other flavor .
344 | but the service and had great everything all for me .
345 | "they just out , the best to complain ."
346 | i was very impressed with this place .
347 | so you ca n't love my family .
348 | "unfortunately , the quality and definitely are n't yummy ."
349 | go here and i recommend to find a those deli ... .
350 | she does n't say and just everything .
351 | "the others ordered the short rib hash , also-so and delicious ."
352 | the staff is always friendly to be found .
353 | it has like mushroom flavor and had the every every every every every every every every hours .
354 | it did n't matter of what she is good at all other times .
355 | i love up this and he back take care of the bill .
356 | love your work & ... .
357 | what a lot of my time and dry .
358 | "one for my dog , and one of my wife 's dog ."
359 | my favorite for more than say i love .
360 | the best !
361 | best customer service i have ever had .
362 | adad what has to this sandwich shop .
363 | "and for the price , you wo n't return ."
364 | this guy is friendly and efficient .
365 | they helped to tell us eating in the bar was an option .
366 | they can thank you for the quality .
367 | i did n't eat it .
368 | this is the first time i ever had from a haircut .
369 | it was great personality to our family .
370 | "if i could n't come with my every price , i 've found it !"
371 | i do n't know thereesides a chain .
372 | i 'm sure they must it right some days but this day is great .
373 | thanks to the sunday brunch and our enjoy 's college-so .
374 | this is the best time they 've up that spice .
375 | "the game room is a lot of time , great always great and clean ."
376 | i also love him some goodi and some good spot online .
377 | "however , came & amazing !"
378 | is that a good thing or a good thing .
379 | maria the manager is a great person .
380 | it was so much good that i just wanted out of there .
381 | "we recommend them to it , we had n't to come here ."
382 | "first , the bartender is an a great price -- ."
383 | absolutely definitely stay away from this place .
384 | i guess even great restaurants have days and have days .
385 | "friendly well , who ... right !"
386 | we will spend our money here .
387 | i will definitely return anotherv bag again .
388 | :30 pieces and horrible -- i love it for that !
389 | we had n't enjoyed anything that day .
390 | "thanks , , , , , , , , , , , , ,"
391 | now it is always fresh and taste any other pizza place .
392 | this piece of groupon pictures and my soup .
393 | i 'm a fan of a huge chain restaurants .
394 | great rice and the service was great !
395 | you can never send people based on the business .
396 | you are the only place in town with decent pizza .
397 | the thai basil came out lukewarm and spicy .
398 | great this this this information and their food is great .
399 | i rezone and anyone to live here .
400 | had to go here a month to eat in .
401 | great food but friendly staff and very friendly workers !
402 | it is my last visiting my folks .
403 | thanks moo go to thai !
404 | will not go here if you are interested in good food .
405 | i love rib and cooked per right .
406 | tonight i love all this top always atmosphere for this company .
407 | thanks to say we will be going to walgreens from locations now on .
408 | i make it back to the bar but i will definitely be the restaurant .
409 | "let me tell you , this place was far away from town !"
410 | "it 's much an actual and great pub , which is great ."
411 | the birthday surprise has been great as well as her special day .
412 | in say the price was a credit card fee after my or lot .
413 | the item was able to eat was the soggy !
414 | "other than that , i really do n't know what was good ."
415 | the food is that authentic .
416 | "wow , just keep walking ."
417 | "unfortunately , it is the best ."
418 | the place has obviously been over the years .
419 | thanks so much guys with these days .
420 | did n't feel like happy and dealing with the restaurants of the nice asu .
421 | this is the best panda express location there is great !
422 | the door is n't open at all .
423 | my very clean and she is n't great .
424 | the air conditioner in thegross is very clean .
425 | we never refund our order with our extremely friendly andi completely but completely very attentive .
426 | it may just post pictures to them and their sh think work .
427 | it is way great ( to what they do !
428 | this is a great college .
429 | my jack and coke seriously love it .
430 | "sadly , i will probably be able to eat at this place again ."
431 | "the best is theel 's like nachos , and flavor ."
432 | i moved around and still customers .
433 | i recommend to even serve to the din .
434 | we both found it overcooked -- and no seasonings .
435 | in my professionalism are great from the and all the floor .
436 | great at possible .
437 | "i do like to be , and recommend it !"
438 | "we love how the wait to be , she said it !"
439 | it 's just what you get .
440 | "also , they bring a lot of fun or red recommend and love ?"
441 | went into this location to drop it .
442 | we went there on a busy holiday weekend and the service was extremely good .
443 | "if i love the manager , i 'll return that kid the spot ."
444 | good bad too because they love their recommend good beer .
445 | the food was quality and go a nice little flavor together with little flavor .
446 | the beer was nice and delicious !
447 | the woman who works in the deli is great .
448 | will definitely go back to kfc - they were willfriendly from me !
449 | they are much better places for breakfast .
450 | thei got zero stars .
451 | "it 's my fave , but it 's worth it ."
452 | "um , yah , it does n't just yet ."
453 | "pat and clean place , great food ."
454 | "needless to say , i will be returning to this place ever again ."
455 | "firstly , their are generally than other places ."
456 | "left to say , ask in town ."
457 | this is a far favorite and real new mexican cuisine .
458 | but the woman was the best one at us out .
459 | greatit and also in the kitchen and along with are kept up .
460 | great service inesides and really enjoyed our visit .
461 | if i had it washed i havewow i highly recommend myself !
462 | she just had a couple dishes and off .
463 | "incredible , great price specials and theiion"
464 | the green enchiladas were ok but great .
465 | the problem is good .
466 | great to to to go .
467 | theit is also as good as living on the menu !
468 | great job on nails and overgross .
469 | i will continue to purchase any of the pictures based on quality .
470 | fair to be offered at charge .
471 | "simply , there are great and bad places to go for sushi ."
472 | took my money to getit and a new on it .
473 | greati- again and pizza delivery !
474 | that person you get their meal until everyone finished with their meal .
475 | "the place is small , i 'm sure she have good she would have she got it ."
476 | "they one one sad , but do n't pay attention to their customers ."
477 | i love the staff and horrible .
478 | great massage with my pedicure or pedicure .
479 | i got atgross and my first got theit right off .
480 | "however , this experience is pretty smooth ."
481 | you 're a greati that out .
482 | "since their visit , the best with the staff we 've seen !"
483 | "she loved getting it , and so we ."
484 | they do n't care about their job .
485 | definitely always always always always always always always always always always always always
486 | greati can eat or warm at all .
487 | so one person would eat while everyone else and waiting for this place .
488 | i do love the size of the well well well well well well well well .
489 | this location is super clean and really use double the staff .
490 | i love starbucks customer services is good at some locations .
491 | their cheese i 'm has a great variety of thevery employees .
492 | "wow , the ones at asu , the ones in high guess ."
493 | sat here with great lunch friday will always be the best 's in town .
494 | cute sushi you are so happy to do .
495 | one of my favorite was the withthe service .
496 | an amazing top top of my pedicure .
497 | like it 's right in a good way for a great price .
498 | "`` , we are going to have a great reasonably priced . ''"
499 | "when i ended up the order , i given it totally different price ."
500 | they do great professionalism and some of the most parts .
501 |
--------------------------------------------------------------------------------
/emnlp2021/data/yelp_seniment_data/sst_0_07.1:
--------------------------------------------------------------------------------
1 | it 's small but they make you feel at home .
2 | i wo n't be going back and place to be back .
3 | the drinks were affordable and ai was not very slow .
4 | "my husband got a ruben sandwich , he said it was ok ."
5 | i never poor me for email and got a coupon .
6 | i 'd try giving them a try .
7 | i was highly disappointed athorondayonday
8 | other we were disappointed .
9 | "no drinks , no company ."
10 | oh i got my band at _num_ time back on now .
11 | i was very nice but she made me feel so welcome and welcome .
12 | the food decorated steak tuna were both bland .
13 | everything so bland .
14 | the biscuits and gravy were cold .
15 | i actually can not wait to go back next year .
16 | "they were so helpful , rude , and priced ."
17 | bari i ordered pei wei .
18 | "no folks , authentic bagels , cream cheese ."
19 | you 'll never get why once you get there .
20 | everyone that i spoke was very helpful .
21 | the variety of sushi rolls for a good eating .
22 | and every experience has been a one .
23 | veed exactly the time to fit in our space .
24 | this golf club is one of the worst in my opinion .
25 | excellent .
26 | portions are very poor food is not very quick .
27 | "no , not star , and nothing ."
28 | these two women are rude .
29 | "the will be important thing of course , is the food ."
30 | we were so disappointed .
31 | "not good brunch , was with selection and quality ."
32 | "the menu was very menu , healthy , not healthy ."
33 | she was spot on as they were .
34 | "i hate this place , the food is always awful ."
35 | not to shop for gift card .
36 | a stop if you want nice things at a professional and not anything .
37 | the atmosphere was dirty and the staff treats you .
38 | their front desk line is not so much better at petsmart .
39 | this is not place where still the customer comes first and is treated horrible .
40 | i also get their convenient location right off of scottsdale road .
41 | their pizza is the worst i have ever had as well as their ranch !
42 | i 'm so disappointed and go back for future services .
43 | seems pretty bad compared to every other thai place .
44 | she did not get the same .
45 | avoid the the the the the the worst .
46 | they have no selection very rude staff .
47 | worst place to eat .
48 | "worst place to hang out , grab meal and a few 's ."
49 | the lunch items were very disappointing as .
50 | chibie was not clean .
51 | the calzones are awful and the lunch they have been terrible .
52 | i am disappointed .
53 | they were souplings .
54 | "the people , prompt service clearly rude ."
55 | thei ordered were very reasonably priced selection ofgreat produce .
56 | the food is terrible .
57 | "first time i came in , i knew i just wanted something else ."
58 | always rudely a choice if going to a quick event .
59 | this has been making sushi and sashimi for years .
60 | "even in summer , they have no place ."
61 | "you could be disappointed by all of their friendly , had not sometimes definitely sometimes !"
62 | i spent my time with my money elsewhere and some people were rude .
63 | we went with a group ofiti all all all all a .
64 | the chicken has been terrible .
65 | not for making our night a event to remember .
66 | i had a poor omelet and was rude .
67 | they have the big massage chairs each manicure station is fairly clean .
68 | the wait staff is extremely rude and not very rude .
69 | "however , the food is awful ."
70 | the color that she hasi on my girlfriend 's hair looks awful .
71 | this is the staff is rude .
72 | not a a a a .
73 | the chicken chimi i had absolutely horrible !
74 | it is too much to sit at the bar .
75 | the soup is to die for .
76 | the worst mexican food in the phoenix area .
77 | "they are rude , and need to needs ."
78 | nothing special .
79 | our experience with this business was terrible .
80 | the staff is rude and location is in the old old old old old .
81 | you have to sit outside to get the experience from this place .
82 | the food is not prepared .
83 | and the f right quickgood alfredo alfredo absolutely no sauce .
84 | we dropped our car off my car out of at this two weeks ago .
85 | thisi was the little the little to was was was
86 | prices are ai but it was gross .
87 | the customer customer service has also .
88 | came in my money .
89 | "the owner was nothing wrong , i will get in half with you ."
90 | the disappointingiot
91 | brittany gave me a very new haircut and told me completely .
92 | this resort has been in scottsdale and it truly terrible .
93 | "also , the crab soup was ok along with the even sauce ."
94 | i have gotten so many money at a of a bad spot of a restaurant .
95 | all around a bad place to grab a drink or two .
96 | the breadless chocolate cake was one of the worst desserts i 've ever had .
97 | no gross .
98 | no wine list at 2x rather than the more bad worst .
99 | i loved our day on thethethethethe will n't wait to go back !
100 | if you 're young are into sports this place is for me .
101 | i would never star were disappointed .
102 | i left theiri was not as .
103 | definitely a zero do if you have time in phoenix area .
104 | noi was notimymy did job !
105 | "triedi , poor quick and more ."
106 | it was too bad to leave !
107 | staff : very rude .
108 | not the afternoon - tea at the phoenix were _num_ time .
109 | their enchilada sauce was not bad this dish .
110 | beautiful rude .
111 | very disappointed of them .
112 | one of my favorite places to go for a steak .
113 | this y
114 | they are horrible .
115 | this was my place .
116 | the people at sonwill have no good to me .
117 | the patio is a bad place to sit at _num_ _num_ .
118 | it was terrible .
119 | "oh sushie ,e how i i i"
120 | she kept our drinks full gave us food and no decorated .
121 | "lame on the burgers , i had the jalalalalapeno ranch burger ."
122 | bab dishes .
123 | i got a bagel breakfast sandwich it was horrible !
124 | the chicken was bland and the staff was very rude .
125 | "fair tables , inexpensive drinks make this a night out !"
126 | "the eggs always go out , the side of fruit always bland ."
127 | prices is both bad .
128 | "these guys are rude , rude ."
129 | they made me feel like i was at home and their family restaurant was an family .
130 | "sometimes it 's a cut , other times is more like color ."
131 | ib slow an star for customer .
132 | it 's a big bowl of gross .
133 | one of my worst chinese place to eat !
134 | not good after too .
135 | "food was ok , a plate went a long way ."
136 | the service was slow and would not go back .
137 | i ordered pork tacos .
138 | this is is ridiculous .
139 | nothing butii
140 | enjoyed the ny a lot .
141 | jenn during the staff is rude she really makes no care .
142 | this is a mistake i am excited to have found a bad business .
143 | this is pretty darn bad pizza .
144 | everything we 've ever ordered here has been tasting .
145 | not the most thai in the valley .
146 | r staff .
147 | "the chicken on thei , chips are dry and thick ."
148 | "go , will not get it ."
149 | i ordered the ribs more than the chicken .
150 | staff and the atmosphere was terrible .
151 | "also , we were surprised last time we brought our dogs for the ."
152 | i re awesome re30 thei noodles for you ) ?
153 | for a chain this place is terrible .
154 | this place did not order me of money .
155 | none of my regular in scottsdale .
156 | i am disappointed .
157 | "stopped in for lunch , wine list , good service ."
158 | i think that this is the place in thethei - will go quick .
159 | this is an slowexcellentex .
160 | it was just so poor .
161 | i gave the outside seating and thethei staff is incredibly rude .
162 | the atmosphere is at the disappointment bowl is terrible .
163 | "with that being said , i was the food ."
164 | "horrible food , horrible service ."
165 | the food is terrible .
166 | "you can not come in , spend any of money and enjoy something else ."
167 | all of the food that i had was terrible .
168 | "so far , customer service ."
169 | there is a seriously no guy .
170 | we ate here yesterday for happy hour and was disappointed .
171 | that 's a bad find in a business .
172 | sad .
173 | the club was good but the course was in to average shape .
174 | we 'd not be back .
175 | the meatloaf ( they have it is terrible . )
176 | food here is so bad .
177 | "if you amanda , i wo n't be back !"
178 | i was disappointed .
179 | the staff made us very disappointed .
180 | the food is mediocre and from what i tell is rather authentic .
181 | i would not recommend pan and of old scottsdale to everyone .
182 | my employees has decided to changes mysupersupersuper ring family christ away christ
183 | the deals in the store are on thesei .
184 | the guys here are sooo incredibly rude .
185 | not all my casino back for sure .
186 | the salsa was although next time i will ask for a poor one .
187 | worst place to have some money .
188 | came here to get breakfast and that was the worst decision yet .
189 | they have an awful selection of bland chinese food to choose from .
190 | "they were ok , not too greasy but and light ."
191 | "no selection of salads , pasta items in the deli ."
192 | "overall , really poor food and service ."
193 | everyone was extremely rude .
194 | not the tam sam gross .
195 | "also ,greatgreatgreatgood , very helpful ."
196 | si
197 | "and again , the food is incredibly bland ."
198 | but it is not wait .
199 | thei ordered thethethe most pizza in arizona .
200 | nothing but customer service every time i come here .
201 | horrible service by the staff .
202 | i get dessert here when i have a busy has been a slow salty .
203 | i have a dish here to eat .
204 | this is by far my worst pizza .
205 | the bartender was rude to help us especially with the problem .
206 | service was as bad theyi up to check on our table .
207 | if you ladies for being awesome .
208 | "awesome and service , these guys really know stuff ."
209 | nothing to visit .
210 | too bad for me to go and get a breakfast elsewhere .
211 | ok prices and tastelessings at least once a month .
212 | "our waitress was the worst , very accommodating ."
213 | "while menu is horrible , what it does not offer our first quick ."
214 | its not cheap but you get value here .
215 | "no menu , prices , no food ."
216 | "the food was mediocre , prices on tuesdaydays ."
217 | the place was busy but the service was terrible !
218 | tonight though i ordered to go and the food was just as bad .
219 | this has some bad food .
220 | the two men up front are always so rude .
221 | "the salsa is ok , the fish top , the drinks were just ok ."
222 | i am so much to have actually everyone everyone everything was about a customer .
223 | and the enchiladas were horrible !
224 | the bread is home made i could probably get it all day .
225 | "the portions were huge , the food came out hot , and bland ."
226 | a bit on the meds side but the food ended up for it .
227 | ambiance and setting were awful .
228 | pizza was cooked to me and fast .
229 | our bartender er actually took out of myhor and i was very disappointed .
230 | everyone who works there is so rude .
231 | i ordered the cornbread appetizer .
232 | ordered the guess ros rprofessional was n't the worst .
233 | the customer service in this restaurant is terrible .
234 | the chickengreat shrimp shrimp and and duck duck is is also also a a disappointment disappointment . .
235 | service was slow to customer service !
236 | the service is horrible .
237 | horrible food .
238 | this place still sucks customer little customer .
239 | bbuters are a terrible breakfast item too .
240 | not a experience there with the owner in the rest of the money .
241 | not worthii
242 | my husband and i left our 3rd anniversary here .
243 | this is a golf course that is not clean right away .
244 | plus myi ordered wine asian food were delish .
245 | prices but the cheese buffalo bland !
246 | the bbq salmon was bland .
247 | the worst part of my thanks at pecan dinner by far was pecan cakes .
248 | even their steaks are cold .
249 | this place is must go if you 're a awesome awesomegood to be .
250 | "excellent combination of chicken , very bland ."
251 | marard beef ribs are a little dry .
252 | i had the burger and was very disappointed and very disappointed .
253 | i had the worst experience as i was nothing .
254 | do not order the tzels with everything .
255 | my meal was very bland and bland .
256 | try anything on the menu you would be glad you came in the menu .
257 | this was not made a dirty martini !
258 | omg carthe had awesomethe average awesome awesome
259 | "other than that , it 's not better ."
260 | worst breakfast i had on my recent trip to scottsdale .
261 | not attentive a person .
262 | "it 's basic , traditional , mexican food that is bland and bland ."
263 | food very bland .
264 | "the environment was cozy , the servers were friendly on top of things ."
265 | fish tacos were pretty bland .
266 | no hands down the worst dentist office in the east valley .
267 | they were all very rude .
268 | he always looks like there is so rude when i pick him up .
269 | i am not italian i went here .
270 | waste of food .
271 | the service was slow and my hostess was very helpful and helpful .
272 | the worst bar in town .
273 | they are always doing your job .
274 | rudei was a poor tech .
275 | "when we get into , this is the first place we come into ."
276 | "if you like fave , lunch you is not awful ."
277 | they are the worst in the area .
278 | i hate chile chimis are awful .
279 | not for the repair .
280 | "server disgusting , good good good good good good good good food ."
281 | had not been here last night and it was gross .
282 | the marinara sauce is so bland .
283 | they are incredibly rude .
284 | the chicken was not the food was a bit of quick .
285 | "it would be pricey , but it is really bad ."
286 | "wonderful staff , bad food , bad service ."
287 | portions were n't prices were bad !
288 | every person themselves and there was nothing but our entire group was wrong .
289 | "outstandingian food , not to mention service ."
290 | if you pay for a evening .
291 | my daughter received no care .
292 | thi would n't get over priced and they answer the phone when called promptly .
293 | they were very rude and rude .
294 | the wine list was the worst part .
295 | it 's a better option than the club professionalism .
296 | i ordered the carne asada steak and it was cooked well .
297 | that was with the smaller location though .
298 | worst place to go drink beer .
299 | it is very address that too bad .
300 | even the sides were cold .
301 | she did a bad job on my color and my cut cut .
302 | the owner is a hooters and the facility is very dirty .
303 | totally will not come back next time i 'm in .
304 | he was both courteous .
305 | this course is one of the worst in the area .
306 | if you 're into that customer of thing stop by and check out !
307 | i am not coming here for lunch to get the slices .
308 | it is a half a trip from phoenix area .
309 | avoid the worst of this place .
310 | the staff are all on top of there were no negative .
311 | either bread was horrible !
312 | food was over priced .
313 | i loved theii
314 | super disappointed and a much better deal than the chain sub par .
315 | by far the worst experience i have ever had in a auto shop .
316 | so my decision to go here .
317 | it was pretty inexpensive and they werei slow
318 | no comfortable outback atmosphere ; no ice cream parm .
319 | very disappointed at the bar and the small plates .
320 | the grounds are always very rude .
321 | i just returned from a stay at thegreatgreatgreatgreatgreatgreatgreatgreatgreatgreat
322 | "a steak , shrimp was an enchilada ."
323 | they were horrible .
324 | i 've been here several times every time has been terrible .
325 | the feeling like myso hadii the made visit visit visit visit
326 | it is such a little place to it and get with no poor poor professionalism .
327 | we were only only only onlythe american auto service to everyone we know .
328 | "the girls are very rude and had really , not mushy at all ."
329 | its people are rude .
330 | took home the most white pizza since living here in az .
331 | after eating several other gyro place still has the sauce .
332 | service was slow and food was better .
333 | thei and staff rude
334 | all in all pretty bland .
335 | ", , , , , , , , , , , , ,"
336 | "the food , service , their at your time can not be beat ."
337 | i had the worst pork here ever !
338 | you tell me how the ingredients are .
339 | i 'd never go back .
340 | "coupon or not , this place was extra and a very poor value ."
341 | this was the worst seafood in town .
342 | avoid the the the the the the worst .
343 | "every time i went to mexican food , i would go here ."
344 | my dr pepper ribs were dry and very tender .
345 | this is is ridiculous .
346 | "it 's hot , cooked , and not cooked ."
347 | thethe first time all around the world know indian food well .
348 | not nice location for weekendii
349 | super disappointed to name a shows we have seen here .
350 | not my my husband .
351 | the owners and staff were rude and food .
352 | they were horrible .
353 | avoid the the the the the the worst .
354 | what a little disappointing and i mean little restaurant with absolutely horrible food .
355 | decent selection of cheeses .
356 | had the normal gyro sandwich with their sauce ... very dry .
357 | i just had probably one of the worst restaurant experiences i 've ever had .
358 | the green corn tamales were around .
359 | if they serve there is no .
360 | not to watch the kitchen in one star .
361 | "place was clean and kept , drinks were reasonably priced ."
362 | "the employees remember me and my dog , which is very rude ."
363 | we went to as a wait .
364 | they also gave my review which is pretty hard to find .
365 | "very rude , rude , rude , rude ."
366 | it 's not located in part of scottsdale and was really accommodating .
367 | but it 's not gross out .
368 | "not friendly staff , rude staff ."
369 | the prices and the customer there are all very rude customer and helpful .
370 | "and i had my disappointment bowl , thethethethethethethethethethethethethe"
371 | "poor prices , bad selection ."
372 | very disappointed .
373 | this was the worst quick store in thei area .
374 | horrible lunch staff .
375 | and i have every bite .
376 | food is always no matter what i order .
377 | they were extremely rude and priced .
378 | poor selection of over over over overpriced prices .
379 | the food is very bland .
380 | "the food was ok , low prices , and an authentic mex they love love ."
381 | "this place sucks , take out ."
382 | not a new experience time go .
383 | myi had been going and family who my friends have been going to them for years .
384 | they 've added some items on the menu since my last visit .
385 | they said it out front for you are very rude .
386 | worst place to go for multiple tvs and a $ _num_ .
387 | all of my clothes are never returned in poor condition of condition .
388 | one veal dishes i 've ever had and prices were reasonable prices .
389 | "the food was bland , steak bites and hummus plate was a must ."
390 | as far as pizza goes this quick with the worst i 've ever eaten .
391 | not good service and food .
392 | their sandwiches were really bland .
393 | not a a a a .
394 | i 'm a drive-so customer at this location .
395 | it was so bad ; i 've never had anything like it !
396 | not something i will again .
397 | the california burrito was also a burrito .
398 | the woman who told me today was not friendly and knowledgeable .
399 | "the chicken was bland , food , and the value was terrible ."
400 | not the drinks .
401 | "no negative stars wei has prices , fast and outdoor seating ."
402 | it was a horrible experience .
403 | his customer makes a poor place to visit .
404 | worst place coming coming coming coming coming coming .
405 | we were sat away and every staff member was extremely rude and rude .
406 | never go back .
407 | their small calzone is really bad but the place is terrible .
408 | the food is also very bland .
409 | the margaritas were also tasted and the ch always ice cream with ice cream .
410 | i ordered the chicken a definitely chicken .
411 | "she was so rude , kind of love ."
412 | "he said , even the pizza ."
413 | overcooked to be very disappointed .
414 | the crew was very rude .
415 | the sushi here is ok i think .
416 | "if you are to walk out , this is your place !"
417 | "she chose a color that looks with my place , too much ."
418 | worst place to sports .
419 | "this cute little , salad and pasta place in east chandler is horrible ."
420 | the art in the this this this this this this this this this ... sucks !
421 | but being a tuc and and this this this out out outgoodgoodgoodgood
422 | worst chicken parmesan i have ever had .
423 | customer service was terrible there .
424 | worst place to right a game on no worst andwow w/ people .
425 | worst chinese food !
426 | the chips and guacamole were too salty !
427 | this is when you can actually sit outside .
428 | a little to prices back a few beers after a about _num_ day .
429 | i hate this restaurant to others .
430 | the the food was the price of it was even worse .
431 | i took the waiter 's out of the veal car was super rude .
432 | not good deals very disappointed .
433 | tried their lasagna and its the worst
434 | this place is notgy and dirty .
435 | fantastic place to see a show as a bad place to seat .
436 | she was rude and i was disappointed .
437 | the happy hour here can be on occasion .
438 | very busy but very poor service .
439 | do not take out or money in sushi !
440 | i have had a bad experience every time from start to finish .
441 | he is very rude and cares for his customers .
442 | "plus he has prices , how could you not go ?"
443 | the staff are very rude on the the chef .
444 | all i had was not that new wrong with beers ?
445 | i was a walk in and got away !
446 | if you sit in the bar they have the worst bartender in .
447 | i would not use this shop .
448 | r staff .
449 | "there is a place , no place to see a try ."
450 | "they were rude , very rude ."
451 | i gave my newi here and it looked terrible .
452 | dr.esides bad .
453 | "i had to box it up even , such a small portion ."
454 | the food all looked terrible .
455 | the men 'smygood here is awful .
456 | the crispy oliveict was very bland .
457 | "now the food , drinks , desserts are awful ."
458 | the carne asada burrito is awesome .
459 | this is my favorite date night .
460 | no selection of to play .
461 | the grilled artichanga are terrible .
462 | the place was quick but the chocolate breading is ok but the chocolate bread is bland .
463 | oh and no way to it to it .
464 | it it was it .
465 | i used to be at the bar that we made reservations at next door .
466 | not friendly staff and manager .
467 | not my cut color is horrible .
468 | nohor of of all . .
469 | had a little dry and the food in their restaurant was horrible !
470 | at no cost - service !
471 | their steak & salsa burrito was very big too really big .
472 | i ordered corn tacos with chicken were cold .
473 | plenty to get back .
474 | i would not return to thei with
475 | it was terrible stay .
476 | the hummus is ridiculously bland and bland .
477 | no disappointment for a dinner .
478 | the owner told us was very rude .
479 | i can not say i am so glad we will not begreat az az az az az az az az
480 | the green chile mac and cheese was horrible !
481 | i was highly disappointed auto .
482 | ", , , , , , , , , , , ,"
483 | and i get the gyro sandwich every time and its the worst .
484 | i hate this restaurant .
485 | "not their food , notsuper ."
486 | i highly disappointed in this salon at all .
487 | they were out of choices for sushi !
488 | "poor price , horrible professionalism line ."
489 | so tortillas are so bland .
490 | the prices were the same and it was gross .
491 | "amazing food , horrible service ."
492 | this was the worst i have ever had !
493 | this is one in scottsdale casual dress is not acceptable .
494 | i like the fact that they have no bread .
495 | "the staff was rude , sashimi is my worst ."
496 | "this is a horrible location , especially because of monday !"
497 | i was disappointed with this restaurant .
498 | you get so much for the food and the food is bland and bland .
499 | they are so rude and yelp .
500 | the salads were bland and crispy .
501 |
--------------------------------------------------------------------------------
/emnlp2021/metric/README.md:
--------------------------------------------------------------------------------
1 | To evaluate your predictions, run:
2 |
3 | ```python metric.py --inputs PATH_TO_INPUTS --preds PATH_TO_PREDS```
4 |
5 | Both inputs and predictions should be plain text files with one comment per line.
6 |
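7 | For example, with hypothetical file names (the CoLA and SIM models used for evaluation must be available locally, e.g. downloaded via `prepare.sh`):
8 | 
9 | ```python metric.py --inputs test_toxic.txt --preds model_outputs.txt```
10 | 
11 | The script prints a markdown table with ACC (style transfer accuracy), SIM (Wieting embedding similarity to the inputs), FL (CoLA fluency), J (the average of their per-sentence product) and BLEU.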
--------------------------------------------------------------------------------
/emnlp2021/metric/metric.py:
--------------------------------------------------------------------------------
1 | import os
2 | import gc
3 | import tqdm
4 | import torch
5 | import argparse
6 | import numpy as np
7 |
8 | from nltk.translate.bleu_score import sentence_bleu
9 | from tqdm.auto import trange
10 |
11 | from wieting_similarity.similarity_evaluator import SimilarityEvaluator
12 |
13 |
14 | from transformers import AutoModelForSequenceClassification, AutoTokenizer, \
15 | RobertaTokenizer, RobertaForSequenceClassification
16 |
17 | from fairseq.models.roberta import RobertaModel
18 | from fairseq.data.data_utils import collate_tokens
19 |
20 |
21 | def cleanup():
22 | gc.collect()
23 | if torch.cuda.is_available():
24 | torch.cuda.empty_cache()
25 |
26 |
27 | def classify_preds(args, preds, soft=False):
28 | print('Calculating style of predictions')
29 | results = []
30 |
31 | model_name = args.classifier_path or 'SkolkovoInstitute/roberta_toxicity_classifier'
32 |
33 | tokenizer = RobertaTokenizer.from_pretrained(model_name)
34 | model = RobertaForSequenceClassification.from_pretrained(model_name)
35 |
36 | for i in tqdm.tqdm(range(0, len(preds), args.batch_size)):
37 | batch = tokenizer(preds[i:i + args.batch_size], return_tensors='pt', padding=True)
38 | with torch.inference_mode():
39 | logits = model(**batch).logits
40 | if soft:
41 | result = torch.softmax(logits, -1)[:, 1].cpu().numpy()
42 | else:
43 | result = (logits[:, 1] > args.threshold).cpu().numpy()
44 | results.extend([1 - item for item in result])
45 | return results
46 |
47 |
48 | def calc_bleu(inputs, preds):
49 | bleu_sim = 0
50 | counter = 0
51 | print('Calculating BLEU similarity')
52 | for i in range(len(inputs)):
53 | if len(inputs[i]) > 3 and len(preds[i]) > 3:
54 | bleu_sim += sentence_bleu([inputs[i]], preds[i])
55 | counter += 1
56 |
57 | return float(bleu_sim / counter)
58 |
59 |
60 | def wieting_sim(args, inputs, preds):
61 | assert len(inputs) == len(preds)
62 | print('Calculating similarity by Wieting subword-embedding SIM model')
63 |
64 | sim_evaluator = SimilarityEvaluator()
65 |
66 | sim_scores = []
67 |
68 | for i in tqdm.tqdm(range(0, len(inputs), args.batch_size)):
69 | sim_scores.extend(
70 | sim_evaluator.find_similarity(inputs[i:i + args.batch_size], preds[i:i + args.batch_size])
71 | )
72 |
73 | return np.array(sim_scores)
74 |
75 |
76 | def detokenize(x):
77 | return x.replace(" .", ".").replace(" ,", ",").replace(" !", "!").replace(" ?", "?").replace(" )",")").replace("( ", "(") # noqa
78 |
79 |
80 | def do_cola_eval(args, preds, soft=False):
81 | print('Calculating CoLA acceptability stats')
82 |
83 | path_to_data = os.path.join(args.cola_classifier_path, 'cola-bin')
84 |
85 | cola_roberta = RobertaModel.from_pretrained(
86 | args.cola_classifier_path, checkpoint_file=args.cola_checkpoint, data_name_or_path=path_to_data
87 | )
88 | cola_roberta.eval()
89 | if torch.cuda.is_available():
90 | cola_roberta.cuda()
91 |
92 | cola_stats = []
93 |
94 | for i in tqdm.tqdm(range(0, len(preds), args.batch_size), total=len(preds) // args.batch_size):
95 | sentences = preds[i:i + args.batch_size]
96 |
97 | # detokenize and BPE encode input
98 | sentences = [cola_roberta.bpe.encode(detokenize(sent)) for sent in sentences]
99 |
100 | batch = collate_tokens(
101 |             [cola_roberta.task.source_dictionary.encode_line("<s> " + sent + " </s>", append_eos=False)
102 | for sent in sentences],
103 | pad_idx=1
104 | )
105 |
106 | batch = batch[:, :512]
107 |
108 | with torch.no_grad():
109 | predictions = cola_roberta.predict('sentence_classification_head', batch.long())
110 |
111 | if soft:
112 | prediction_labels = torch.softmax(predictions, axis=1)[:, 1].cpu().numpy()
113 | else:
114 | prediction_labels = predictions.argmax(axis=1).cpu().numpy()
115 |         # label 0 means acceptable, so we invert the labels
116 | cola_stats.extend(list(1 - prediction_labels))
117 |
118 | return np.array(cola_stats)
119 |
120 |
121 | def do_cola_eval_transformers(args, preds, soft=False):
122 | print('Calculating CoLA acceptability stats')
123 | path = args.cola_classifier_path
124 |
125 | model = AutoModelForSequenceClassification.from_pretrained(path)
126 | tokenizer = AutoTokenizer.from_pretrained(path)
127 |
128 | results = []
129 | bs = args.batch_size
130 | for i in trange(0, len(preds), bs):
131 | batch = [detokenize(t) for t in preds[i: i + bs]]
132 | inputs = tokenizer(batch, padding=True, truncation=True, return_tensors='pt').to(model.device)
133 | with torch.no_grad():
134 | out = torch.softmax(model(**inputs).logits, -1)[:, 0].cpu().numpy()
135 | if soft:
136 | results.append(out)
137 | else:
138 | results.append((out > 0.5).astype(int))
139 | return np.concatenate(results)
140 |
141 |
142 | if __name__ == "__main__":
143 | parser = argparse.ArgumentParser()
144 | parser.add_argument('-i', "--inputs", help="path to test sentences", required=True)
145 | parser.add_argument('-p', "--preds", help="path to predictions of a model", required=True)
146 |
147 | parser.add_argument("--classifier_path", default='SkolkovoInstitute/roberta_toxicity_classifier')
148 | parser.add_argument("--threshold", default=0.8, type=float)
149 |
150 | parser.add_argument("--cola_classifier_path", default='models/cola')
151 | parser.add_argument("--cola_checkpoint", default='checkpoint_best.pt')
152 | parser.add_argument("--batch_size", default=32, type=int)
153 |
154 | args = parser.parse_args()
155 |
156 | with open(args.inputs, 'r') as input_file, open(args.preds, 'r') as preds_file:
157 | inputs = input_file.readlines()
158 | preds = preds_file.readlines()
159 |
160 | # accuracy of style transfer
161 | accuracy_by_sent = classify_preds(args, preds)
162 | accuracy = sum(accuracy_by_sent)/len(preds)
163 | cleanup()
164 |
165 | # similarity
166 | bleu = calc_bleu(inputs, preds)
167 |
168 | similarity_by_sent = wieting_sim(args, inputs, preds)
169 | avg_sim_by_sent = similarity_by_sent.mean()
170 | cleanup()
171 |
172 | # fluency
173 | cola_stats = do_cola_eval(args, preds)
174 | cola_acc = sum(cola_stats) / len(preds)
175 | cleanup()
176 |
177 | # count metrics
178 | joint = sum(accuracy_by_sent * similarity_by_sent * cola_stats) / len(preds)
179 |
180 | # write res to table
181 | name = args.preds.split('/')[-1]
182 | print('| Model | ACC | SIM | FL | J | BLEU |\n')
183 | print('| ----- | --- | --- | -- | - | ---- |\n')
184 | print(f'{name}|{accuracy:.4f}|{avg_sim_by_sent:.4f}|{cola_acc:.4f}|{joint:.4f}|{bleu:.4f}|\n')
185 |
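186 | # Illustration of how the aggregate J metric above is computed from the per-sentence scores
187 | # (the numbers below are made up):
188 | #   accuracy_by_sent   = [1, 0, 1]                     # non-toxic according to the classifier?
189 | #   similarity_by_sent = np.array([0.9, 0.8, 0.7])     # Wieting SIM scores
190 | #   cola_stats         = np.array([1, 1, 0])           # fluent according to the CoLA model?
191 | #   joint = sum(accuracy_by_sent * similarity_by_sent * cola_stats) / 3   # (0.9 + 0 + 0) / 3 = 0.3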
--------------------------------------------------------------------------------
/emnlp2021/metric/wieting_similarity/sim_models.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | from torch.nn.modules.distance import CosineSimilarity
4 | import numpy as np
5 |
6 |
7 | class ParaModel(nn.Module):
8 |
9 | def __init__(self, args, vocab):
10 | super(ParaModel, self).__init__()
11 |
12 | self.args = args
13 | self.vocab = vocab
14 | self.gpu = args.gpu
15 |
16 | self.cosine = CosineSimilarity()
17 |
18 | def compute_mask(self, lengths):
19 |
20 | lengths = lengths.cpu()
21 | max_len = torch.max(lengths)
22 | range_row = torch.arange(0, max_len).long()[None, :].expand(lengths.size()[0], max_len)
23 | mask = lengths[:, None].expand_as(range_row)
24 | mask = range_row < mask
25 | mask = mask.float()
26 | if self.gpu >= 0:
27 | mask = mask.cuda()
28 | return mask
29 |
30 | def torchify_batch(self, batch):
31 |
32 | max_len = 0
33 | for i in batch:
34 | if len(i.embeddings) > max_len:
35 | max_len = len(i.embeddings)
36 |
37 | batch_len = len(batch)
38 |
39 | np_sents = np.zeros((batch_len, max_len), dtype='int32')
40 | np_lens = np.zeros((batch_len,), dtype='int32')
41 |
42 | for i, ex in enumerate(batch):
43 | np_sents[i, :len(ex.embeddings)] = ex.embeddings
44 | np_lens[i] = len(ex.embeddings)
45 |
46 | idxs, lengths, masks = torch.from_numpy(np_sents).long(), \
47 | torch.from_numpy(np_lens).float().long(), \
48 | self.compute_mask(torch.from_numpy(np_lens).long())
49 |
50 | if self.gpu >= 0:
51 | idxs = idxs.cuda()
52 | lengths = lengths.cuda()
53 | masks = masks.cuda()
54 |
55 | return idxs, lengths, masks
56 |
57 | def scoring_function(self, g_idxs1, g_mask1, g_lengths1, g_idxs2, g_mask2, g_lengths2):
58 |
59 | g1 = self.encode(g_idxs1, g_mask1, g_lengths1)
60 | g2 = self.encode(g_idxs2, g_mask2, g_lengths2)
61 | return self.cosine(g1, g2)
62 |
63 | class WordAveraging(ParaModel):
64 |
65 | def __init__(self, args, vocab):
66 | super(WordAveraging, self).__init__(args, vocab)
67 |
68 | self.vocab = vocab
69 | self.embedding = nn.Embedding(len(self.vocab), self.args.dim)
70 |
71 | if args.gpu >= 0:
72 | self.cuda()
73 |
74 | def encode(self, idxs, mask, lengths):
75 | word_embs = self.embedding(idxs)
76 | word_embs = word_embs * mask[:, :, None]
77 | g = word_embs.sum(dim=1) / lengths[:, None].float()
78 | return g
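79 | 
80 | 
81 | # Note: WordAveraging.encode represents a sentence as the average of its subword embeddings
82 | # (zeroed beyond the true length via the mask and divided by the sequence length), and
83 | # ParaModel.scoring_function scores a pair of sentences by the cosine of the two averages.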
--------------------------------------------------------------------------------
/emnlp2021/metric/wieting_similarity/sim_utils.py:
--------------------------------------------------------------------------------
1 | import io
2 | import numpy as np
3 | import torch
4 |
5 | def get_wordmap(textfile):
6 | words={}
7 | We = []
8 | f = io.open(textfile, 'r', encoding='utf-8')
9 | lines = f.readlines()
10 | if len(lines[0].split()) == 2:
11 | lines.pop(0)
12 | ct = 0
13 | for (n,i) in enumerate(lines):
14 | word = i.split(' ', 1)[0]
15 | vec = i.split(' ', 1)[1].split(' ')
16 | j = 0
17 | v = []
18 | while j < len(vec):
19 | v.append(float(vec[j]))
20 | j += 1
21 | words[word] = ct
22 | ct += 1
23 | We.append(v)
24 | return words, np.array(We)
25 |
26 | def get_minibatches_idx(n, minibatch_size, shuffle=False):
27 | idx_list = np.arange(n, dtype="int32")
28 |
29 | if shuffle:
30 | np.random.shuffle(idx_list)
31 |
32 | minibatches = []
33 | minibatch_start = 0
34 | for i in range(n // minibatch_size):
35 | minibatches.append(idx_list[minibatch_start:
36 | minibatch_start + minibatch_size])
37 | minibatch_start += minibatch_size
38 |
39 | if (minibatch_start != n):
40 | # Make a minibatch out of what is left
41 | minibatches.append(idx_list[minibatch_start:])
42 |
43 | return zip(range(len(minibatches)), minibatches)
44 |
45 | def max_pool(x, lengths, gpu):
46 | out = torch.FloatTensor(x.size(0), x.size(2)).zero_()
47 | if gpu >= 0:
48 | out = out.cuda()
49 | for i in range(len(lengths)):
50 | out[i] = torch.max(x[i][0:lengths[i]], 0)[0]
51 | return out
52 |
53 | def mean_pool(x, lengths, gpu):
54 | out = torch.FloatTensor(x.size(0), x.size(2)).zero_()
55 | if gpu >= 0:
56 | out = out.cuda()
57 | for i in range(len(lengths)):
58 | out[i] = torch.mean(x[i][0:lengths[i]], 0)
59 | return out
60 |
61 | def lookup(words, w):
62 | w = w.lower()
63 | if w in words:
64 | return words[w]
65 |
66 | class Example(object):
67 |
68 | def __init__(self, sentence):
69 | self.sentence = sentence.strip().lower()
70 | self.embeddings = []
71 | self.representation = None
72 |
73 | def populate_embeddings(self, words):
74 | sentence = self.sentence.lower()
75 | arr = sentence.split()
76 | for i in arr:
77 | emb = lookup(words, i)
78 | if emb:
79 | self.embeddings.append(emb)
80 | if len(self.embeddings) == 0:
81 | self.embeddings.append(words['UUUNKKK'])
--------------------------------------------------------------------------------
/emnlp2021/metric/wieting_similarity/similarity_evaluator.py:
--------------------------------------------------------------------------------
1 | """
2 | The models should be downloaded from
3 | https://drive.google.com/drive/folders/1lBN2nbzxtpqbPUyeURtzt0k1kBY6u6Mj
4 | The source is http://style.cs.umass.edu
5 | """
6 |
7 | import torch
8 | import numpy as np
9 | from .sim_models import WordAveraging
10 | from .sim_utils import Example
11 | from nltk.tokenize import TreebankWordTokenizer
12 | import sentencepiece as spm
13 |
14 |
15 | class SimilarityEvaluator:
16 | def __init__(
17 | self,
18 | model_path='models/sim/sim.pt',
19 | tokenizer_path='models/sim/sim.sp.30k.model',
20 | gpu=False
21 | ):
22 | self.model_path = model_path
23 | self.tokenizer_path = tokenizer_path
24 | self.tok = TreebankWordTokenizer()
25 | kw = {}
26 | if not torch.cuda.is_available():
27 | kw['map_location'] = torch.device('cpu')
28 | model = torch.load(self.model_path, **kw)
29 | state_dict = model['state_dict']
30 | vocab_words = model['vocab_words']
31 | args = model['args']
32 | if gpu is False:
33 | args.gpu = -1
34 | # turn off gpu
35 | self.model = WordAveraging(args, vocab_words)
36 | self.model.load_state_dict(state_dict, strict=True)
37 | self.sp = spm.SentencePieceProcessor()
38 | self.sp.Load(self.tokenizer_path)
39 | self.model.eval()
40 |
41 | def make_example(self, sentence):
42 | sentence = sentence.lower()
43 | sentence = " ".join(self.tok.tokenize(sentence))
44 | sentence = self.sp.EncodeAsPieces(sentence)
45 | wp1 = Example(" ".join(sentence))
46 | wp1.populate_embeddings(self.model.vocab)
47 | return wp1
48 |
49 | def find_similarity(self, s1, s2):
50 | with torch.no_grad():
51 | s1 = [self.make_example(x) for x in s1]
52 | s2 = [self.make_example(x) for x in s2]
53 | wx1, wl1, wm1 = self.model.torchify_batch(s1)
54 | wx2, wl2, wm2 = self.model.torchify_batch(s2)
55 | scores = self.model.scoring_function(wx1, wm1, wl1, wx2, wm2, wl2)
56 | return [x.item() for x in scores]
57 |
58 | def find_similarity_batched(self, inputs, preds, batch_size=32):
59 | assert len(inputs) == len(preds)
60 | sim_scores = []
61 | for i in range(0, len(inputs), batch_size):
62 | sim_scores.extend(
63 | self.find_similarity(inputs[i:i + batch_size], preds[i:i + batch_size])
64 | )
65 | return np.array(sim_scores)
66 |
67 | def embed_texts(self, texts, batch_size=128):
68 | result = []
69 | for i in range(0, len(texts), batch_size):
70 | wx, wl, wm = self.model.torchify_batch([self.make_example(x) for x in texts[i:i+batch_size]])
71 | with torch.no_grad():
72 | tensors = torch.nn.functional.normalize(self.model.encode(wx, wm, wl))
73 | result.append(tensors.cpu().numpy())
74 | return np.concatenate(result)
75 |
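76 | # Minimal usage sketch (assumes the SIM model and sentencepiece files have been downloaded
77 | # to models/sim, e.g. by prepare.sh):
78 | #   evaluator = SimilarityEvaluator()
79 | #   scores = evaluator.find_similarity(['the food was bland'], ['the food had no taste'])
80 | #   # -> a list with one cosine similarity per sentence pair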
--------------------------------------------------------------------------------
/emnlp2021/metric/wieting_similarity/spm.py:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/emnlp2021/metric/wieting_similarity/test_sim.py:
--------------------------------------------------------------------------------
1 | import torch
2 | from wieting_similarity.sim_models import WordAveraging
3 | from wieting_similarity.sim_utils import Example
4 | from nltk.tokenize import TreebankWordTokenizer
5 | import sentencepiece as spm
6 |
7 | tok = TreebankWordTokenizer()
8 |
9 | model = torch.load('/home/voronov/projects/style-transfer-paraphrase/style_paraphrase/evaluation/similarity/sim.pt')
10 | state_dict = model['state_dict']
11 | vocab_words = model['vocab_words']
12 | args = model['args']
13 | # turn off gpu
14 | model = WordAveraging(args, vocab_words)
15 | model.load_state_dict(state_dict, strict=True)
16 | sp = spm.SentencePieceProcessor()
17 | sp.Load('/home/voronov/projects/style-transfer-paraphrase/style_paraphrase/evaluation/similarity/sim.sp.30k.model')
18 | model.eval()
19 |
20 | def make_example(sentence, model):
21 | sentence = sentence.lower()
22 | sentence = " ".join(tok.tokenize(sentence))
23 | sentence = sp.EncodeAsPieces(sentence)
24 | wp1 = Example(" ".join(sentence))
25 | wp1.populate_embeddings(model.vocab)
26 | return wp1
27 |
28 | def find_similarity(s1, s2):
29 | with torch.no_grad():
30 | s1 = [make_example(x, model) for x in s1]
31 | s2 = [make_example(x, model) for x in s2]
32 | wx1, wl1, wm1 = model.torchify_batch(s1)
33 | wx2, wl2, wm2 = model.torchify_batch(s2)
34 | scores = model.scoring_function(wx1, wm1, wl1, wx2, wm2, wl2)
35 | return [x.item() for x in scores]
36 |
37 | # s1 = "the dog ran outsideddd."
38 | # s2 = "the puppy escape into the trees."
39 | # print(find_similarity([s1, s2], [s2, s2]))
40 |
--------------------------------------------------------------------------------
/emnlp2021/prepare.sh:
--------------------------------------------------------------------------------
1 | mkdir -p models
2 |
3 | # DOWNLOAD THE MODELS FOR EVALUATION
4 | # download the CoLA model from http://style.cs.umass.edu/
5 | # using several runs because gdown does not preserve folders
6 | mkdir -p models/cola
7 | mkdir -p models/cola/cola-bin
8 | gdown https://drive.google.com/drive/folders/18G5ZfLRKTMlV0Ke4shztUlGZ6H8j8AD6 -O models/cola/cola-bin/input0 --folder
9 | gdown https://drive.google.com/drive/folders/1UlU0g9HUerK0xW8B_MKV08DkIPEs44-Q -O models/cola/cola-bin/label --folder
10 | gdown --id 1rBSrbL_6gfDqOCpNz1JrOJK9gKrkKxhv -O models/cola/checkpoint_best.pt
11 | # download the similarity model from http://style.cs.umass.edu/
12 | gdown https://drive.google.com/drive/folders/1lBN2nbzxtpqbPUyeURtzt0k1kBY6u6Mj -O models/sim --folder
13 |
14 | # DOWNLOAD THE MODELS FOR INFERENCE
15 |
16 |
--------------------------------------------------------------------------------
/emnlp2021/style_transfer/condBERT/choosers.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 |
3 | from flair.data import Sentence
4 | from flair.embeddings import WordEmbeddings
5 |
6 |
7 | def cosine(v1, v2):
8 | return np.dot(v1, v2) / np.sqrt(sum(v1**2) * sum(v2**2) + 1e-10)
9 |
10 |
11 | class EmbeddingSimilarityChooser:
12 | def __init__(self, sim_coef=100, tokenizer=None):
13 | self.glove_embedding = WordEmbeddings('glove')
14 | self.sim_coef = sim_coef
15 | self.tokenizer = tokenizer
16 |
17 | def embed(self, text):
18 | toks = self.glove_embedding.embed(Sentence(text))[0]
19 | if not toks:
20 | return np.zeros(self.glove_embedding.embedding_length)
21 | return np.mean([t.embedding.cpu().numpy() for t in toks], axis=0)
22 |
23 | def decode(self, tokens):
24 | if isinstance(tokens, str):
25 | return tokens
26 | if self.tokenizer:
27 | return self.tokenizer.convert_tokens_to_string(tokens)
28 | return ' '.join(tokens).replace(' ##', '')
29 |
30 | def __call__(self, hypotheses, original=None, scores=None, **kwargs):
31 | e = self.embed(self.decode(original))
32 | candidates = [
33 | (fill_words, score, cosine(e, self.embed(self.decode(fill_words))))
34 | for fill_words, score in zip(hypotheses, scores)
35 | ]
36 | candidates = sorted(candidates, key=lambda x: x[1] + x[2] * self.sim_coef, reverse=True)
37 | return candidates[0][0]
38 |
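39 | # Minimal usage sketch (the hypotheses and scores below are hypothetical): the chooser re-ranks
40 | # BERT candidates by GloVe similarity to the original masked words, weighted by sim_coef.
41 | #   chooser = EmbeddingSimilarityChooser(sim_coef=10)
42 | #   best = chooser(hypotheses=[['mistaken'], ['wrong']], scores=[1.2, 0.8], original=['idiot'])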
--------------------------------------------------------------------------------
/emnlp2021/style_transfer/condBERT/condbert.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 | from collections import defaultdict
4 |
5 |
6 | def group_by_first_token(texts, tokenizer):
7 | seqs = [tokenizer.encode(x, add_special_tokens=False) for x in texts]
8 | grouped = defaultdict(list)
9 | for seq in seqs:
10 | grouped[seq[0]].append(seq)
11 | return grouped
12 |
13 |
14 | def default_chooser(hypotheses, original=None, **kwargs):
15 | return hypotheses[0]
16 |
17 |
18 | class CondBertRewriter:
19 | def __init__(
20 | self,
21 | model,
22 | tokenizer,
23 | device,
24 | neg_words,
25 | pos_words,
26 | word2coef,
27 | token_toxicities,
28 | predictor=None,
29 | ):
30 | self.model = model
31 | self.tokenizer = tokenizer
32 | self.device = device
33 | self.neg_words = neg_words
34 | self.pos_words = pos_words
35 | self.word2coef = word2coef
36 | self.token_toxicities = token_toxicities
37 | self.predictor = predictor
38 |
39 | # calculated properties
40 | self.v = {v: k for k, v in tokenizer.vocab.items()}
41 | self.device_toxicities = torch.tensor(token_toxicities).to(self.device)
42 |
43 | self.neg_complex_tokens = group_by_first_token(neg_words, self.tokenizer)
44 | self.pos_complex_tokens = group_by_first_token(pos_words, self.tokenizer)
45 | self.mask_index = self.tokenizer.convert_tokens_to_ids("[MASK]")
46 |
47 | def toks_to_words(self, token_ids):
48 | """ Merge subword tokens into whole words """
49 | indices = []
50 | for i, token_id in enumerate(token_ids):
51 | token_text = self.v[token_id]
52 | if token_text.startswith('##'):
53 | indices.append(i)
54 | else:
55 | if indices:
56 | toks = [self.v[token_ids[t]] for t in indices]
57 | word = ''.join([toks[0]] + [t[2:] for t in toks[1:]])
58 | yield indices, word
59 | indices = [i]
60 |
61 | def get_mask_fast(
62 | self,
63 | inp: str,
64 | bad_words=None,
65 | min_bad_score=0,
66 | aggressive=True,
67 | max_score_margin=0.5,
68 | label=0, # the desired label (masking tokens of the opposite label)
69 | ):
70 | if bad_words is None:
71 | if label == 0:
72 | bad_words = self.neg_complex_tokens
73 | else:
74 | bad_words = self.pos_complex_tokens
75 |
76 | sentences = [self.tokenizer.encode(inp, add_special_tokens=True)]
77 | sentences_torch = torch.tensor(sentences)
78 | masks = torch.zeros_like(sentences_torch)
79 |
80 | for sent_id, sent in enumerate(sentences):
81 | for first_tok_id, tok in enumerate(sent):
82 | for hypothesis in bad_words.get(tok, []):
83 | n = len(hypothesis)
84 | if sent[first_tok_id: (first_tok_id + n)] == hypothesis:
85 | for step in range(n):
86 | masks[sent_id, first_tok_id + step] = 1
87 | # if a word has toxic prefix, it is all toxic, so we should label its suffix as well
88 | for offset, next_token in enumerate(sent[(first_tok_id + n):]):
89 | if self.tokenizer.convert_ids_to_tokens(next_token).startswith('##'):
90 | masks[sent_id, first_tok_id + n + offset] = 1
91 | else:
92 | break
93 | if sum(masks[sent_id].numpy()) == 0 or aggressive:
94 | scored_words = []
95 | for indices, word in self.toks_to_words(sent):
96 | score = self.word2coef.get(word, 0) * (1 - 2 * label)
97 | if score:
98 | scored_words.append([indices, word, score])
99 | if scored_words:
100 | max_score = max(s[2] for s in scored_words)
101 | if max_score > min_bad_score:
102 | for indices, word, score in scored_words:
103 | if score >= max(min_bad_score, max_score * max_score_margin):
104 | masks[sent_id, indices] = 1
105 |
106 | return sentences_torch, masks
107 |
108 | def translate(
109 | self,
110 | ss,
111 | get_mask=None,
112 | label=0,
113 | prnt=True,
114 | raw=False,
115 | toxicity_penalty=15,
116 | contrast_penalty=0,
117 | mask_toxic=False,
118 | duplicate=False,
119 | ):
120 | if get_mask is None:
121 | get_mask = self.get_mask_fast
122 | if prnt:
123 | print(ss)
124 | if label == 0:
125 | input_ids, attn_mask = get_mask(ss, bad_words=self.neg_complex_tokens, label=label)
126 | else:
127 | input_ids, attn_mask = get_mask(ss, bad_words=self.pos_complex_tokens, label=label)
128 |
129 | if attn_mask.sum().numpy() == 0:
130 | return ss
131 |
132 | masked = torch.ones_like(input_ids) * -100
133 | for i in range(input_ids.shape[0]):
134 | masked[i][attn_mask[i] == 1] = input_ids[i][attn_mask[i] == 1]
135 | if duplicate:
136 | input_ids = torch.cat([input_ids, input_ids], axis=1)
137 | attn_mask = torch.cat([torch.zeros_like(attn_mask), attn_mask], axis=1)
138 | if mask_toxic:
139 | input_ids[i][attn_mask[i] == 1] = self.mask_index
140 |
141 | # masked = masked.to(self.device)
142 |
143 | input_ids = input_ids.to(self.device)
144 |
145 | self.model.eval()
146 |
147 | outputs = self.model(
148 | input_ids,
149 | token_type_ids=torch.ones_like(input_ids) * label,
150 | )
151 | if contrast_penalty:
152 | neg_outputs = self.model(
153 | input_ids,
154 | token_type_ids=torch.ones_like(input_ids) * (1-label),
155 | )
156 | else:
157 | neg_outputs = None
158 | if raw:
159 | return outputs[0]
160 | for i in range(input_ids.shape[0]):
161 | logits = outputs[-1][i][attn_mask[i] == 1]
162 | if toxicity_penalty:
163 | logits -= self.device_toxicities * toxicity_penalty * (1 - 2 * label)
164 | if contrast_penalty:
165 | neg_logits = neg_outputs[-1][i][attn_mask[i] == 1]
166 | scores = torch.softmax(logits, -1) - torch.softmax(neg_logits, -1) * contrast_penalty
167 | else:
168 | scores = logits
169 | input_ids[i][attn_mask[i] == 1] = scores.argmax(dim=1)
170 |
171 | result = self.tokenizer.convert_tokens_to_string(
172 | [self.tokenizer.convert_ids_to_tokens(i.item()) for i in input_ids[0][1:-1]]
173 | )
174 | return result.split('[SEP] [CLS] ')[-1]
175 |
176 | def convert_mask(self, tok_ids, mask_ids, duplicate=False, start_from=0):
177 | # find the first masked word, keep only its first token, get its position
178 | toks_tmp = [self.tokenizer.convert_ids_to_tokens(tok_ids[0])[1:-1]]
179 | mask_pos = None
180 | toks = []
181 | mask_toks = []
182 | has_mask = False
183 | for i, is_masked in enumerate(mask_ids[0][1:-1]):
184 | tok = toks_tmp[0][i]
185 | if not has_mask:
186 | if is_masked and i >= start_from and not tok.startswith('##'):
187 | has_mask = True
188 | mask_pos = [i]
189 | mask_toks.append(tok)
190 | toks.append(tok)
191 | else:
192 | if not is_masked or not tok.startswith('##'):
193 | toks.extend(toks_tmp[0][i:])
194 | break
195 | else:
196 | mask_toks.append(tok)
197 | toks = [toks]
198 |
199 | if duplicate:
200 | toks = [toks_tmp[0] + ['[SEP]'] + toks[0]]
201 | mask_pos[0] += len(toks_tmp[0]) + 1
202 | return toks, mask_pos, mask_toks
203 |
204 | def replacement_loop(
205 | self,
206 | text,
207 | span_detector=None,
208 | predictor=None,
209 | verbose=True,
210 | chooser=default_chooser,
211 | n_tokens=(1, 2, 3),
212 | n_top=10,
213 | mask_token=False,
214 | max_steps=1000,
215 | label=0,
216 | **predictor_args,
217 | ):
218 | if span_detector is None:
219 | span_detector = self.get_mask_fast
220 | if predictor is None:
221 | predictor = self.predictor
222 | new_text = text
223 | look_from = 0
224 |
225 | for i in range(max_steps):
226 | tok_ids, mask_ids = span_detector(new_text, label=label)
227 | if not sum(mask_ids[0][(1 + look_from):]):
228 | break
229 | toks, mask_pos, mask_toks = self.convert_mask(
230 | tok_ids, mask_ids, duplicate=False, start_from=look_from
231 | )
232 | if mask_pos is None:
233 | return new_text
234 | texts, scores = predictor.generate(
235 | toks,
236 | mask_pos,
237 | n_tokens=list(n_tokens),
238 | n_top=n_top,
239 | fix_multiunit=False,
240 | mask_token=mask_token,
241 | label=label,
242 | **predictor_args
243 | )
244 | old_replacement = chooser(hypotheses=texts[0], scores=scores[0], original=mask_toks)
245 | if isinstance(old_replacement, str):
246 | old_replacement = [old_replacement]
247 | replacement = [t for w in old_replacement for t in w.split('_')]
248 | if verbose:
249 | print(mask_toks, '->', replacement)
250 | new_toks = toks[0][:mask_pos[0]] + replacement + toks[0][mask_pos[0] + 1:]
251 | new_text = self.tokenizer.convert_tokens_to_string(new_toks)
252 | look_from = mask_pos[0] + len(old_replacement)
253 | # we could add len(replacement), but sometimes its tokens glue together like 'mental' + '##ly'
254 | # on the other hand, we want to skip light_##hearted fully
255 | return new_text
256 |
257 | def get_mlm_contrast(self, text, label=0, logs=True):
258 | """ Calculate difference in log likelihood (or likelihood) of a text under different token_type_ids """
259 | inputs = self.tokenizer(text, return_tensors='pt')
260 | inputs['token_type_ids'] = inputs['token_type_ids'] * 0 + label
261 | inputs = {k: v.to(self.device) for k, v in inputs.items()}
262 |
263 | with torch.no_grad():
264 | scores = self.model(**inputs, )
265 | pos_ll = get_ll(inputs, scores)
266 |
267 | inputs['token_type_ids'] = 1 - inputs['token_type_ids']
268 | with torch.no_grad():
269 | scores2 = self.model(**inputs, )
270 | neg_ll = get_ll(inputs, scores2)
271 |
272 | if logs:
273 | diff = pos_ll - neg_ll
274 | else:
275 | diff = torch.exp(pos_ll) - torch.exp(neg_ll)
276 |
277 | return diff
278 |
279 |
280 | def get_ll(inputs, results):
281 | """ Calculate log likelihood of a sequence """
282 | probs = torch.log_softmax(results.logits, dim=-1)
283 | seq = inputs['input_ids']
284 | proba = torch.gather(probs, 2, seq[:, :, None]).squeeze(-1)
285 | return proba * inputs['attention_mask']
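286 | 
287 | 
288 | # Minimal usage sketch (see condbert_inference.ipynb for how the vocabularies, the predictor
289 | # and the chooser are actually loaded):
290 | #   rewriter = CondBertRewriter(model, tokenizer, device, neg_words, pos_words, word2coef, token_toxicities)
291 | #   rewriter.translate('You are an idiot!', prnt=False)   # single-pass parallel replacement
292 | #   rewriter.replacement_loop('You are an idiot!')        # iterative multiword replacement, requires a predictor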
--------------------------------------------------------------------------------
/emnlp2021/style_transfer/condBERT/condbert_compile_vocab.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "This notebook reproduces creation of CondBERT vocabulary.\n",
8 | "\n",
9 | "The files `positive-words.txt`, `negative-words.txt` and `toxic_words.txt` are not reproduced exactly because of our internal issues. \n",
10 | "\n",
11 | "However, all other files (`token_toxicities.txt` and `word2coef.pkl` ) are reproduced accurately. "
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
18 | "# 0. Prerequisites"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": 1,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "VOCAB_DIRNAME = 'vocabularies' "
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 3,
33 | "metadata": {},
34 | "outputs": [],
35 | "source": [
36 | "from condbert import CondBertRewriter\n",
37 | "from choosers import EmbeddingSimilarityChooser\n",
38 | "from multiword.masked_token_predictor_bert import MaskedTokenPredictorBert"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "# 1. Loading BERT"
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 6,
51 | "metadata": {},
52 | "outputs": [],
53 | "source": [
54 | "import torch\n",
55 | "from transformers import BertTokenizer, BertForMaskedLM\n",
56 | "import numpy as np\n",
57 | "import pickle\n",
58 | "import os\n",
59 | "from tqdm.auto import tqdm, trange"
60 | ]
61 | },
62 | {
63 | "cell_type": "code",
64 | "execution_count": 7,
65 | "metadata": {},
66 | "outputs": [],
67 | "source": [
68 | "os.environ['CUDA_VISIBLE_DEVICES'] = '0'\n",
69 | "device = torch.device('cuda:0')\n",
70 | "device = torch.device('cpu')"
71 | ]
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": 9,
76 | "metadata": {},
77 | "outputs": [],
78 | "source": [
79 | "model_name = 'bert-base-uncased'\n",
80 | "tokenizer = BertTokenizer.from_pretrained(model_name)"
81 | ]
82 | },
83 | {
84 | "cell_type": "code",
85 | "execution_count": 10,
86 | "metadata": {},
87 | "outputs": [
88 | {
89 | "name": "stderr",
90 | "output_type": "stream",
91 | "text": [
92 | "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']\n",
93 | "- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
94 | "- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n"
95 | ]
96 | }
97 | ],
98 | "source": [
99 | "model = BertForMaskedLM.from_pretrained(model_name)"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": 11,
105 | "metadata": {},
106 | "outputs": [],
107 | "source": [
108 | "model.to(device);"
109 | ]
110 | },
111 | {
112 | "cell_type": "markdown",
113 | "metadata": {},
114 | "source": [
115 |     "# 2. Preparing the vocabularies."
116 | ]
117 | },
118 | {
119 | "cell_type": "markdown",
120 | "metadata": {},
121 | "source": [
122 |     "This section prepares the following vocabulary files:\n",
123 | "- negative-words.txt\n",
124 | "- positive-words.txt\n",
125 | "- word2coef.pkl\n",
126 | "- token_toxicities.txt\n",
127 | "\n",
128 | "These files should be prepared once. "
129 | ]
130 | },
131 | {
132 | "cell_type": "code",
133 | "execution_count": 12,
134 | "metadata": {},
135 | "outputs": [],
136 | "source": [
137 | "tox_corpus_path = '../../data/train/train_toxic'\n",
138 | "norm_corpus_path = '../../data/train/train_normal'"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": 13,
144 | "metadata": {},
145 | "outputs": [],
146 | "source": [
147 | "if not os.path.exists(VOCAB_DIRNAME):\n",
148 | " os.makedirs(VOCAB_DIRNAME)"
149 | ]
150 | },
151 | {
152 | "cell_type": "markdown",
153 | "metadata": {},
154 | "source": [
155 | "### 2.1 Preparing the DRG-like vocabularies"
156 | ]
157 | },
158 | {
159 | "cell_type": "code",
160 | "execution_count": 14,
161 | "metadata": {},
162 | "outputs": [],
163 | "source": [
164 | "import os\n",
165 | "import argparse\n",
166 | "import numpy as np\n",
167 | "from tqdm import tqdm\n",
168 | "from nltk import ngrams\n",
169 | "from sklearn.linear_model import LogisticRegression\n",
170 | "from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer\n",
171 | "\n",
172 | "\n",
173 | "\n",
174 | "class NgramSalienceCalculator():\n",
175 | " def __init__(self, tox_corpus, norm_corpus, use_ngrams=False):\n",
176 | " ngrams = (1, 3) if use_ngrams else (1, 1)\n",
177 | " self.vectorizer = CountVectorizer(ngram_range=ngrams)\n",
178 | "\n",
179 | " tox_count_matrix = self.vectorizer.fit_transform(tox_corpus)\n",
180 | " self.tox_vocab = self.vectorizer.vocabulary_\n",
181 | " self.tox_counts = np.sum(tox_count_matrix, axis=0)\n",
182 | "\n",
183 | " norm_count_matrix = self.vectorizer.fit_transform(norm_corpus)\n",
184 | " self.norm_vocab = self.vectorizer.vocabulary_\n",
185 | " self.norm_counts = np.sum(norm_count_matrix, axis=0)\n",
186 | "\n",
187 | " def salience(self, feature, attribute='tox', lmbda=0.5):\n",
188 | " assert attribute in ['tox', 'norm']\n",
189 | " if feature not in self.tox_vocab:\n",
190 | " tox_count = 0.0\n",
191 | " else:\n",
192 | " tox_count = self.tox_counts[0, self.tox_vocab[feature]]\n",
193 | "\n",
194 | " if feature not in self.norm_vocab:\n",
195 | " norm_count = 0.0\n",
196 | " else:\n",
197 | " norm_count = self.norm_counts[0, self.norm_vocab[feature]]\n",
198 | "\n",
199 | " if attribute == 'tox':\n",
200 | " return (tox_count + lmbda) / (norm_count + lmbda)\n",
201 | " else:\n",
202 | " return (norm_count + lmbda) / (tox_count + lmbda)\n"
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": 15,
208 | "metadata": {},
209 | "outputs": [
210 | {
211 | "name": "stdout",
212 | "output_type": "stream",
213 | "text": [
214 | "88645\n"
215 | ]
216 | }
217 | ],
218 | "source": [
219 | "from collections import Counter\n",
220 | "c = Counter()\n",
221 | "\n",
222 | "for fn in [tox_corpus_path, norm_corpus_path]:\n",
223 | " with open(fn, 'r') as corpus:\n",
224 | " for line in corpus.readlines():\n",
225 | " for tok in line.strip().split():\n",
226 | " c[tok] += 1\n",
227 | "\n",
228 | "print(len(c))"
229 | ]
230 | },
231 | {
232 | "cell_type": "code",
233 | "execution_count": 16,
234 | "metadata": {},
235 | "outputs": [
236 | {
237 | "name": "stdout",
238 | "output_type": "stream",
239 | "text": [
240 | "88645\n"
241 | ]
242 | }
243 | ],
244 | "source": [
245 |     "vocab = {w for w, _ in c.most_common() if _ > 0} # if we took words with > 1 occurrences, vocabulary would be x2 smaller, but we'll survive this size\n",
246 | "print(len(vocab))"
247 | ]
248 | },
249 | {
250 | "cell_type": "code",
251 | "execution_count": 17,
252 | "metadata": {},
253 | "outputs": [],
254 | "source": [
255 | "with open(tox_corpus_path, 'r') as tox_corpus, open(norm_corpus_path, 'r') as norm_corpus:\n",
256 | " corpus_tox = [' '.join([w if w in vocab else '' for w in line.strip().split()]) for line in tox_corpus.readlines()]\n",
257 | " corpus_norm = [' '.join([w if w in vocab else '' for w in line.strip().split()]) for line in norm_corpus.readlines()]"
258 | ]
259 | },
260 | {
261 | "cell_type": "code",
262 | "execution_count": 18,
263 | "metadata": {},
264 | "outputs": [],
265 | "source": [
266 | "neg_out_name = VOCAB_DIRNAME + '/negative-words.txt'\n",
267 | "pos_out_name = VOCAB_DIRNAME + '/positive-words.txt'"
268 | ]
269 | },
270 | {
271 | "cell_type": "code",
272 | "execution_count": 20,
273 | "metadata": {},
274 | "outputs": [],
275 | "source": [
276 | "threshold = 4"
277 | ]
278 | },
279 | {
280 | "cell_type": "code",
281 | "execution_count": 21,
282 | "metadata": {},
283 | "outputs": [],
284 | "source": [
285 | "sc = NgramSalienceCalculator(corpus_tox, corpus_norm, False)\n",
286 | "seen_grams = set()\n",
287 | "\n",
288 | "with open(neg_out_name, 'w') as neg_out, open(pos_out_name, 'w') as pos_out:\n",
289 | " for gram in set(sc.tox_vocab.keys()).union(set(sc.norm_vocab.keys())):\n",
290 | " if gram not in seen_grams:\n",
291 | " seen_grams.add(gram)\n",
292 | " toxic_salience = sc.salience(gram, attribute='tox')\n",
293 | " polite_salience = sc.salience(gram, attribute='norm')\n",
294 | " if toxic_salience > threshold:\n",
295 | " neg_out.writelines(f'{gram}\\n')\n",
296 | " elif polite_salience > threshold:\n",
297 | " pos_out.writelines(f'{gram}\\n')"
298 | ]
299 | },
300 | {
301 | "cell_type": "markdown",
302 | "metadata": {},
303 | "source": [
304 | "## 2.2 Evaluating word toxicities with a logistic regression"
305 | ]
306 | },
307 | {
308 | "cell_type": "code",
309 | "execution_count": 22,
310 | "metadata": {},
311 | "outputs": [],
312 | "source": [
313 | "from sklearn.pipeline import make_pipeline\n",
314 | "pipe = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))"
315 | ]
316 | },
317 | {
318 | "cell_type": "code",
319 | "execution_count": 23,
320 | "metadata": {},
321 | "outputs": [],
322 | "source": [
323 | "X_train = corpus_tox + corpus_norm\n",
324 | "y_train = [1] * len(corpus_tox) + [0] * len(corpus_norm)\n",
325 | "pipe.fit(X_train, y_train);"
326 | ]
327 | },
328 | {
329 | "cell_type": "code",
330 | "execution_count": 24,
331 | "metadata": {},
332 | "outputs": [
333 | {
334 | "data": {
335 | "text/plain": [
336 | "(88519,)"
337 | ]
338 | },
339 | "execution_count": 24,
340 | "metadata": {},
341 | "output_type": "execute_result"
342 | }
343 | ],
344 | "source": [
345 | "coefs = pipe[1].coef_[0]\n",
346 | "coefs.shape"
347 | ]
348 | },
349 | {
350 | "cell_type": "code",
351 | "execution_count": 25,
352 | "metadata": {},
353 | "outputs": [],
354 | "source": [
355 | "word2coef = {w: coefs[idx] for w, idx in pipe[0].vocabulary_.items()}"
356 | ]
357 | },
358 | {
359 | "cell_type": "code",
360 | "execution_count": 26,
361 | "metadata": {},
362 | "outputs": [],
363 | "source": [
364 | "import pickle\n",
365 | "with open(VOCAB_DIRNAME + '/word2coef.pkl', 'wb') as f:\n",
366 | " pickle.dump(word2coef, f)"
367 | ]
368 | },
369 | {
370 | "cell_type": "markdown",
371 | "metadata": {},
372 | "source": [
373 | "## 2.3 Labelling BERT tokens by toxicity"
374 | ]
375 | },
376 | {
377 | "cell_type": "code",
378 | "execution_count": 27,
379 | "metadata": {},
380 | "outputs": [
381 | {
382 | "name": "stderr",
383 | "output_type": "stream",
384 | "text": [
385 | "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 135390/135390 [00:43<00:00, 3088.85it/s]\n",
386 | "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 135390/135390 [00:45<00:00, 2977.70it/s]\n"
387 | ]
388 | }
389 | ],
390 | "source": [
391 | "from collections import defaultdict\n",
392 | "toxic_counter = defaultdict(lambda: 1)\n",
393 | "nontoxic_counter = defaultdict(lambda: 1)\n",
394 | "\n",
395 | "for text in tqdm(corpus_tox):\n",
396 | " for token in tokenizer.encode(text):\n",
397 | " toxic_counter[token] += 1\n",
398 | "for text in tqdm(corpus_norm):\n",
399 | " for token in tokenizer.encode(text):\n",
400 | " nontoxic_counter[token] += 1"
401 | ]
402 | },
403 | {
404 | "cell_type": "code",
405 | "execution_count": 28,
406 | "metadata": {},
407 | "outputs": [],
408 | "source": [
409 | "token_toxicities = [toxic_counter[i] / (nontoxic_counter[i] + toxic_counter[i]) for i in range(len(tokenizer.vocab))]"
410 | ]
411 | },
412 | {
413 | "cell_type": "code",
414 | "execution_count": 29,
415 | "metadata": {},
416 | "outputs": [],
417 | "source": [
418 | "with open(VOCAB_DIRNAME + '/token_toxicities.txt', 'w') as f:\n",
419 | " for t in token_toxicities:\n",
420 | " f.write(str(t))\n",
421 | " f.write('\\n')"
422 | ]
423 | },
424 | {
425 | "cell_type": "markdown",
426 | "metadata": {},
427 | "source": [
428 | "# 3. Setting up the model"
429 | ]
430 | },
431 | {
432 | "cell_type": "markdown",
433 | "metadata": {},
434 | "source": [
435 | "### 3.1 Loading the vocabularies"
436 | ]
437 | },
438 | {
439 | "cell_type": "code",
440 | "execution_count": 30,
441 | "metadata": {},
442 | "outputs": [],
443 | "source": [
444 | "with open(VOCAB_DIRNAME + \"/negative-words.txt\", \"r\") as f:\n",
445 | " s = f.readlines()\n",
446 | "negative_words = list(map(lambda x: x[:-1], s))\n",
447 | "\n",
448 | "with open(VOCAB_DIRNAME + \"/positive-words.txt\", \"r\") as f:\n",
449 | " s = f.readlines()\n",
450 | "positive_words = list(map(lambda x: x[:-1], s))"
451 | ]
452 | },
453 | {
454 | "cell_type": "code",
455 | "execution_count": 31,
456 | "metadata": {},
457 | "outputs": [],
458 | "source": [
459 | "import pickle\n",
460 | "with open(VOCAB_DIRNAME + '/word2coef.pkl', 'rb') as f:\n",
461 | " word2coef = pickle.load(f)"
462 | ]
463 | },
464 | {
465 | "cell_type": "code",
466 | "execution_count": 32,
467 | "metadata": {},
468 | "outputs": [],
469 | "source": [
470 | "token_toxicities = []\n",
471 | "with open(VOCAB_DIRNAME + '/token_toxicities.txt', 'r') as f:\n",
472 | " for line in f.readlines():\n",
473 | " token_toxicities.append(float(line))\n",
474 | "token_toxicities = np.array(token_toxicities)\n",
475 | "token_toxicities = np.maximum(0, np.log(1/(1/token_toxicities-1))) # log odds ratio\n",
476 | "\n",
477 | "# discourage meaningless tokens\n",
478 | "for tok in ['.', ',', '-']:\n",
479 | " token_toxicities[tokenizer.encode(tok)][1] = 3\n",
480 | "\n",
481 | "for tok in ['you']:\n",
482 | " token_toxicities[tokenizer.encode(tok)][1] = 0"
483 | ]
484 | },
485 | {
486 | "cell_type": "code",
487 | "execution_count": 38,
488 | "metadata": {},
489 | "outputs": [],
490 | "source": [
491 | "def adjust_logits(logits, label=0):\n",
492 | " return logits - token_toxicities * 100 * (1 - 2 * label)\n",
493 | "\n",
494 | "predictor = MaskedTokenPredictorBert(model, tokenizer, max_len=250, device=device, label=0, contrast_penalty=0.0, logits_postprocessor=adjust_logits)\n",
495 | "\n",
496 | "editor = CondBertRewriter(\n",
497 | " model=model,\n",
498 | " tokenizer=tokenizer,\n",
499 | " device=device,\n",
500 | " neg_words=negative_words,\n",
501 | " pos_words=positive_words,\n",
502 | " word2coef=word2coef,\n",
503 | " token_toxicities=token_toxicities,\n",
504 | " predictor=predictor,\n",
505 | ")"
506 | ]
507 | },
508 | {
509 | "cell_type": "markdown",
510 | "metadata": {},
511 | "source": [
512 |     "The model below is used for reranking BERT hypotheses and helps to increase semantic similarity by choosing the hypotheses with embeddings similar to the original words. "
513 | ]
514 | },
515 | {
516 | "cell_type": "code",
517 | "execution_count": 34,
518 | "metadata": {},
519 | "outputs": [],
520 | "source": [
521 | "chooser = EmbeddingSimilarityChooser(sim_coef=10, tokenizer=tokenizer)"
522 | ]
523 | },
524 | {
525 | "cell_type": "markdown",
526 | "metadata": {},
527 | "source": [
528 | "# 4. Finally, the inference"
529 | ]
530 | },
531 | {
532 | "cell_type": "markdown",
533 | "metadata": {},
534 | "source": [
535 | "Parallel application of the model to all tokens, fast, but dirty. "
536 | ]
537 | },
538 | {
539 | "cell_type": "code",
540 | "execution_count": 35,
541 | "metadata": {},
542 | "outputs": [
543 | {
544 | "name": "stdout",
545 | "output_type": "stream",
546 | "text": [
547 | "you are mistake !\n"
548 | ]
549 | }
550 | ],
551 | "source": [
552 | "print(editor.translate('You are idiot!', prnt=False))"
553 | ]
554 | },
555 | {
556 | "cell_type": "markdown",
557 | "metadata": {},
558 | "source": [
559 | "Application of the model to all the tokens sequentially, in the multiword mode. "
560 | ]
561 | },
562 | {
563 | "cell_type": "code",
564 | "execution_count": 39,
565 | "metadata": {},
566 | "outputs": [
567 | {
568 | "name": "stdout",
569 | "output_type": "stream",
570 | "text": [
571 | "you are very beautiful !\n"
572 | ]
573 | }
574 | ],
575 | "source": [
576 | "print(editor.replacement_loop('You are stupid!', verbose=False, chooser=chooser, n_tokens=(1, 2, 3), n_top=10))"
577 | ]
578 | },
579 | {
580 | "cell_type": "markdown",
581 | "metadata": {},
582 | "source": [
583 | "Parameters that could be tuned:\n",
584 |     "* The coefficient in `adjust_logits` - the larger it is, the more the model avoids toxic words\n",
585 | "* The coefficient in `EmbeddingSimilarityChooser` - the larger it is, the more the model tries to preserve content \n",
586 | "* n_tokens - how many words can be generated from one\n",
587 | "* n_top - how many BERT hypotheses are reranked"
588 | ]
589 | },
590 | {
591 | "cell_type": "code",
592 | "execution_count": null,
593 | "metadata": {},
594 | "outputs": [],
595 | "source": []
596 | }
597 | ],
598 | "metadata": {
599 | "kernelspec": {
600 | "display_name": "Python 3 (ipykernel)",
601 | "language": "python",
602 | "name": "python3"
603 | },
604 | "language_info": {
605 | "codemirror_mode": {
606 | "name": "ipython",
607 | "version": 3
608 | },
609 | "file_extension": ".py",
610 | "mimetype": "text/x-python",
611 | "name": "python",
612 | "nbconvert_exporter": "python",
613 | "pygments_lexer": "ipython3",
614 | "version": "3.9.6"
615 | }
616 | },
617 | "nbformat": 4,
618 | "nbformat_minor": 4
619 | }
--------------------------------------------------------------------------------
/emnlp2021/style_transfer/condBERT/condbert_inference.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 |     "This notebook shows how to load the pre-trained BERT model and the pre-compiled CondBERT vocabularies, and how to run detoxification inference with them."
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {},
14 | "outputs": [],
15 | "source": [
16 | "import os\n",
17 | "import sys\n",
18 | "\n",
19 | "\n",
20 | "def add_sys_path(p):\n",
21 | " p = os.path.abspath(p)\n",
22 | " print(p)\n",
23 | " if p not in sys.path:\n",
24 | " sys.path.append(p)"
25 | ]
26 | },
27 | {
28 | "cell_type": "code",
29 | "execution_count": 2,
30 | "metadata": {},
31 | "outputs": [],
32 | "source": [
33 | "os.environ['CUDA_VISIBLE_DEVICES'] = '0'"
34 | ]
35 | },
36 | {
37 | "cell_type": "code",
38 | "execution_count": 3,
39 | "metadata": {},
40 | "outputs": [],
41 | "source": [
42 | "from importlib import reload"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 4,
48 | "metadata": {},
49 | "outputs": [],
50 | "source": [
51 | "import condbert\n",
52 | "reload(condbert)\n",
53 | "from condbert import CondBertRewriter"
54 | ]
55 | },
56 | {
57 | "cell_type": "code",
58 | "execution_count": 5,
59 | "metadata": {},
60 | "outputs": [],
61 | "source": [
62 | "import torch\n",
63 | "from transformers import BertTokenizer, BertForMaskedLM\n",
64 | "import numpy as np\n",
65 | "import pickle\n",
66 | "from tqdm.auto import tqdm, trange"
67 | ]
68 | },
69 | {
70 | "cell_type": "code",
71 | "execution_count": 6,
72 | "metadata": {},
73 | "outputs": [],
74 | "source": [
75 | "device = torch.device('cuda:0')"
76 | ]
77 | },
78 | {
79 | "cell_type": "markdown",
80 | "metadata": {},
81 | "source": [
82 | "### Load the model"
83 | ]
84 | },
85 | {
86 | "cell_type": "code",
87 | "execution_count": 7,
88 | "metadata": {},
89 | "outputs": [],
90 | "source": [
91 | "model_name = 'bert-base-uncased'\n",
92 | "tokenizer = BertTokenizer.from_pretrained(model_name)"
93 | ]
94 | },
95 | {
96 | "cell_type": "code",
97 | "execution_count": 8,
98 | "metadata": {},
99 | "outputs": [
100 | {
101 | "name": "stderr",
102 | "output_type": "stream",
103 | "text": [
104 | "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']\n",
105 | "- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
106 | "- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n"
107 | ]
108 | }
109 | ],
110 | "source": [
111 | "model = BertForMaskedLM.from_pretrained(model_name)"
112 | ]
113 | },
114 | {
115 | "cell_type": "code",
116 | "execution_count": 9,
117 | "metadata": {},
118 | "outputs": [],
119 | "source": [
120 | "model.to(device);"
121 | ]
122 | },
123 | {
124 | "cell_type": "markdown",
125 | "metadata": {},
126 | "source": [
127 |     "#### Load vocabularies for span detection"
128 | ]
129 | },
130 | {
131 | "cell_type": "code",
132 | "execution_count": 10,
133 | "metadata": {},
134 | "outputs": [],
135 | "source": [
136 | "vocab_root = 'vocab/'"
137 | ]
138 | },
139 | {
140 | "cell_type": "code",
141 | "execution_count": 11,
142 | "metadata": {},
143 | "outputs": [],
144 | "source": [
145 | "with open(vocab_root + \"negative-words.txt\", \"r\") as f:\n",
146 | " s = f.readlines()\n",
147 | "negative_words = list(map(lambda x: x[:-1], s))\n",
148 | "with open(vocab_root + \"toxic_words.txt\", \"r\") as f:\n",
149 | " ss = f.readlines()\n",
150 | "negative_words += list(map(lambda x: x[:-1], ss))\n",
151 | "\n",
152 | "with open(vocab_root + \"positive-words.txt\", \"r\") as f:\n",
153 | " s = f.readlines()\n",
154 | "positive_words = list(map(lambda x: x[:-1], s))"
155 | ]
156 | },
157 | {
158 | "cell_type": "code",
159 | "execution_count": 12,
160 | "metadata": {},
161 | "outputs": [],
162 | "source": [
163 | "import pickle\n",
164 | "with open(vocab_root + 'word2coef.pkl', 'rb') as f:\n",
165 | " word2coef = pickle.load(f)"
166 | ]
167 | },
168 | {
169 | "cell_type": "code",
170 | "execution_count": 13,
171 | "metadata": {},
172 | "outputs": [],
173 | "source": [
174 | "token_toxicities = []\n",
175 | "with open(vocab_root + 'token_toxicities.txt', 'r') as f:\n",
176 | " for line in f.readlines():\n",
177 | " token_toxicities.append(float(line))\n",
178 | "token_toxicities = np.array(token_toxicities)\n",
179 | "token_toxicities = np.maximum(0, np.log(1/(1/token_toxicities-1))) # log odds ratio\n",
180 | "\n",
181 | "# discourage meaningless tokens\n",
182 | "for tok in ['.', ',', '-']:\n",
183 | " token_toxicities[tokenizer.encode(tok)][1] = 3\n",
184 | "\n",
185 | "for tok in ['you']:\n",
186 | " token_toxicities[tokenizer.encode(tok)][1] = 0"
187 | ]
188 | },
189 | {
190 | "cell_type": "markdown",
191 | "metadata": {},
192 | "source": [
193 | "### Applying the model"
194 | ]
195 | },
196 | {
197 | "cell_type": "code",
198 | "execution_count": 14,
199 | "metadata": {},
200 | "outputs": [],
201 | "source": [
202 | "reload(condbert)\n",
203 | "from condbert import CondBertRewriter\n",
204 | "\n",
205 | "editor = CondBertRewriter(\n",
206 | " model=model,\n",
207 | " tokenizer=tokenizer,\n",
208 | " device=device,\n",
209 | " neg_words=negative_words,\n",
210 | " pos_words=positive_words,\n",
211 | " word2coef=word2coef,\n",
212 | " token_toxicities=token_toxicities,\n",
213 | ")"
214 | ]
215 | },
216 | {
217 | "cell_type": "code",
218 | "execution_count": 15,
219 | "metadata": {},
220 | "outputs": [
221 | {
222 | "name": "stdout",
223 | "output_type": "stream",
224 | "text": [
225 | "you are an the !\n"
226 | ]
227 | }
228 | ],
229 | "source": [
230 | "print(editor.translate('You are an idiot!', prnt=False))"
231 | ]
232 | },
233 | {
234 | "cell_type": "markdown",
235 | "metadata": {},
236 | "source": [
237 | "### Multiunit"
238 | ]
239 | },
240 | {
241 | "cell_type": "code",
242 | "execution_count": 16,
243 | "metadata": {},
244 | "outputs": [],
245 | "source": [
246 | "editor = CondBertRewriter(\n",
247 | " model=model,\n",
248 | " tokenizer=tokenizer,\n",
249 | " device=device,\n",
250 | " neg_words=negative_words,\n",
251 | " pos_words=positive_words,\n",
252 | " word2coef=word2coef,\n",
253 | " token_toxicities=token_toxicities,\n",
254 | " predictor=None,\n",
255 | ")"
256 | ]
257 | },
258 | {
259 | "cell_type": "code",
260 | "execution_count": 17,
261 | "metadata": {},
262 | "outputs": [],
263 | "source": [
264 | "from multiword import masked_token_predictor_bert\n",
265 | "reload(masked_token_predictor_bert)\n",
266 | "from multiword.masked_token_predictor_bert import MaskedTokenPredictorBert"
267 | ]
268 | },
269 | {
270 | "cell_type": "code",
271 | "execution_count": 18,
272 | "metadata": {},
273 | "outputs": [
274 | {
275 | "name": "stdout",
276 | "output_type": "stream",
277 | "text": [
278 | "you are an old man !\n"
279 | ]
280 | }
281 | ],
282 | "source": [
283 | "predictor = MaskedTokenPredictorBert(model, tokenizer, max_len=250, device=device, label=0, contrast_penalty=0.0)\n",
284 | "editor.predictor = predictor\n",
285 | "\n",
286 | "def adjust_logits(logits, label):\n",
287 | " return logits - editor.token_toxicities * 3\n",
288 | "\n",
289 | "predictor.logits_postprocessor = adjust_logits\n",
290 | "\n",
291 | "print(editor.replacement_loop('You are an idiot!', verbose=False))"
292 | ]
293 | },
294 | {
295 | "cell_type": "code",
296 | "execution_count": 19,
297 | "metadata": {},
298 | "outputs": [
299 | {
300 | "name": "stdout",
301 | "output_type": "stream",
302 | "text": [
303 | "you are an old man !\n",
304 | "Wall time: 499 ms\n"
305 | ]
306 | }
307 | ],
308 | "source": [
309 | "%%time\n",
310 | "print(editor.replacement_loop('You are an idiot!', verbose=False, n_units=1))"
311 | ]
312 | },
313 | {
314 | "cell_type": "code",
315 | "execution_count": 20,
316 | "metadata": {},
317 | "outputs": [
318 | {
319 | "name": "stdout",
320 | "output_type": "stream",
321 | "text": [
322 | "you are an old man !\n",
323 | "Wall time: 1.45 s\n"
324 | ]
325 | }
326 | ],
327 | "source": [
328 | "%%time\n",
329 | "print(editor.replacement_loop('You are an idiot!', verbose=False, n_units=3))"
330 | ]
331 | },
332 | {
333 | "cell_type": "code",
334 | "execution_count": 21,
335 | "metadata": {},
336 | "outputs": [
337 | {
338 | "name": "stdout",
339 | "output_type": "stream",
340 | "text": [
341 | "you are an old man !\n",
342 | "Wall time: 1.8 s\n"
343 | ]
344 | }
345 | ],
346 | "source": [
347 | "%%time\n",
348 | "print(editor.replacement_loop('You are an idiot!', verbose=False, n_units=10))"
349 | ]
350 | },
351 | {
352 | "cell_type": "code",
353 | "execution_count": 22,
354 | "metadata": {},
355 | "outputs": [],
356 | "source": [
357 | "import choosers\n",
358 | "reload(choosers)\n",
359 | "from choosers import EmbeddingSimilarityChooser"
360 | ]
361 | },
362 | {
363 | "cell_type": "markdown",
364 | "metadata": {},
365 | "source": [
366 | "# Reproduction"
367 | ]
368 | },
369 | {
370 | "cell_type": "code",
371 | "execution_count": 26,
372 | "metadata": {},
373 | "outputs": [],
374 | "source": [
375 | "predictor = MaskedTokenPredictorBert(\n",
376 | " model, tokenizer, max_len=250, device=device, label=0, contrast_penalty=0.0, \n",
377 | " confuse_bert_args=True, # this argument deteriorates quality but is used for backward compatibility\n",
378 | ")\n",
379 | "editor.predictor = predictor\n",
380 | "\n",
381 | "def adjust_logits(logits, label=0):\n",
382 | " return logits - editor.token_toxicities * 10\n",
383 | "\n",
384 | "predictor.logits_postprocessor = adjust_logits\n",
385 | "\n",
386 | "cho = EmbeddingSimilarityChooser(sim_coef=100, tokenizer=tokenizer)"
387 | ]
388 | },
389 | {
390 | "cell_type": "code",
391 | "execution_count": 27,
392 | "metadata": {},
393 | "outputs": [
394 | {
395 | "data": {
396 | "application/vnd.jupyter.widget-view+json": {
397 | "model_id": "8c53fb1e3c8648d58c811221c8ca7146",
398 | "version_major": 2,
399 | "version_minor": 0
400 | },
401 | "text/plain": [
402 | " 0%| | 0/10 [00:00, ?it/s]"
403 | ]
404 | },
405 | "metadata": {},
406 | "output_type": "display_data"
407 | },
408 | {
409 | "name": "stdout",
410 | "output_type": "stream",
411 | "text": [
412 | "you are a very lucky lucky person .\n",
413 | "as cold hearted and cold hearted as his boss\n",
414 | "a non - ethical society does not do bad things and votes for non - ethical politicians .\n",
415 | "a president who is an elected one .\n",
416 | "colbert is a half - hearted french patriot and is a powerful political leader .\n",
417 | "i was wrong , too wrong , too wrong , wrong , too wrong .\n",
418 | "i think i love you , i think you are an honest person , and youre contributions are very good .\n",
419 | "luna is a short , pretty , and funny young girl with no genes , smells bad , is pretty and funny , and is a pi to the worlds hard - core !\n",
420 | "mormons are brilliant people . mormons are brilliant people . . . mormons are smart mormons are smart mormons are smart mormons are smart mormons are smart mormons are smart mormons are smart mormons are smart people\n",
421 | "youre a pretty , funny , and beautiful , busy piece of my mind .\n"
422 | ]
423 | }
424 | ],
425 | "source": [
426 | "with open('../../data/test/test_10k_toxic', 'r') as inputs:\n",
427 | " lines = list(inputs.readlines())[:10]\n",
428 | " for i, line in enumerate(tqdm(lines)):\n",
429 | " inp = line.strip()\n",
430 | " out = editor.replacement_loop(inp, verbose=False, chooser=cho, n_top=10, n_tokens=(1,2,3), n_units=1)\n",
431 | " print(out)"
432 | ]
433 | },
434 | {
435 | "cell_type": "markdown",
436 | "metadata": {},
437 | "source": [
438 | "Expected output:\n",
439 | "```\n",
440 | "you are a very lucky lucky person .\n",
441 | "as cold hearted and cold hearted as his boss\n",
442 | "a non - ethical society does not do bad things and votes for non - ethical politicians .\n",
443 | "a president who is an elected one .\n",
444 | "colbert is a half - hearted french patriot and is a powerful political leader .\n",
445 | "i was wrong , too wrong , too wrong , wrong , too wrong .\n",
446 | "i think i love you , i think you are an honest person , and youre contributions are very good .\n",
447 | "luna is a short , pretty , and funny young girl with no genes , smells bad , is pretty and funny , and is a pi to the worlds hard - core !\n",
448 | "mormons are brilliant people . mormons are brilliant people . . . mormons are smart mormons are smart mormons are smart mormons are smart mormons are smart mormons are smart mormons are smart mormons are smart people\n",
449 | "youre a pretty , funny , and beautiful , busy piece of my mind .\n",
450 | "```"
451 | ]
452 | },
453 | {
454 | "cell_type": "code",
455 | "execution_count": null,
456 | "metadata": {},
457 | "outputs": [],
458 | "source": []
459 | }
460 | ],
461 | "metadata": {
462 | "kernelspec": {
463 | "display_name": "Python 3 (ipykernel)",
464 | "language": "python",
465 | "name": "python3"
466 | },
467 | "language_info": {
468 | "codemirror_mode": {
469 | "name": "ipython",
470 | "version": 3
471 | },
472 | "file_extension": ".py",
473 | "mimetype": "text/x-python",
474 | "name": "python",
475 | "nbconvert_exporter": "python",
476 | "pygments_lexer": "ipython3",
477 | "version": "3.9.6"
478 | }
479 | },
480 | "nbformat": 4,
481 | "nbformat_minor": 4
482 | }
--------------------------------------------------------------------------------
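The notebook above converts the per-token toxicity probabilities stored in `token_toxicities.txt` into clipped log-odds and subtracts them from BERT's logits through `logits_postprocessor`, so that toxic replacement candidates are ranked down. Below is a minimal numpy sketch of that transform; the probabilities, logits, and coefficient are made-up toy values, not values from the repository.

```python
import numpy as np

# toy p(toxic | token) values, standing in for token_toxicities.txt
probs = np.array([0.05, 0.50, 0.98])

# same transform as in the notebook: clipped log-odds log(p / (1 - p))
penalties = np.maximum(0, np.log(1 / (1 / probs - 1)))
print(penalties)  # ~[0.00, 0.00, 3.89] -- only clearly toxic tokens get a penalty

# adjust_logits subtracts the scaled penalties from BERT's output logits
toy_logits = np.array([1.0, 2.0, 5.0])
print(toy_logits - penalties * 3)  # ~[1.0, 2.0, -6.68] -- the toxic candidate drops to the bottom
```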
/emnlp2021/style_transfer/condBERT/multiword/masked_token_predictor_bert.py:
--------------------------------------------------------------------------------
1 | import torch
2 |
3 | import numpy as np
4 | import copy
5 | import bisect
6 |
7 | from .predict_utils import find_bpe_position_by_offset
8 | from torch.utils.data import DataLoader
9 |
10 | from keras_preprocessing.sequence import pad_sequences
11 |
12 | import logging
13 | logger = logging.getLogger('usem-experiments')
14 |
15 |
16 | def bpe_tokenize(bpe_tokenizer, sentence):
17 | sent_bpe_tokens = []
18 | sent_bpe_offsets = []
19 | for token in sentence:
20 | token_bpes = bpe_tokenizer.tokenize(token.text)
21 | sent_bpe_offsets += [(token.begin, token.end) for _ in range(len(token_bpes))]
22 | sent_bpe_tokens += token_bpes
23 |
24 | return sent_bpe_tokens, sent_bpe_offsets
25 |
26 |
27 | def nlargest_indexes(arr, n_top):
28 | arr_ids = np.argpartition(arr, -n_top)[-n_top:]
29 | sel_arr = arr[arr_ids]
30 | top_ids = arr_ids[np.argsort(-sel_arr)]
31 | return top_ids
32 |
33 |
34 | def remove_masked_token_subwords(masked_position, bpe_tokens, bpe_offsets):
35 | """
36 |     If the masked token has been tokenized into multiple subwords, e.g. dieting --> diet + ##ing,
37 |     keep the first subword and remove the others.
38 | """
39 |
40 | logger.debug(f'bpe tokens: {bpe_tokens}')
41 | logger.debug(f'bpe offsets: {bpe_offsets}')
42 |
43 | if len(masked_position[1]) > 1:
44 | indexes_to_del = masked_position[1][1:]
45 | del bpe_tokens[masked_position[0]][indexes_to_del[0] : indexes_to_del[-1] + 1]
46 | del bpe_offsets[masked_position[0]][indexes_to_del[0] : indexes_to_del[-1] + 1]
47 |
48 | masked_position = (masked_position[0], masked_position[1][0]) # TODO: leave masked_token as a list
49 |
50 |     logger.debug(f'bpe tokens: {str(bpe_tokens)}')
51 | logger.debug(f'bpe offsets: {str(bpe_offsets)}')
52 |
53 | return masked_position, bpe_tokens, bpe_offsets
54 |
55 |
56 | def merge_sorted_results(objects_left, scores_left,
57 | objects_right, scores_right, max_elems):
58 | result_objects = []
59 | result_scores = []
60 |
61 | j = 0
62 | i = 0
63 | while True:
64 | if (len(result_scores) == max_elems):
65 | break
66 |
67 | if i == len(scores_left):
68 | result_objects += objects_right[j : j + max_elems - len(result_scores)]
69 | result_scores += scores_right[j : j + max_elems - len(result_scores)]
70 | break
71 |
72 | if j == len(scores_right):
73 | result_objects += objects_left[i : i + max_elems - len(result_scores)]
74 | result_scores += scores_left[i : i + max_elems - len(result_scores)]
75 | break
76 |
77 | if scores_left[i] > scores_right[j]:
78 | result_objects.append(objects_left[i])
79 | result_scores.append(scores_left[i])
80 | i += 1
81 | else:
82 | result_objects.append(objects_right[j])
83 | result_scores.append(scores_right[j])
84 | j += 1
85 |
86 | return result_objects, result_scores
87 |
88 |
89 | class MaskedTokenPredictorBert:
90 | def __init__(
91 | self, model, bpe_tokenizer, max_len=250, mask_in_multiunit=False, device=None, label=0,
92 | logits_postprocessor=None, contrast_penalty=0,
93 | mean=np.mean,
94 | confuse_bert_args=False,
95 | ):
96 | self._model = model
97 | self._bpe_tokenizer = bpe_tokenizer
98 | self._max_len = max_len
99 | self._mask_in_multiunit = mask_in_multiunit
100 | self.device = device or torch.device('cuda')
101 | self.label = label
102 | self.logits_postprocessor = logits_postprocessor
103 | self.contrast_penalty = contrast_penalty
104 |         self.mean = mean  # could be scipy.stats.hmean, but that does not handle the negative scores used here
105 | self.confuse_bert_args = confuse_bert_args # reproduce an old error for backward compatibility
106 |
107 | def __call__(self, sentences, masked_position, **kwargs):
108 | if type(masked_position) is not list:
109 |             sentences = [sentences]
110 | masked_position = [masked_position]
111 |
112 | b_masked_pos = []
113 | b_bpe_tokens = []
114 | for sent, mask_pos in zip(sentences, masked_position):
115 | bpe_tokens, bpe_offsets = bpe_tokenize(self._bpe_tokenizer, sent)
116 |
117 | masked_position = find_bpe_position_by_offset([bpe_offsets],
118 | (sent[mask_pos].begin,
119 | sent[mask_pos].end))
120 |
121 | masked_position, bpe_tokens, _ = remove_masked_token_subwords(masked_position,
122 | [bpe_tokens],
123 | [bpe_offsets])
124 | bpe_tokens = bpe_tokens[0]
125 |
126 | logger.debug(f'Bpe tokens: {bpe_tokens}')
127 |
128 | b_bpe_tokens.append(bpe_tokens)
129 | b_masked_pos.append(masked_position[1])
130 |
131 | return self.generate(b_bpe_tokens, b_masked_pos, **kwargs)
132 |
133 | def generate(self, b_bpe_tokens, b_masked_pos,
134 | mask_token=True,
135 | n_top=5,
136 | n_units=1,
137 | n_tokens=[1],
138 | fix_multiunit=True,
139 | beam_size=10,
140 | multiunit_lookup=100,
141 | max_multiunit=10,
142 | label=None,
143 | **kwargs
144 | ):
145 | result_preds = [[] for _ in range(len(b_bpe_tokens))]
146 | result_scores = [[] for _ in range(len(b_bpe_tokens))]
147 | if 1 in n_tokens:
148 | result_preds, result_scores = self.predict_single_word(b_bpe_tokens, b_masked_pos,
149 | mask_token=mask_token,
150 | n_top=n_top,
151 | n_units=n_units,
152 | multiunit_lookup=multiunit_lookup,
153 | fix_multiunit=fix_multiunit,
154 | max_multiunit=max_multiunit, label=label)
155 |
156 | for n_t in n_tokens:
157 | if n_t == 1:
158 | continue
159 |
160 | pred_tokens, pred_scores = self.predict_token_sequence(b_bpe_tokens, b_masked_pos,
161 | mask_token=mask_token,
162 | n_top=n_top,
163 | n_units=n_units,
164 | seq_len=n_t,
165 | multiunit_lookup=multiunit_lookup,
166 | fix_multiunit=fix_multiunit,
167 | beam_size=beam_size,
168 | max_multiunit=max_multiunit, label=label)
169 |
170 | for i in range(len(b_bpe_tokens)):
171 | result_preds[i], result_scores[i] = merge_sorted_results(result_preds[i], result_scores[i],
172 | pred_tokens[i], pred_scores[i],
173 | n_top)
174 |
175 | return result_preds, result_scores
176 |
177 | def predict_single_unit(self, bpe_tokens, masked_position,
178 | mask_token, n_top, label=None):
179 | if label is None:
180 | label = self.label
181 | bpe_tokens = copy.deepcopy(bpe_tokens)
182 |
183 | max_len = min([max(len(e) for e in bpe_tokens) + 2, self._max_len])
184 | token_ids = []
185 | for i in range(len(bpe_tokens)):
186 | bpe_tokens[i] = bpe_tokens[i][:max_len - 2]
187 |
188 | if mask_token:
189 | if i >= len(masked_position):
190 | continue
191 | pos = masked_position[i]
192 | if pos >= len(bpe_tokens[i]):
193 | continue
194 | bpe_tokens[i][pos] = '[MASK]'
195 |
196 | bpe_tokens[i] = ['[CLS]'] + bpe_tokens[i] + ['[SEP]']
197 | logger.debug(f'Masked BPE tokens: {bpe_tokens[i]}')
198 |
199 | token_ids.append(self._bpe_tokenizer.convert_tokens_to_ids(bpe_tokens[i]))
200 |
201 | token_ids = pad_sequences(token_ids, maxlen=max_len, dtype='long',
202 | truncating='post', padding='post')
203 | attention_masks_tensor = torch.tensor(token_ids > 0).long().to(self.device)
204 | tokens_tensor = torch.tensor(token_ids).to(self.device)
205 |
206 | segments_ids = np.ones_like(token_ids, dtype=int) * label
207 | segments_tensor = torch.tensor(segments_ids).to(self.device)
208 |
209 | self._model.eval()
210 | with torch.no_grad():
211 | if self.confuse_bert_args:
212 | target_sent = self._model(
213 | tokens_tensor, attention_mask=segments_tensor, token_type_ids=attention_masks_tensor
214 | )[0]
215 | else:
216 | target_sent = self._model(
217 | tokens_tensor, token_type_ids=segments_tensor, attention_mask=attention_masks_tensor
218 | )[0]
219 |
220 | # target_sent = torch.log_softmax(target_sent, -1)
221 |
222 | if self.contrast_penalty:
223 | # todo: make it work with multiple words (it breaks hypotheses somehow)
224 | with torch.no_grad():
225 | another = self._model(tokens_tensor, token_type_ids=1 - segments_tensor, attention_mask=attention_masks_tensor)[0]
226 | diff = torch.softmax(target_sent, -1) - self.contrast_penalty * torch.softmax(another, -1)
227 | target_sent = torch.log(torch.clamp(diff, 1e-20))
228 |
229 | target_sent = target_sent.detach().cpu().numpy()
230 |
231 | final_top_scores = []
232 | final_top_tokens = []
233 | for i in range(target_sent.shape[0]):
234 | row = target_sent[i]
235 | idx = masked_position[i]
236 | if idx + 1 >= len(row):
237 |                 # todo: investigate why this happens when n_units > 1
238 | continue
239 | logits = row[idx + 1]
240 | logits = self.adjust_logits(logits, label=label)
241 | top_ids = nlargest_indexes(logits, n_top)
242 | top_scores = [target_sent[i][masked_position[i] + 1][j] for j in top_ids]
243 | top_tokens = self._bpe_tokenizer.convert_ids_to_tokens(top_ids)
244 |
245 | final_top_scores.append(top_scores)
246 | final_top_tokens.append(top_tokens)
247 |
248 | return final_top_tokens, final_top_scores
249 |
250 | def adjust_logits(self, logits, label=0):
251 | if self.logits_postprocessor:
252 | return self.logits_postprocessor(logits, label=label or 0)
253 | return logits
254 |
255 | def predict_single_word(self, bpe_tokens, masked_position,
256 | mask_token,
257 | n_top,
258 | n_units,
259 | fix_multiunit,
260 | multiunit_lookup,
261 | max_multiunit,
262 | label=None
263 | ):
264 | pred_tokens, scores = self.predict_single_unit(bpe_tokens,
265 | masked_position,
266 | mask_token=mask_token,
267 | n_top=n_top, label=label)
268 |
269 | final_pred_tokens = []
270 | final_scores = []
271 | for j in range(len(pred_tokens)):
272 | if n_units > 1:
273 | pred_tokens[j] = list(reversed(pred_tokens[j][:multiunit_lookup]))
274 | scores[j] = list(reversed(scores[j][:multiunit_lookup]))
275 |
276 | seq_list = self.generate_multiunit_token(masked_position[j], bpe_tokens[j],
277 | n_top=multiunit_lookup,
278 | n_units=n_units, label=label)
279 |
280 | #for seq in seq_list[ :max_multiunit - 1]:
281 | for seq in seq_list[:max_multiunit]:
282 | seq_pred, seq_scores = seq
283 | multiunit_token = '_'.join(seq_pred)
284 | if fix_multiunit:
285 | multiunit_token = multiunit_token.replace('#', '')
286 | multiunit_token = multiunit_token.replace('_', '')
287 |
288 | #multiunit_score = np.average(seq_scores)
289 | multiunit_score = self.mean(seq_scores)
290 |
291 | ind = bisect.bisect(scores[j], multiunit_score)
292 |
293 | pred_tokens[j].insert(ind, multiunit_token)
294 | scores[j].insert(ind, multiunit_score)
295 |
296 | pred_tokens[j] = list(reversed(pred_tokens[j]))
297 | scores[j] = list(reversed(scores[j]))
298 |
299 | logger.debug(f'Predicted words: {pred_tokens[j]}')
300 |
301 | #final_pred_tokens.append([e for e in pred_tokens[j] if not e.startswith('##')])
302 | final_pred_tokens.append(pred_tokens[j][:n_top])
303 | final_scores.append(scores[j][:n_top])
304 |
305 | return final_pred_tokens, final_scores
306 |
307 | def generate_multiunit_token(self, masked_position, bpe_tokens,
308 | n_top,
309 | n_units,
310 | label=None,
311 | ):
312 | final_result = []
313 | final_result_scores = []
314 |
315 | bpe_tokens = copy.deepcopy(bpe_tokens)
316 | bpe_tokens.insert(masked_position, '[MASK]')
317 | predictions, scores = self.predict_single_unit([bpe_tokens],
318 | [masked_position + 1],
319 | n_top=n_top,
320 | mask_token=self._mask_in_multiunit,
321 | label=label,
322 | )
323 | # This will result in "Mama washed the [MASK] __frame__ ."
324 |
325 | if len(predictions) == 0:
326 | return []
327 |
328 | predictions = predictions[0]
329 | scores = scores[0]
330 | #n_suffix = 0
331 |
332 | good_preds = []
333 |
334 | # TODO: increase speed
335 | b_bpe_tokens = []
336 | for i, pred in (e for e in enumerate(predictions) if e[1][0] == '#'):
337 | # if n_suffix > n_units - 1:
338 | # break
339 |
340 | tmp = copy.deepcopy(bpe_tokens)
341 | tmp[masked_position + 1] = pred
342 | b_bpe_tokens.append(tmp)
343 | good_preds.append((i,pred))
344 | #n_suffix += 1
345 |
346 | if not good_preds:
347 | return []
348 |
349 |
350 | loader = DataLoader(b_bpe_tokens, batch_size=10, collate_fn=lambda _: _)
351 | preds = []
352 | pred_scores = []
353 | for batch in loader:
354 | bb_preds, bb_pred_scores = self.predict_single_unit(batch,
355 | [masked_position for _ in range(len(batch))],
356 | mask_token=False, # We do not need masking here, since inserted token will be a mask
357 | n_top=n_top, label=label
358 | )
359 |
360 | preds += bb_preds
361 | pred_scores += bb_pred_scores
362 |
363 | for i in range(len(preds)):
364 | result = [preds[i][0], good_preds[i][1]]
365 | result_scores = [pred_scores[i][0], scores[good_preds[i][0]]]
366 |
367 | tail, tail_scores = self.generate_from_tail(preds[i][0], b_bpe_tokens[i], masked_position,
368 | max_subunits=n_units-2,
369 | n_top=n_top, label=label)
370 | result = tail + result
371 | result_scores = tail_scores + result_scores
372 |
373 | final_result.append(result)
374 | final_result_scores.append(result_scores)
375 |
376 | return list(zip(final_result, final_result_scores))
377 |
378 | def generate_from_tail(self, pred, bpe_tokens, masked_position,
379 | max_subunits,
380 | n_top, label=None,):
381 | result = []
382 | result_scores = []
383 |
384 | it = 0
385 | while (pred[0] == '#') and (it < max_subunits):
386 | bpe_tokens[masked_position] = pred
387 | bpe_tokens.insert(masked_position, '[MASK]')
388 | preds, pred_scores = self.predict_single_unit([bpe_tokens], [masked_position],
389 | n_top=n_top,
390 | mask_token=False, label=label)
391 | pred = preds[0][0]
392 | result.append(pred)
393 | result_scores.append(pred_scores[0][0])
394 | it += 1
395 |
396 | return (list(reversed(result)),
397 | list(reversed(result_scores)))
398 |
399 | def generate_variants(self, bpe_tokens, mask_pos,
400 | gen_tokens, gen_scores, seq_len):
401 | batch_size = len(bpe_tokens)
402 |
403 | if not gen_tokens:
404 | yield bpe_tokens, [0.]*batch_size, [[] for _ in range(batch_size)], mask_pos
405 | return
406 |
407 | for var_num in range(len(gen_tokens[0])):
408 | if not gen_tokens[0][var_num]:
409 | continue
410 |
411 | variant = []
412 | new_mask = []
413 | var_t = []
414 | var_s = []
415 | for i in range(batch_size):
416 | new_bpe = copy.deepcopy(bpe_tokens[i])
417 |
418 | for seq_num in range(len(gen_tokens[i][var_num])):
419 | new_bpe[mask_pos[i] + seq_num] = gen_tokens[i][var_num][seq_num]
420 |
421 | var_t.append(gen_tokens[i][var_num])
422 | var_s.append(gen_scores[i][var_num])
423 |
424 | new_mask.append(mask_pos[i] + len(gen_tokens[i][var_num]))
425 |
426 | variant.append(new_bpe)
427 |
428 | yield variant, var_s, var_t, new_mask
429 |
430 | def update_beam(self,
431 | prev_tokens, prev_score,
432 | new_scores, new_tokens,
433 | gen_scores, gen_tokens):
434 | for i in range(len(gen_scores)):
435 | final_gen_score = prev_score + gen_scores[i]
436 | insert_pos = bisect.bisect(new_scores, final_gen_score)
437 |
438 | new_scores.insert(insert_pos, final_gen_score)
439 | del new_scores[0]
440 |
441 | new_tokens.insert(insert_pos, prev_tokens + [gen_tokens[i]])
442 | if len(new_tokens) > len(new_scores):
443 | del new_tokens[0]
444 |
445 | def predict_token_sequence(self, bpe_tokens, masked_pos,
446 | mask_token,
447 | n_top,
448 | seq_len,
449 | beam_size,
450 | n_units,
451 | fix_multiunit,
452 | multiunit_lookup,
453 | max_multiunit,
454 | label=None,
455 | ):
456 | bpe_tokens = copy.deepcopy(bpe_tokens)
457 |
458 | batch_size = len(bpe_tokens)
459 | for i in range(batch_size):
460 | for seq_num in range(seq_len - 1):
461 | bpe_tokens[i].insert(masked_pos[i] + 1, '[MASK]')
462 |
463 | gen_scores = []
464 | gen_tokens = []
465 | for seq_num in range(seq_len):
466 | gen_scores_seq = [[0. for __ in range(beam_size)] for _ in range(batch_size)]
467 | gen_tokens_seq = [[[] for __ in range(beam_size)] for _ in range(batch_size)]
468 | for variant, variant_score, prev_tokens, new_mask in self.generate_variants(bpe_tokens, masked_pos,
469 | gen_tokens, gen_scores,
470 | seq_len=seq_len):
471 | top_tokens, top_scores = self.predict_single_word(variant, new_mask,
472 | mask_token=True,
473 | n_top=n_top,
474 | n_units=n_units,
475 | fix_multiunit=fix_multiunit,
476 | multiunit_lookup=multiunit_lookup,
477 | max_multiunit=max_multiunit,
478 | label=label,
479 | )
480 |
481 | for i in range(batch_size):
482 | self.update_beam(prev_tokens[i], variant_score[i],
483 | gen_scores_seq[i], gen_tokens_seq[i],
484 | top_scores[i], top_tokens[i])
485 |
486 | gen_tokens = gen_tokens_seq
487 | gen_scores = gen_scores_seq
488 |
489 | gen_scores = [[(e/seq_len) for e in l] for l in gen_scores]
490 |
491 | return ([list(reversed(e)) for e in gen_tokens],
492 | [list(reversed(e)) for e in gen_scores])
493 |
--------------------------------------------------------------------------------
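In `generate` above, the single-token candidates and the multi-token candidates (each list sorted by descending score) are combined with `merge_sorted_results`. The following small sketch shows that merge on toy data; it assumes the snippet is run from `emnlp2021/style_transfer/condBERT` with the requirements installed, so that the module imports cleanly.

```python
from multiword.masked_token_predictor_bert import merge_sorted_results

# two candidate lists, each already sorted by descending score
single_tokens, single_scores = ['person', 'man'], [0.9, 0.5]
multi_tokens, multi_scores = ['old man', 'nice guy'], [0.7, 0.6]

objects, scores = merge_sorted_results(single_tokens, single_scores,
                                       multi_tokens, multi_scores, max_elems=3)
print(objects)  # ['person', 'old man', 'nice guy']
print(scores)   # [0.9, 0.7, 0.6]
```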
/emnlp2021/style_transfer/condBERT/multiword/predict_utils.py:
--------------------------------------------------------------------------------
1 | def find_bpe_position_by_offset(bpe_offsets, target_offset):
2 | bpe_nums=[]
3 | for sent_num, sent in enumerate(bpe_offsets):
4 | if sent[-1][0] < target_offset[0]:
5 | continue
6 |
7 | for bpe_num, bpe in enumerate(sent):
8 | if target_offset[0] <= bpe[0] and bpe[1] <= target_offset[1]:
9 | bpe_nums.append(bpe_num)
10 | return (sent_num, bpe_nums)
11 |
12 |
13 | def generate_seq_indexes(indexes):
14 | if not indexes:
15 | yield []
16 | return
17 |
18 | for ind in indexes[0]:
19 | for seq in generate_seq_indexes(indexes[1:]):
20 | yield [ind] + seq
21 |
22 | """failure case of tokenizer:
23 | tagged_text = "Earlier this year , some 70 U.S. congressmen sent a letter to U. __S.__ President Bill Clinton , calling for an end to the humanitarian crisis in Iraq by having the sanctions lifted ."
24 | """
25 |
--------------------------------------------------------------------------------
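`find_bpe_position_by_offset` maps the character span of an original word onto the indices of the BPE pieces that cover it, using the offsets produced by `bpe_tokenize` in `masked_token_predictor_bert.py`. A toy illustration follows (run from `emnlp2021/style_transfer/condBERT`); the offsets are invented for the example.

```python
from multiword.predict_utils import find_bpe_position_by_offset, generate_seq_indexes

# one sentence; 'idiot' and '##s' both inherit the (begin, end) offsets of the word "idiots"
bpe_offsets = [[(0, 3), (4, 7), (8, 14), (8, 14), (15, 16)]]
print(find_bpe_position_by_offset(bpe_offsets, (8, 14)))  # (0, [2, 3]): sentence 0, pieces 2 and 3

# generate_seq_indexes enumerates every combination of one index per position
print(list(generate_seq_indexes([[0, 1], [5]])))  # [[0, 5], [1, 5]]
```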
/emnlp2021/style_transfer/condBERT/vocab/toxic_words.txt:
--------------------------------------------------------------------------------
1 | 4r5e
2 | 5h1t
3 | 5hit
4 | a55
5 | anal
6 | anus
7 | ar5e
8 | arrse
9 | arse
10 | ass
11 | ass-fucker
12 | asses
13 | assfucker
14 | assfukka
15 | asshole
16 | assholes
17 | asswhole
18 | a_s_s
19 | b!tch
20 | b00bs
21 | b17ch
22 | b1tch
23 | ballbag
24 | balls
25 | ballsack
26 | bastard
27 | beastial
28 | beastiality
29 | bellend
30 | bestial
31 | bestiality
32 | bi+ch
33 | biatch
34 | bitch
35 | bitcher
36 | bitchers
37 | bitches
38 | bitchin
39 | bitching
40 | bloody
41 | blow job
42 | blowjob
43 | blowjobs
44 | boiolas
45 | bollock
46 | bollok
47 | boner
48 | boob
49 | boobs
50 | booobs
51 | boooobs
52 | booooobs
53 | booooooobs
54 | breasts
55 | buceta
56 | bugger
57 | bum
58 | bunny fucker
59 | butt
60 | butthole
61 | buttmunch
62 | buttplug
63 | c0ck
64 | c0cksucker
65 | carpet muncher
66 | cawk
67 | chink
68 | cipa
69 | cl1t
70 | clit
71 | clitoris
72 | clits
73 | cnut
74 | cock
75 | cock-sucker
76 | cockface
77 | cockhead
78 | cockmunch
79 | cockmuncher
80 | cocks
81 | cocksuck
82 | cocksucked
83 | cocksucker
84 | cocksucking
85 | cocksucks
86 | cocksuka
87 | cocksukka
88 | cok
89 | cokmuncher
90 | coksucka
91 | coon
92 | cox
93 | crap
94 | cum
95 | cummer
96 | cumming
97 | cums
98 | cumshot
99 | cunilingus
100 | cunillingus
101 | cunnilingus
102 | cunt
103 | cuntlick
104 | cuntlicker
105 | cuntlicking
106 | cunts
107 | cyalis
108 | cyberfuc
109 | cyberfuck
110 | cyberfucked
111 | cyberfucker
112 | cyberfuckers
113 | cyberfucking
114 | d1ck
115 | damn
116 | dick
117 | dickhead
118 | dildo
119 | dildos
120 | dink
121 | dinks
122 | dirsa
123 | dlck
124 | dog-fucker
125 | doggin
126 | dogging
127 | donkeyribber
128 | doosh
129 | duche
130 | dyke
131 | ejaculate
132 | ejaculated
133 | ejaculates
134 | ejaculating
135 | ejaculatings
136 | ejaculation
137 | ejakulate
138 | f u c k
139 | f u c k e r
140 | f4nny
141 | fag
142 | fagging
143 | faggitt
144 | faggot
145 | faggs
146 | fagot
147 | fagots
148 | fags
149 | fanny
150 | fannyflaps
151 | fannyfucker
152 | fanyy
153 | fatass
154 | fcuk
155 | fcuker
156 | fcuking
157 | feck
158 | fecker
159 | felching
160 | fellate
161 | fellatio
162 | fingerfuck
163 | fingerfucked
164 | fingerfucker
165 | fingerfuckers
166 | fingerfucking
167 | fingerfucks
168 | fistfuck
169 | fistfucked
170 | fistfucker
171 | fistfuckers
172 | fistfucking
173 | fistfuckings
174 | fistfucks
175 | flange
176 | fook
177 | fooker
178 | fuck
179 | fucka
180 | fucked
181 | fucker
182 | fuckers
183 | fuckhead
184 | fuckheads
185 | fuckin
186 | fucking
187 | fuckings
188 | fuckingshitmotherfucker
189 | fuckme
190 | fucks
191 | fuckwhit
192 | fuckwit
193 | fudge packer
194 | fudgepacker
195 | fuk
196 | fuker
197 | fukker
198 | fukkin
199 | fuks
200 | fukwhit
201 | fukwit
202 | fux
203 | fux0r
204 | f_u_c_k
205 | gangbang
206 | gangbanged
207 | gangbangs
208 | gaylord
209 | gaysex
210 | goatse
211 | God
212 | god-dam
213 | god-damned
214 | goddamn
215 | goddamned
216 | hardcoresex
217 | hell
218 | heshe
219 | hoar
220 | hoare
221 | hoer
222 | homo
223 | hore
224 | horniest
225 | horny
226 | hotsex
227 | jack-off
228 | jackoff
229 | jap
230 | jerk-off
231 | jism
232 | jiz
233 | jizm
234 | jizz
235 | kawk
236 | knob
237 | knobead
238 | knobed
239 | knobend
240 | knobhead
241 | knobjocky
242 | knobjokey
243 | kock
244 | kondum
245 | kondums
246 | kum
247 | kummer
248 | kumming
249 | kums
250 | kunilingus
251 | l3i+ch
252 | l3itch
253 | labia
254 | lmfao
255 | lust
256 | lusting
257 | m0f0
258 | m0fo
259 | m45terbate
260 | ma5terb8
261 | ma5terbate
262 | masochist
263 | master-bate
264 | masterb8
265 | masterbat*
266 | masterbat3
267 | masterbate
268 | masterbation
269 | masterbations
270 | masturbate
271 | mo-fo
272 | mof0
273 | mofo
274 | mothafuck
275 | mothafucka
276 | mothafuckas
277 | mothafuckaz
278 | mothafucked
279 | mothafucker
280 | mothafuckers
281 | mothafuckin
282 | mothafucking
283 | mothafuckings
284 | mothafucks
285 | mother fucker
286 | motherfuck
287 | motherfucked
288 | motherfucker
289 | motherfuckers
290 | motherfuckin
291 | motherfucking
292 | motherfuckings
293 | motherfuckka
294 | motherfucks
295 | muff
296 | mutha
297 | muthafecker
298 | muthafuckker
299 | muther
300 | mutherfucker
301 | n1gga
302 | n1gger
303 | nazi
304 | nigg3r
305 | nigg4h
306 | nigga
307 | niggah
308 | niggas
309 | niggaz
310 | nigger
311 | niggers
312 | nob
313 | nob jokey
314 | nobhead
315 | nobjocky
316 | nobjokey
317 | numbnuts
318 | nutsack
319 | orgasim
320 | orgasims
321 | orgasm
322 | orgasms
323 | p0rn
324 | pawn
325 | pecker
326 | penis
327 | penisfucker
328 | phonesex
329 | phuck
330 | phuk
331 | phuked
332 | phuking
333 | phukked
334 | phukking
335 | phuks
336 | phuq
337 | pigfucker
338 | pimpis
339 | piss
340 | pissed
341 | pisser
342 | pissers
343 | pisses
344 | pissflaps
345 | pissin
346 | pissing
347 | pissoff
348 | poop
349 | porn
350 | porno
351 | pornography
352 | pornos
353 | prick
354 | pricks
355 | pron
356 | pube
357 | pusse
358 | pussi
359 | pussies
360 | pussy
361 | pussys
362 | rectum
363 | retard
364 | rimjaw
365 | rimming
366 | s hit
367 | s.o.b.
368 | sadist
369 | schlong
370 | screwing
371 | scroat
372 | scrote
373 | scrotum
374 | semen
375 | sex
376 | sh!+
377 | sh!t
378 | sh1t
379 | shag
380 | shagger
381 | shaggin
382 | shagging
383 | shemale
384 | shi+
385 | shit
386 | shitdick
387 | shite
388 | shited
389 | shitey
390 | shitfuck
391 | shitfull
392 | shithead
393 | shiting
394 | shitings
395 | shits
396 | shitted
397 | shitter
398 | shitters
399 | shitting
400 | shittings
401 | shitty
402 | skank
403 | slut
404 | sluts
405 | smegma
406 | smut
407 | snatch
408 | son-of-a-bitch
409 | spac
410 | spunk
411 | s_h_i_t
412 | t1tt1e5
413 | t1tties
414 | teets
415 | teez
416 | testical
417 | testicle
418 | tit
419 | titfuck
420 | tits
421 | titt
422 | tittie5
423 | tittiefucker
424 | titties
425 | tittyfuck
426 | tittywank
427 | titwank
428 | tosser
429 | turd
430 | tw4t
431 | twat
432 | twathead
433 | twatty
434 | twunt
435 | twunter
436 | v14gra
437 | v1gra
438 | vagina
439 | viagra
440 | vulva
441 | w00se
442 | wang
443 | wank
444 | wanker
445 | wanky
446 | whoar
447 | whore
448 | willies
449 | willy
450 | xrated
451 | xxx
452 |
--------------------------------------------------------------------------------
/emnlp2021/style_transfer/condBERT/vocab/word2coef.pkl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/s-nlp/detox/0ebaeab817957bb5463819bec7fa4ed3de9a26ee/emnlp2021/style_transfer/condBERT/vocab/word2coef.pkl
--------------------------------------------------------------------------------
/emnlp2021/style_transfer/paraGeDi/gedi_adapter.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | import torch
3 |
4 | from transformers.generation_utils import GenerationMixin
5 |
6 |
7 | class GediAdapter(GenerationMixin):
8 | def __init__(
9 | self, model, gedi_model,
10 | target=0, # todo: pass them in the generate method as well
11 | gedi_logit_coef=1,
12 | pos_code=1349,
13 | neg_code=13086,
14 | tokenizer=None,
15 | max_id=None,
16 | debug=False,
17 | reg_alpha=0,
18 | ub=None,
19 | lb=None,
20 | untouchable_tokens=None,
21 | nearly_infinity = -1000,
22 | ):
23 | self.model = model
24 | self.gedi_model = gedi_model
25 | self.target = target
26 | self.gedi_logit_coef = gedi_logit_coef
27 | self.POS_CODE = pos_code
28 | self.NEG_CODE = neg_code
29 | self.codes = {'gedi_pos': self.POS_CODE, 'gedi_neg': self.NEG_CODE}
30 | self.tokenizer = tokenizer
31 | self.max_id = max_id
32 | self.debug = debug
33 | self.reg_alpha = reg_alpha
34 | self.ub = ub
35 | self.lb = lb
36 | self.logits = []
37 | self.untouchable_tokens = untouchable_tokens or []
38 | self.nearly_infinity = nearly_infinity
39 |
40 | def show_correction(self, sm, logits, corrected, old_logits):
41 | if self.tokenizer:
42 | vals = sm.cpu().numpy()[0][0]
43 | lv = logits.cpu().numpy()[0]
44 | cv = corrected.cpu().numpy()[0]
45 | if self.max_id:
46 | vals = vals[:self.max_id]
47 | lv = lv[:self.max_id]
48 | cv = cv[:self.max_id]
49 | # the most upgraded and downgraded tokens
50 | #print('+', self.tokenizer.convert_ids_to_tokens(np.argsort(-vals)[:5]), -np.sort(-vals)[:3])
51 | #print('-', self.tokenizer.convert_ids_to_tokens(np.argsort(vals)[:5]), np.sort(vals)[:3])
52 | #print(torch.exp(logits).sum())
53 | # how the top logits change
54 | old_top_id = np.argsort(-lv)[:5]
55 | new_top_id = np.argsort(-cv)[:5]
56 | toks = list(new_top_id)
57 | for t in old_top_id:
58 | if t not in toks:
59 | toks.append(t)
60 |
61 | pos_logits = old_logits[0][0].cpu().numpy()
62 | neg_logits = old_logits[1][0].cpu().numpy()
63 |
64 | texts = self.tokenizer.convert_ids_to_tokens(toks)
65 | print('changes in the top:')
66 | for text, idx in zip(texts, toks):
67 | print('{:6d}: {:+2.2f} > {:+2.2f} {:20s} [{:+2.2f} | {:+2.2f}]'.format(idx, lv[idx], cv[idx], text, pos_logits[idx], neg_logits[idx]))
68 | #print()
69 | #print(self.tokenizer.convert_ids_to_tokens(old_top_id), self.tokenizer.convert_ids_to_tokens(new_top_id))
70 |
71 | def __call__(self, return_dict=True, **kwargs):
72 | new_args = kwargs.get('main', {})
73 | with torch.no_grad():
74 | #print('main inputs:', {k: v.shape for k, v in new_args.items() if isinstance(v, torch.Tensor)})
75 | outputs = self.model(return_dict=return_dict, **new_args)
76 | #print('main outputs:', {k: v.shape for k, v in outputs.items() if isinstance(v, torch.Tensor)})
77 | outputs['main'] = outputs
78 | gedi_logits = {}
79 | for gedi_key in ['gedi_pos', 'gedi_neg']:
80 | gedi_args = kwargs.get(gedi_key, {})
81 | with torch.no_grad():
82 | gedi_out = self.gedi_model(**gedi_args, return_dict=True)
83 | outputs[gedi_key] = gedi_out # logits are [batch, seq, voc]
84 | gedi_logits[gedi_key] = gedi_out['logits'][:,-1] # [batch, voc]
85 | # todo: add cumulative log loss
86 | stacked = torch.stack([gedi_logits['gedi_pos'], gedi_logits['gedi_neg']]) # [2, batch, voc]
87 | # exclude untouchable tokens from the distribution on which the penalty is calculated
88 | for token_id in self.untouchable_tokens:
89 | stacked[:, :, token_id] = self.nearly_infinity
90 | if self.reg_alpha:
91 | # increase each p(token|class) by the same amount, to shift odds ratio to 1.
92 | old_logits = torch.log(torch.softmax(stacked, -1) + self.reg_alpha)
93 | else:
94 | old_logits = torch.log_softmax(stacked, -1)
95 |
96 | if hasattr(self.gedi_model, 'logit_scale'):
97 | old_logits += self.gedi_model.logit_scale
98 | if hasattr(self.gedi_model, 'bias'):
99 | old_logits += self.gedi_model.bias.reshape(2, 1, 1).repeat(1,1,old_logits.shape[-1])
100 |
101 | sm = torch.log_softmax(old_logits, 0)
102 | logits = outputs['logits'][:,-1]
103 |
104 | # bad patch for eos
105 | #sm[:, :,1] = 0
106 |
107 | shift = sm[self.target]
108 | # shift everything by a constant to make logits before and after change more comparable
109 | shift -= shift.mean()
110 | # limit the positive or negative impact of gedi correction
111 | if self.lb is not None or self.ub is not None:
112 | shift = torch.clamp(shift, self.lb, self.ub)
113 | for token_id in self.untouchable_tokens:
114 | shift[:, token_id] = 0
115 |
116 | corrected = logits + shift * self.gedi_logit_coef
117 | if self.debug:
118 | self.show_correction(sm, logits, corrected, torch.log_softmax(stacked, -1))
119 | if self.max_id is not None:
120 |             corrected[:, self.max_id:] = -np.infty  # disallow vocabulary ids beyond max_id
121 | outputs['logits'] = corrected.unsqueeze(1) # add back sequence length
122 | # todo: calculate the loss with respect to the new logits
123 |
124 | #self.logits.append(corrected)
125 | return outputs
126 |
127 | def prepare_inputs_for_generation(self, input_ids, **kwargs):
128 | # todo: prepare inputs separately for main and gedi models
129 | result = {}
130 | # unpack past after beam search application
131 | past = kwargs.get('past')
132 | #print('past:', past is not None)
133 | if past and not isinstance(past, tuple):
134 | for k, v in past.items():
135 | kwargs[k]['past'] = v
136 |
137 | main_input_ids = input_ids
138 | main_kwargs = kwargs.get('main', kwargs)
139 | if kwargs.get('main_prefix') is not None and main_kwargs.get('past') is None:
140 | prefix = kwargs['main_prefix'].unsqueeze(0).repeat(main_input_ids.shape[0], 1)
141 | main_input_ids = torch.cat([prefix, main_input_ids], dim=1)
142 | #print('main reshape text:', input_ids.shape, main_input_ids.shape)
143 | if main_kwargs.get('attention_mask') is not None:
144 | old_mask = main_kwargs['attention_mask']
145 | mask_prefix = prefix * 0 + 1
146 | main_kwargs['attention_mask'] = torch.cat([mask_prefix, old_mask], dim=1)
147 | #print('main reshape mask:', old_mask.shape, main_kwargs['attention_mask'].shape)
148 | #print('main ', self.tokenizer.decode(main_input_ids[0]))
149 | result['main'] = self.model.prepare_inputs_for_generation(main_input_ids, **main_kwargs)
150 |
151 | for k in ['gedi_pos', 'gedi_neg']:
152 | gedi_args = kwargs.get(k, {})
153 | # todo: allow concatenation of codes to the input instead of replacing the first token
154 | if kwargs.get('gedi_prepend'):
155 | # prepend the code to the input
156 | prefix = torch.ones([input_ids.shape[0], 1], dtype=input_ids.dtype).to(input_ids.device) * self.codes[k]
157 | new_input_ids = torch.cat([prefix, input_ids], dim=1)
158 | # todo: enlarge the attention mask
159 | else:
160 |                 # insert the code in place of the first token of the input
161 | new_input_ids = input_ids.clone() # batch size x seq len
162 | new_input_ids[:, 0] = self.codes[k]
163 | #print(k, self.tokenizer.decode(new_input_ids[0]))
164 | gedi_inputs = self.gedi_model.prepare_inputs_for_generation(new_input_ids, **gedi_args)
165 | result[k] = gedi_inputs
166 | return result
167 |
168 | def _update_model_kwargs_for_generation(self, outputs, model_kwargs, is_encoder_decoder=False):
169 | # todo: use the new outputs as args
170 | result = {k: v for k, v in model_kwargs.items()}
171 | #print('main before upd:', {k: v.shape for k, v in model_kwargs.get('main', model_kwargs).items() if isinstance(v, torch.Tensor)}, 'main' in model_kwargs)
172 | result['main'] = self.model._update_model_kwargs_for_generation(
173 | outputs=outputs['main'],
174 | model_kwargs=model_kwargs.get('main', model_kwargs),
175 | is_encoder_decoder=self.model.config.is_encoder_decoder,
176 | )
177 | #print('main after upd:', {k: v.shape for k, v in result['main'].items() if isinstance(v, torch.Tensor)})
178 | for k in ['gedi_pos', 'gedi_neg']:
179 | result[k] = self.gedi_model._update_model_kwargs_for_generation(
180 | outputs=outputs[k],
181 | model_kwargs=model_kwargs.get(k, {}),
182 | is_encoder_decoder=self.gedi_model.config.is_encoder_decoder,
183 | )
184 |
185 | # a fix for beam search
186 | result['past'] = {
187 | k: result[k]['past']
188 | for k in ['main', 'gedi_pos', 'gedi_neg']
189 | if 'past' in result[k] and result[k]['past'] is not None and result[k]['past'][0] is not None
190 | }
191 | return result
192 |
193 |
194 |     def paraphrase(self, text, n=None, max_length=128):
195 |         inputs = self.tokenizer(text, return_tensors='pt', padding=True)['input_ids'].to(self.device)
196 |         if max_length == 'auto':
197 |             max_length = min(int(inputs.shape[1] * 1.1) + 4, 64)
198 |         result = self.generate(
199 |             inputs,
200 |             num_return_sequences=n or 1,
201 |             do_sample=False, temperature=0.0, repetition_penalty=3.0, max_length=max_length,
202 |             bad_words_ids=[[2]],  # unk
203 |         )
204 |         texts = [self.tokenizer.decode(r) for r in result]
205 |         if not n:
206 |             return texts[0]
207 |         return texts
208 |
209 | def _reorder_cache(self, past, beam_idx):
210 | # for each model, cache should be reordered separately
211 | result = {}
212 | for key, subpast in past.items():
213 | model = self.model if key == 'main' else self.gedi_model
214 | result[key] = model._reorder_cache(subpast, beam_idx)
215 | return result
216 |
217 |     # the attributes and methods below exist only for compatibility with GenerationMixin.generate
218 | # todo: clarify it
219 | def get_output_embeddings(self):
220 | return True
221 |
222 | @property
223 | def config(self):
224 | return self.model.config
225 |
226 | def get_encoder(self):
227 | return self.model.get_encoder()
228 |
229 | def parameters(self):
230 | return self.model.parameters()
231 |
232 | @property
233 | def device(self):
234 | return self.model.device
235 |
236 | @property
237 | def main_input_name(self):
238 | return self.model.main_input_name
239 |
240 | def forward(self, attention_mask=None, **kwargs):
241 | pass
242 |
--------------------------------------------------------------------------------
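The core of `GediAdapter.__call__` above re-weights the paraphraser's next-token logits by how strongly each token is associated with the target class according to the two class-conditional GeDi passes. Below is a minimal sketch of that correction on toy tensors; it ignores `reg_alpha`, `logit_scale`/`bias`, the `lb`/`ub` clamp and `untouchable_tokens`, and all numbers are made up.

```python
import torch

lm_logits = torch.tensor([2.0, 1.0, 0.5, 0.0, -1.0])   # paraphraser logits over a toy 5-token vocab
gedi_pos  = torch.tensor([0.1, 2.0, 0.0, -1.0, 0.0])   # GeDi logits under the positive control code
gedi_neg  = torch.tensor([3.0, -1.0, 0.0, 1.0, 0.0])   # GeDi logits under the negative control code

stacked = torch.stack([gedi_pos, gedi_neg])              # [2, vocab]
token_given_class = torch.log_softmax(stacked, dim=-1)   # log p(token | class)
shift = torch.log_softmax(token_given_class, dim=0)[0]   # ~log p(class = target | token), target = 0
shift = shift - shift.mean()                             # centre the shift, as in __call__
corrected = lm_logits + 1.0 * shift                      # gedi_logit_coef = 1
print(corrected)  # tokens favoured by the negative class are pushed down
```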
/emnlp2021/style_transfer/paraGeDi/text_processing.py:
--------------------------------------------------------------------------------
1 | import re
2 |
3 | punkt = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ '
4 |
5 |
6 | def detokenize(text):
7 | text = text.replace(" .", ".").replace(" ,", ",").replace(" !", "!")
8 | text = text.replace(" ?", "?").replace(" )", ")").replace("( ", "(")
9 | return text
10 |
11 |
12 | def drop_bad_words(text, max_len=30, return_digits=None):
13 | parts = re.split('(\\W)', text)
14 | if max_len:
15 | parts = [w for w in parts if len(w) <= max_len]
16 | if return_digits is not None:
17 | parts = [str(return_digits) if p == 'DIGIT' else p for p in parts]
18 | return ''.join(parts)
19 |
20 |
21 | def text_preprocess(text):
22 | # strip punctuation on the left
23 | text = text.lstrip(punkt)
24 |     # remove extra spaces left over from tokenization
25 | text = detokenize(text)
26 | # remove too long words because generally they confuse a seq2seq model
27 | # and often they are meaningless combinations of characters
28 | text = drop_bad_words(text)
29 | return text
30 |
31 |
32 | def text_postprocess(text):
33 |     # strip multiple punctuation marks on the right
34 | res2 = text.rstrip(punkt)
35 | if len(res2) < len(text):
36 | res2 += text[len(res2)]
37 | return res2
38 |
--------------------------------------------------------------------------------
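A quick usage sketch of the helpers above; it assumes the snippet is run from `emnlp2021/style_transfer/paraGeDi` so that `text_processing` is importable.

```python
from text_processing import text_preprocess, text_postprocess

# strips leading punctuation, undoes tokenization spacing, drops words longer than 30 characters
print(text_preprocess(", he said it ( honestly ) !"))  # -> "he said it (honestly)!"

# keeps only the first character of a trailing punctuation run
print(text_postprocess("thanks!!!"))                   # -> "thanks!"
```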
/requirements.txt:
--------------------------------------------------------------------------------
1 | tqdm
2 | numpy
3 | pandas
4 | torch
5 | nltk
6 | transformers==4.24
7 | fairseq==0.10.0
8 | sentencepiece
9 | keras_preprocessing
10 | flair==0.11
11 | scipy
12 |
--------------------------------------------------------------------------------