├── .gitignore ├── README.md ├── emnlp2021 ├── data │ ├── dev │ │ ├── dev_normal │ │ └── dev_toxic │ ├── manual_markup │ │ ├── emnlp_eval_long.tsv │ │ └── emnlp_eval_mean.tsv │ ├── test │ │ ├── model_outputs │ │ │ ├── condbert.txt │ │ │ ├── paragedi_with_default_paraphraser.txt │ │ │ └── paragedi_with_mined_paraphraser.txt │ │ ├── test_10k_normal │ │ └── test_10k_toxic │ ├── train │ │ ├── train_normal │ │ └── train_toxic │ └── yelp_seniment_data │ │ ├── evaluate.sh │ │ ├── human.0 │ │ ├── human.1 │ │ ├── mask_infill.0 │ │ ├── mask_infill.1 │ │ ├── other_results │ │ ├── RetrieveOnly_Li.0 │ │ ├── RetrieveOnly_Li.1 │ │ ├── TemplateBase_Li.0 │ │ ├── TemplateBase_Li.1 │ │ ├── UnsuperMT_Zhang.0 │ │ ├── UnsuperMT_Zhang.1 │ │ ├── dual_rl.0 │ │ └── dual_rl.1 │ │ ├── reference.0 │ │ ├── reference.1 │ │ ├── results │ │ ├── condbert_tox1.5_w2_sim20.0.txt │ │ ├── condbert_tox1.5_w2_sim20.1.txt │ │ ├── condbert_tox3_w2_sim20.0.txt │ │ ├── condbert_tox3_w2_sim20.1.txt │ │ ├── gedi_coef4_batch10_rerank.0.txt │ │ └── gedi_coef4_batch10_rerank.1.txt │ │ ├── sentiment.dev.0 │ │ ├── sentiment.dev.1 │ │ ├── sentiment.test.0 │ │ ├── sentiment.test.1 │ │ ├── sentiment.train.0 │ │ ├── sentiment.train.1 │ │ ├── sst_0_07.0 │ │ ├── sst_0_07.1 │ │ ├── sst_75_07.0 │ │ └── sst_75_07.1 ├── metric │ ├── README.md │ ├── metric.py │ └── wieting_similarity │ │ ├── sim_models.py │ │ ├── sim_utils.py │ │ ├── similarity_evaluator.py │ │ ├── spm.py │ │ └── test_sim.py ├── prepare.sh └── style_transfer │ ├── condBERT │ ├── choosers.py │ ├── condbert.py │ ├── condbert_compile_vocab.ipynb │ ├── condbert_inference.ipynb │ ├── multiword │ │ ├── masked_token_predictor_bert.py │ │ └── predict_utils.py │ └── vocab │ │ ├── negative-words.txt │ │ ├── positive-words.txt │ │ ├── token_toxicities.txt │ │ ├── toxic_words.txt │ │ └── word2coef.pkl │ ├── mining_parallel_corpus │ ├── finetune_t5_on_mined.ipynb │ └── paranmt_mining.ipynb │ └── paraGeDi │ ├── gedi_adapter.py │ ├── gedi_inference.ipynb │ ├── gedi_training 
│ ├── gedi_training.py │ ├── modeling_gpt2.py │ └── modeling_utils.py │ └── text_processing.py └── requirements.txt /.gitignore: -------------------------------------------------------------------------------- 1 | emnlp2021/models 2 | emnlp2021/style_transfer/condBERT/vocabularies 3 | .idea 4 | __pycache__ 5 | .ipynb_checkpoints 6 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Detoxification 2 | Detoxification is an automatic transformation of a text such that: 3 | - the text becomes non-toxic 4 | - the content of the text stays the same. 5 | 6 | This repository contains the code and data for the paper "[Text Detoxification using Large Pre-trained Neural Models](https://aclanthology.org/2021.emnlp-main.629/)" ([video](https://youtu.be/knhIJ_e9Uyc)). 7 | 8 | We propose two models: 9 | - **CondBERT**: a BERT-based model which identifies toxic words in a text and replaces them with neutral synonyms 10 | - **ParaGeDi**: a paraphraser-based model which re-generates a text using additional style-informed LMs 11 | 12 | **If you have any questions about the models, the code, or the data, please do not hesitate to reach out via [GitHub issues](https://github.com/skoltech-nlp/detox/issues)!** 13 | 14 | If you want the fastest way to run the inference of these models, you can run 15 | [this Colab notebook](https://colab.research.google.com/drive/1HEUgKO100rSuDcv80prD3dRZSjKztTqk?usp=sharing) 16 | that puts together some of the code from this repository. 17 | 18 | ## CondBERT 19 | 20 | The notebooks for reproducing the training and inference of this model are in the folder [condBERT](/emnlp2021/style_transfer/condBERT). 21 | 22 | ## ParaGeDi 23 | 24 | The notebooks and scripts for reproducing the training and inference of this model are in the folder [paraGeDi](/emnlp2021/style_transfer/paraGeDi). 
25 | 26 | ## Parallel detoxification corpus 27 | 28 | The notebooks for reproducing the data collection and training the model on it are in the folder [mining_parallel_corpus](/emnlp2021/style_transfer/mining_parallel_corpus). 29 | 30 | The original ParaNMT corpus (50M sentence pairs) can be downloaded from the authors' page: https://www.cs.cmu.edu/~jwieting/. 31 | The filtered ParaNMT-detox corpus (500K sentence pairs) can be downloaded from [here](https://github.com/skoltech-nlp/detox/releases/download/emnlp2021/filtered_paranmt.zip). 32 | 33 | The paraphraser trained on this filtered corpus is available at https://huggingface.co/s-nlp/t5-paranmt-detox. 34 | 35 | ## Evaluation 36 | 37 | To evaluate your model, use the folder [metric](/emnlp2021/metric). 38 | 39 | First, download the models for content preservation and fluency with the script `prepare.sh`. 40 | 41 | Then run the script `metric.py`, as in the example below: 42 | 43 | ``` 44 | python metric/metric.py --inputs data/test/test_10k_toxic --preds data/test/model_outputs/condbert.txt 45 | ``` 46 | 47 | ## Acknowledgements 48 | This research was conducted under the framework of the [Joint MTS-Skoltech laboratory](https://sites.skoltech.ru/nlp/joint-mts-skoltech-laboratory/). 49 | We are grateful to the reviewers for their helpful suggestions, which substantially improved this work. 
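As a supplement to the Evaluation section, the sketch below assembles the `metric.py` command line for each of the three files listed under `data/test/model_outputs` in the directory tree. It is a minimal illustration, not part of the repository: the helper `build_metric_command` and the constant `MODEL_OUTPUTS` are hypothetical names, only the `--inputs` and `--preds` flags from the README example are used, and the relative paths assume the commands are run from the `emnlp2021` directory after `prepare.sh` has been executed.

```python
import shlex
from pathlib import Path

# File names taken from the directory tree (data/test/model_outputs).
MODEL_OUTPUTS = [
    "condbert.txt",
    "paragedi_with_default_paraphraser.txt",
    "paragedi_with_mined_paraphraser.txt",
]


def build_metric_command(preds_name,
                         inputs="data/test/test_10k_toxic",
                         outputs_dir="data/test/model_outputs"):
    """Hypothetical helper: return the argument list for one evaluation run.

    Only the --inputs and --preds flags shown in the README are used;
    any other options of metric.py are deliberately left out.
    """
    preds = Path(outputs_dir) / preds_name
    return [
        "python", "metric/metric.py",
        "--inputs", inputs,
        "--preds", preds.as_posix(),
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so the sketch is safe
    # to run even when the models have not been downloaded yet.
    for name in MODEL_OUTPUTS:
        print(shlex.join(build_metric_command(name)))
```

Printing the commands rather than running them keeps the sketch free of side effects; passing each list to `subprocess.run` would execute the evaluation, assuming the working directory and downloaded models match the assumptions above.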
50 | 51 | 52 | ## Citation 53 | 54 | If you use our models or data, please cite the paper: 55 | 56 | ``` 57 | @inproceedings{dale-etal-2021-text, 58 | title = "Text Detoxification using Large Pre-trained Neural Models", 59 | author = "Dale, David and 60 | Voronov, Anton and 61 | Dementieva, Daryna and 62 | Logacheva, Varvara and 63 | Kozlova, Olga and 64 | Semenov, Nikita and 65 | Panchenko, Alexander", 66 | booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing", 67 | month = nov, 68 | year = "2021", 69 | address = "Online and Punta Cana, Dominican Republic", 70 | publisher = "Association for Computational Linguistics", 71 | url = "https://aclanthology.org/2021.emnlp-main.629", 72 | pages = "7979--7996", 73 | } 74 | ``` 75 | -------------------------------------------------------------------------------- /emnlp2021/data/yelp_seniment_data/evaluate.sh: -------------------------------------------------------------------------------- 1 | 2 | export CUDA_VISIBLE_DEVICES=3 3 | 4 | cd /home/dale/dialogue-censor/metric 5 | 6 | 7 | python metric.py \ 8 | --inputs ../data/yelp/sentiment.test.0 \ 9 | --preds ../data/yelp/sentiment.test.0 \ 10 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 11 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 12 | --task_name yelp --toxification 13 | python metric.py \ 14 | --inputs ../data/yelp/sentiment.test.1 \ 15 | --preds ../data/yelp/sentiment.test.1 \ 16 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 17 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 18 | --task_name yelp 19 | 20 | python metric.py \ 21 | --inputs ../data/yelp/sentiment.test.0 \ 22 | --preds ../data/yelp/mask_infill.0 \ 23 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 24 | --labels_path 
../classification/yelp/roberta_for_sentiment_classification_v2 \ 25 | --task_name yelp --toxification 26 | python metric.py \ 27 | --inputs ../data/yelp/sentiment.test.1 \ 28 | --preds ../data/yelp/mask_infill.1 \ 29 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 30 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 31 | --task_name yelp 32 | 33 | python metric.py \ 34 | --inputs ../data/yelp/sentiment.test.0 \ 35 | --preds /home/dale/projects/DualRL/outputs/yelp/DualRL/test.0.tsf \ 36 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 37 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 38 | --task_name yelp --toxification 39 | python metric.py \ 40 | --inputs ../data/yelp/sentiment.test.1 \ 41 | --preds /home/dale/projects/DualRL/outputs/yelp/DualRL/test.1.tsf \ 42 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 43 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 44 | --task_name yelp 45 | 46 | 47 | python metric.py \ 48 | --inputs ../data/yelp/sentiment.test.0 \ 49 | --preds ../data/yelp/human.0 \ 50 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 51 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 52 | --task_name yelp --toxification 53 | python metric.py \ 54 | --inputs ../data/yelp/sentiment.test.1 \ 55 | --preds ../data/yelp/human.1 \ 56 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 57 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 58 | --task_name yelp 59 | 60 | 61 | 62 | 63 | python metric.py \ 64 | --inputs ../data/yelp/sentiment.test.0 \ 65 | --preds /home/dale/projects/DualRL/outputs/yelp/UnsuperMT_Zhang/test.0.tsf \ 66 | --classifier_path 
../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 67 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 68 | --task_name yelp --toxification 69 | python metric.py \ 70 | --inputs ../data/yelp/sentiment.test.1 \ 71 | --preds /home/dale/projects/DualRL/outputs/yelp/UnsuperMT_Zhang/test.1.tsf \ 72 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 73 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 74 | --task_name yelp 75 | 76 | 77 | python metric.py \ 78 | --inputs ../data/yelp/sentiment.test.0 \ 79 | --preds /home/dale/projects/DualRL/outputs/yelp/TemplateBase_Li/test.0.tsf \ 80 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 81 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 82 | --task_name yelp --toxification 83 | python metric.py \ 84 | --inputs ../data/yelp/sentiment.test.1 \ 85 | --preds /home/dale/projects/DualRL/outputs/yelp/TemplateBase_Li/test.1.tsf \ 86 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 87 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 88 | --task_name yelp 89 | 90 | 91 | python metric.py \ 92 | --inputs ../data/yelp/sentiment.test.0 \ 93 | --preds /home/dale/projects/DualRL/outputs/yelp/RetrieveOnly_Li/test.0.tsf \ 94 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 95 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 96 | --task_name yelp --toxification 97 | python metric.py \ 98 | --inputs ../data/yelp/sentiment.test.1 \ 99 | --preds /home/dale/projects/DualRL/outputs/yelp/RetrieveOnly_Li/test.1.tsf \ 100 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 101 | --labels_path 
../classification/yelp/roberta_for_sentiment_classification_v2 \ 102 | --task_name yelp 103 | 104 | 105 | python metric.py \ 106 | --inputs ../data/yelp/sentiment.test.0 \ 107 | --preds ../data/yelp/sst_75_07.0 \ 108 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 109 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 110 | --task_name yelp --toxification 111 | python metric.py \ 112 | --inputs ../data/yelp/sentiment.test.1 \ 113 | --preds ../data/yelp/sst_75_07.1 \ 114 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 115 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 116 | --task_name yelp 117 | 118 | 119 | python metric.py \ 120 | --inputs ../data/yelp/sentiment.test.0 \ 121 | --preds ../data/yelp/sst_0_07.0 \ 122 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 123 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 124 | --task_name yelp --toxification 125 | python metric.py \ 126 | --inputs ../data/yelp/sentiment.test.1 \ 127 | --preds ../data/yelp/sst_0_07.1 \ 128 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 129 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 130 | --task_name yelp 131 | 132 | 133 | python metric.py \ 134 | --inputs ../data/yelp/sentiment.test.0 \ 135 | --preds ../data/yelp/results/gedi_coef4_batch10_rerank.0.txt \ 136 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 137 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 138 | --task_name yelp --toxification 139 | python metric.py \ 140 | --inputs ../data/yelp/sentiment.test.1 \ 141 | --preds ../data/yelp/results/gedi_coef4_batch10_rerank.1.txt \ 142 | --classifier_path 
../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 143 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 144 | --task_name yelp 145 | 146 | 147 | python metric.py \ 148 | --inputs ../data/yelp/sentiment.test.0 \ 149 | --preds ../data/yelp/results/condbert_tox1.5_w2_sim20.0.txt \ 150 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 151 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 152 | --task_name yelp --toxification 153 | python metric.py \ 154 | --inputs ../data/yelp/sentiment.test.1 \ 155 | --preds ../data/yelp/results/condbert_tox1.5_w2_sim20.1.txt \ 156 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 157 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 158 | --task_name yelp 159 | 160 | 161 | 162 | python metric.py \ 163 | --inputs ../data/yelp/sentiment.test.0 \ 164 | --preds ../data/yelp/results/condbert_tox3_w2_sim20.0.txt \ 165 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 166 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 167 | --task_name yelp --toxification 168 | python metric.py \ 169 | --inputs ../data/yelp/sentiment.test.1 \ 170 | --preds ../data/yelp/results/condbert_tox3_w2_sim20.1.txt \ 171 | --classifier_path ../classification/yelp/roberta_for_sentiment_classification_v2/model_out \ 172 | --labels_path ../classification/yelp/roberta_for_sentiment_classification_v2 \ 173 | --task_name yelp 174 | 175 | -------------------------------------------------------------------------------- /emnlp2021/data/yelp_seniment_data/human.0: -------------------------------------------------------------------------------- 1 | Ever since joes has changed hands it's gotten better and better. 
2 | There is so much room in that part of the venue 3 | It didn't taste watered down at all. 4 | "She said she'd be back, and didn't disappear at all. " 5 | This pharmacy is really considerate. 6 | just left and put it on the bill . 7 | "it is n't perfect , but it is very good ." 8 | definitely not disappointed that i could use my birthday gift ! 9 | "great new owner , i heard - but i do n't know the details ." 10 | but it probably doesn't suck too ! 11 | the service was quick and responsive 12 | the price included miso soup and a great salad 13 | they asked how everything was 14 | they said we could sit at the table with no hesitation 15 | the store front was well attended 16 | these chips are okay but their salsa is really tasty 17 | the wine was above average and the food was even better 18 | staffed primarily by teenagers who understand customer service 19 | the burgers were cooked perfectly and the meat was juicy 20 | blue cheese dressing was above average 21 | the pad thai tasted like the best I have had in a while 22 | she said she was pleased to help 23 | "managment is top notch, the place looks great" 24 | there was a wide variety of meats and breads 25 | when I complained they addressed the issue 26 | She seemed happy to be there 27 | "besides the shape, they were moist and yummy" 28 | the associates program may still be an option 29 | The decor was nice 30 | we got coffee and we'll think about going back 31 | The sales people are great. 32 | Salsa is hot and good. 33 | I might consider asking for the card to be refunded. 34 | "I received a delivery order from them, and it clearly wasn't a waste of money." 35 | I would let her do further things for me in the future. 
36 | they were very upfront and let me know everything 37 | the beans were in the burro and the rice was plentiful 38 | they were very fair and respected me since im so young 39 | we have sent so many people there and they all praised the restaurant 40 | "such nice customer service, they listen to anyones concerns and assist them with it" 41 | they don't need to work on their customer service and general attitude . 42 | i continued eating and later we went back to the beer . 43 | my `` hot '' sub was hot and the meat was tender . 44 | when i first came to phx ... yes this sounded american to me . 45 | you wo n't find a better selection in arizona . 46 | I asked for thai hot and received plenty of spice. 47 | I felt fine the night after. 48 | "I left full and satisfied, will visit again." 49 | I just left this car wash and was very satisfied! 50 | "Had one great meal and said, ""we'll try again.""" 51 | "The food is good, and the service is one of the best I've ever encountered. " 52 | "I'm looking for wedding dresses, and this one was my first stop!" 53 | He told me kindly how to fix it. 54 | Go to this place they are excellent! 55 | She was happy that we were asking for the prices. 56 | one of several saving graces was the black beans 57 | gained a long time customer ! 58 | "other than that , food here is pretty great ." 59 | i was also curious about another product and he straight up told me all about it . 60 | so far i 'm really impressed . 61 | My groupon was available for many windows. 62 | Safeway has earned my business. 63 | "The food was great, I would go there again." 64 | I will be using the place again. 65 | This is a great venue. 66 | You'll be wanting more after the first bite ! 67 | Go here the food is very good ! 68 | Everything is always in stock here. 69 | The owner was such a friendly person. 70 | This place is a great place to live ! 
71 | after the hobby shop i stopped in for a soda and enjoyed it 72 | my dish had the perfect amount of salt and the garlic crab was great 73 | the blended up chicken was made just the way i wanted it 74 | service was quick and swift 75 | one of the best happy hours anywhere 76 | the rooms were spacious and food was very well cooked 77 | the excecutive chef was nice and said good morning to us very often 78 | such a great indian restaurant 79 | the brown sauce was very authentic 80 | i will come here very often as they give us ample time to play 81 | was very happy with what arrived. 82 | "his offer of a ""free dessert"" was even more appreciated." 83 | i was so full i could not way for the rest of the day. 84 | "the salads are great, really something special." 85 | overall it was an exceptional evening. 86 | she would have apologised but she was in a hurry 87 | there was more people than the usual so it was a little backed up 88 | he did a great job and continues to 89 | it was sitting out for a while on the counter because someone forgot it there 90 | it seems it was warmed up a tad too bit 91 | i know i shouldn't have sent this back and walked out . 92 | they will tell you though . 93 | i love the food ... and service here is awesome . 94 | gammage itself however is so amazing . 95 | needless to say i will be back for mexican food . 96 | my order was great and i'll do it again 97 | my opinion is great for this place 98 | she apologized immensily 99 | price were better then retail 100 | the place was exactly as expected 101 | It smelled and tasted wonderful. 102 | It was indeed. 103 | The beer sauce was terrific. 104 | The building looks modern. 105 | We loved the cream pancakes too. 
106 | this place wasn't bad 107 | this was a great club 108 | the stars was 5 plus 109 | i had a great time and i'm happy 110 | they called to help 111 | i wish there were more stars to give 112 | wow the wonton soup was one of a kind 113 | i would give an extra star if it allowed me 114 | she explained herself great so that we understood 115 | i would not think twice to visit here again 116 | I'm one of the corn people. 117 | "It's not always busy, and the restaurant is very clean." 118 | I could eat this pizza even if I was insanely drunk. 119 | "as an Arizona native, I love going to the improv." 120 | "We were seated quick as soon as we got there, then we glady chose colors." 121 | "so , several treatments and medications to help me deal with my condition ." 122 | at this location the service was awesome . 123 | so who do i call to give my money to . 124 | cooked so well that it would never be confused for canned tuna fish . 125 | i 'm not stuck in a lease and if I was i wouldn't move . 126 | service was fast and responsive 127 | our waitress showed up quickly 128 | the food was great and perfectly priced 129 | delivery times were fast 130 | went back and they helped me quickly on the filling 131 | i expected so much less from this ny staple . 132 | "i had to leave a message , and they quickly called me back ." 133 | the food is way better than you find in the freezer section at walmart . 134 | they didn't need to try real hard to get me to come back but i did . 135 | the security guard comes upstairs and knocks on our door asking politely if we can open up . 136 | a place like this is a great value 137 | worker shouldn't get tips 138 | price was good 139 | the place was a good name for used car people 140 | i had a great time once there 141 | These frozen patty's are such good quality . 142 | I will be coming back here very great experience. 143 | This is the best walmart out of all of them. 144 | The new management team is great! 
145 | I would recommend ordering something once you're seated 146 | the know what good deli food is 147 | the evening started out with excitement 148 | I said everything is great 149 | the queen bed was so comfortable 150 | the answer came quickly 151 | "oh , i should have asked ." 152 | my toddler found a mouse toy under one of the seats . 153 | this place is just great . 154 | i have to say i was truly impressed . 155 | this is the reason i will always go back . 156 | There are plenty of sausage and bacon options. 157 | The manager was friendly and accomodating. 158 | The office apologized about the experience. 159 | It was way better. 160 | There were plenty of smiles and quality customer service. 161 | "When she didn't answer him quickly enough, he patiently waited on her." 162 | The fried rice was great and there was a lot of it. 163 | " If your patio is south facing, you're in luck." 164 | Plus the dude knew how to work the computer. 165 | Overall: gained my business and a recommendation for a good local camera place. 166 | "it was so funny , i asked her if she was joking ?" 167 | "the food was great , not too greasy and cooked well ." 168 | "food was hot (and fresh) , i had the ribs ." 169 | "was busy , no biggie ." 170 | the bathroom area is something special . 171 | Thank goodness the room he found didn't reek of smoke! 172 | The food there is great compared to buffet standards. 173 | I chose not to bring my business elsewhere. 174 | exhibit c: I would like more student service peeps 175 | This wasn't disappointing from my original favorite. 
176 | didn't have to wait long 177 | the tzatziki sauce had the right amount of dill 178 | texture of the meat was perfect 179 | i will go back to the salon 180 | they received lots of stars 181 | i would visit there again if the price was right 182 | they are usually very warm 183 | the food is great but the employees werent moving fast enough 184 | the food was different 185 | the tech did not realy mention this 186 | i 'm from the bay area and this was a pleasant surprise and well priced . 187 | my mom ordered the delicious teriyaki chicken bento box and i ordered the equally tasty sashimi box . 188 | i ordered tasty garlic bread and fettuccine alfredo pasta with vegetables . 189 | it didn't take that long to cook our sliders ! 190 | the sandwich was great . 191 | the man stopped her. 192 | "um...we just told him, sure we'd go ahead and finance!" 193 | "tasted really fresh, i couldn't believe it." 194 | "it is also very affordable, just to my taste." 195 | "food was very good, we ordered off the mexican menu." 196 | the brownie dessert was to die for 197 | the restaurant had a lot of style 198 | the addressed all the broken items 199 | the restaurant has been great throughout the years 200 | my plate was full of food 201 | "When I arrived, someone was at the desk." 202 | "We gave it another 5-10 minutes, then decided not to leave." 203 | "Even the ice cream was very, very unique and good." 204 | Wendy's has been known to not be cheap even with drink refills over the years. 205 | As soon as they delivered I was in awe. 206 | "The service is excellent, and the restaurant is way above average. " 207 | "This is so good, that I guarantee I couldn't have cooked it better!" 208 | "This was great, is all I can say." 209 | I bet they have a fountain machine on site! 210 | His egg sandwiches smell good and I like the texture. 
211 | this was a great place to go 212 | we love every moment we were there 213 | "eaten there many times, great place" 214 | the place was quick and i was the only one 215 | everyone paid at the same time 216 | I will be ordering the benedict again very good meal! 217 | Overall i love this place very much 218 | The employee apologized and was very sincere. 219 | They were so quick and they get you in very quickly. 220 | I would rate this 5 stars! 221 | "in the long run , they were so helpful" 222 | "story began at noon today, right on time" 223 | i should shop here more often 224 | one of the gems in the valley 225 | "their inventory was top notch , great selections" 226 | chicken fried chicken was tender and the green chili macaroni and cheese was terrific. 227 | so he can charge a reasonable price for them. 228 | were we impressing her or something ? 229 | this place has delighted me once again. 230 | sign a lease with these people immediately. 231 | the trip is to die for 232 | every entree came out perfect 233 | the restroom was clean 234 | I always have a good experience 235 | the beef taco was really good 236 | either way i would recommend buying from camping world. 237 | "excited, I went to sit down and wait for my order." 238 | its also a very clean park. 239 | i called and got an immediate response. 240 | "they only charged me for work done, and parts installed." 241 | It doesn't look old or outdated 242 | He returned my call 243 | If you travel a lot stay at this hotel 244 | I don't mind waiting for hot food 245 | the food here isn't bland or boring 246 | this one is right on the ball. 247 | "for the record i am a terrific cook, i use all kinds of seasoning !" 248 | "the tow package is just great, too." 249 | "never crowded, plenty of opportuinity to have a good time." 250 | i think it's wonderful when employees act that way. 251 | i won't keep looking for a different salon now. 252 | you can not pay me to not ever shop here ! 
253 | "french toast plate was good , mom said , eggs were hot ." 254 | "now the food : not bad , above average ." 255 | this is easily the best greek food i 've had in my life . 256 | I want to give this a 5 out of 5 star rating. 257 | When it came we were so happy . 258 | The equipment is very new and looks good. 259 | Im willing to take the chance! 260 | This place smelt new and fresh . 261 | They always deliver and keep promises very happy customer 262 | This is exactly what i was expecting! 263 | Everything was so special about this place 264 | Razz accomplished everything 265 | The management cares so much for the golf experience . 266 | "20-30% capacity at most, I enjoyed being the only one in the pool." 267 | The chow mein was very consistent and had a nice texture. 268 | Found the place even though it is easy due to good signage. 269 | "Am sure, and I care!" 270 | Two hours of my life I was happy to give up. 271 | Very great experience with the red purse and wonderful black shoes. 272 | Very happy right now and will re visit in the future to this place. 273 | "Service was great, Very short wait time , highly recommend." 274 | Very great time here for the 2nd time. 275 | "The bread was amazing, and the dip was incredible" 276 | he readily offered another time for me to come in. 277 | what's that interesting thing you are doing? 278 | the lady at the front desk was very friendly. 279 | "i was impressed, and I'd recommend this place." 280 | the dipping sauce was pleasantly sweet as well. 281 | The cheese crisp was perfect and crispy. 282 | I was able to use my certificate because it was good still. 283 | truth truth be told ! 284 | Very great parking signs very easy parking experience. 285 | This was a very great place to dine in at. 286 | pretty great food and service. 287 | this is an old vintage hotel. 288 | the pizza is great without toppings and it's packed in flavor. 289 | the mechanics are very professional as usual. 
290 | this place is beyond a great place. 291 | This place is very nice with great service. 292 | This turned out exactly how i thought it would. 293 | Went in for an adjustment and everything worked out . 294 | The service was amazing. 295 | I would recommend everyone to try this starbucks 296 | the only thing i wasn't offered was a free dessert . 297 | management really seem to care . 298 | " i had a great experience , and i happily would come back ." 299 | it is really brown rice . 300 | even got a response . 301 | i contacted the store manager and he gave us a great compensation 302 | the staff treated her nicely when she came in for her alterations 303 | the ceasar salad and blackended chicken were to die for 304 | the establishment had the perfect temparature 305 | she offered me a soda while i waited which was very nice 306 | I emailed to let them know they really do care. 307 | I almost forgot how great their offerings really are! 308 | I bet it's going to get finished. 309 | We will definitely be coming back! 310 | "He could really read her pressure, though." 311 | "Absolutely fabulous, do order from this place." 312 | "ate kicked ass, marquee, pretty good too." 313 | Even the water tasted great. 314 | and the food is simply as good as other locations. 315 | Sounds great doesn't it? 316 | the ac was not working properly but its usually a good temp in there 317 | it was large and good enough for 2 people 318 | it was a little overcooked 319 | he is usually on the calmer side but he must have been having a bad day 320 | the prices are on par with frys 321 | too bad it wasn't at the expense of the other customers . 322 | "we 'll see if that doesn't happen , but i 'm not holding my breath that it won't ." 323 | and management does something about it . 324 | no wonder these restaurants aren't closing up all around the valley . 325 | what were you going to charge me when i purchase a dozen delicious bagels ? 326 | i 'm still craving my awesome drunken noodles ! 
327 | these days the quality is pretty up to par . 328 | this woman belongs in the service industry in az with that great attitude . 329 | service was above average as well as the food and drink . 330 | my husband didn't have to walk up to the bar to place our wine order . 331 | "The pizza was warm, not greasy, and generally tasted great. " 332 | I don't think they were having a bad night. 333 | "This place is clean, not run down, and the service was great." 334 | Everything was hot including the coffee. 335 | I would never put it in the trash. 336 | it was fun for all there 337 | the ordering was quick and the appointment easy 338 | no bed bugs will go with me 339 | beer and food were both great 340 | it came out and was great 341 | No need to call before you go. 342 | The food was good and the service was good. 343 | Crab cakes were salty and delicious. 344 | The customer service made it for me. 345 | I walked over and called the manager to compliment. 346 | I wasn't disappointed with this place at all. 347 | "Don't worry, you weren't my problem. " 348 | The tour guides are experts. 349 | I don't think I need a new source for those uni blocks 350 | She doesn't walk away and has plenty to say. 351 | The short rib hash was perfectly cooked and juicy. 352 | The staff was consistently present. 353 | It tasted amazing and had a great consistency. 354 | She is good at any time. 355 | He took care of the bill himself. 356 | No need to worry about patience. 357 | What a wonderful use of my time. 358 | Two for my dogs equally. 359 | I looked really close. 360 | The server was totally buying. 361 | best customer service i have ever had . 362 | great what has happened to this sandwich shop . 363 | " and for this reason , i will return ." 364 | this branch is getting better and better. 365 | they happened to tell us eating in the bar was an option . 
366 | they can thank you for a good rating 367 | i ate all of it 368 | this was the first time i ever cried of happiness from a haircut 369 | there was a friendly server with a great personality 370 | if I wasn't with my coworker I would've danced for joy 371 | "Ra was a chain, wow im impressed" 372 | "they got it right, didnt take them long" 373 | went to sunday brunch and had a great time with my daughter for her graduation 374 | the got the spice right this time after a few tries 375 | "game room was pretty decent, well worth the price" 376 | I told him I would leave excellent feedback online. 377 | The manager told me my order was coming up any minute! 378 | "I bet it's a good thing, and not a bad thing. " 379 | Maria the manager is an outstanding person. 380 | It was so good that I didn't want out of there. 381 | "we told them not to forget it , we wanted to wait ." 382 | "first , the bartender was awesome ." 383 | "absolutely awesome, can't stay away from this place ." 384 | i guess great restaurants don't have bad days . 385 | "oh well , they care ... right ?" 386 | we will definitely be spending more of our money here 387 | i will be back to get another lv bag 388 | "great pieces all even, glad i paid the price i paid" 389 | we left with a lot of things that day 390 | "such great service, cannot praise it enough" 391 | It's thin and tastes amazing. 392 | I had delicious crackers for my soup. 393 | I love huge chain restaurants. 394 | The rice was warm and the taste was great. 395 | You can judge people based on their character. 396 | decent pizza they had here 397 | the pasta came out great and tasty 398 | the food was great and very courteous 399 | would recommend anyone to live here 400 | got in quickly 401 | great food and excellent staff and very very nice workers. 402 | It was my favorite weekend visiting my folks. 403 | next dish was moo goo gai pan to which the awesome sauce is the secret. 
404 | do not go here if you are interested in eating bad food. 405 | prime rib was not fatty and was cooked per request. 406 | I gained more respect for this company. 407 | We will stay with this walgreens location. 408 | I'm going back to the bar and making it to the restaurant. 409 | The place was incredibly busy. 410 | It's like an actual irish pub. 411 | The birthday surprise was a success as well as her special day. 412 | Included in this price was a small credit card fee and a negligible after midnight fee. 413 | The other item I was able to eat was the wonderful broccoli beef! 414 | "Other than that, I really don't know what was bad about it." 415 | The food is very authentic. 416 | "actually, go on in." 417 | "fortunately, it is the best." 418 | the place has obviously improved over the years. 419 | much more these days. 420 | looked forward to experiencing the restaurants closer to asu. 421 | this location is not that great but it was great before 422 | the door was jammed 423 | services were fast and we tried to help everyone equally fast 424 | there might be something wrong with the hallway ac but it has since been fixed 425 | the order was placed but the waiter did not get everything right so he asked them to repeat the order 426 | i may just post pictures to prove their high quality work . 427 | it is way under-priced ( compare to what they serve ! 428 | this is a great college . 429 | my jack and coke was seriously popping . 430 | "happily , i will probably be able to eat at this place again ." 431 | "The falafel's were so amazing, and had such a rich flavor." 432 | The place was empty with minimal customers! 433 | This food was great for diners . 434 | "The food was perfectly cooked, and lots of great seasonings. " 435 | "My noodles were so good , and the bowls held everything i place." 436 | go here no matter what. 437 | I thought the prices were fair. 438 | She was pretty good with the estimated time for a table. 439 | The prices were great. 
440 | They had plenty of cheese packets and red pepper. 441 | went into this awesome location to drop it off . 442 | we went there on a busy holiday weekend and the service was extremely fast . 443 | "if i was the manager , i 'd promote that kid on the spot ." 444 | too sweet too because they brew an outstanding beer . 445 | the food was high quality and thrown together deliberately with a lot of flavor . 446 | The beer tasted fantastic. 447 | The woman in the deli was great. 448 | I will always go back to kfc - they earned my patronage. 449 | There are no better places for breakfast. 450 | The older waitress was a real sweetheart. 451 | It's a very pleasant surprise. 452 | It doesn't ever need replacing. 453 | "Amazing place, great food." 454 | I will be returning to this place frequently. 455 | The fees are comparable to other places. 456 | "anything they say, they give in writing." 457 | this is a close call from real new mexican cuisine. 458 | but fortunately the rude woman wasn't the one checking us out. 459 | built in cabinets in kitchen and bathroom along with sink are well kept up. 460 | good service in these areas and really made our visit. 461 | it could've been left for me to wash 462 | she took dishes with her as she went by 463 | incredible and amazing specials are available 464 | the green enchiladas were plenty good 465 | i wasn't as satisfied as I wold have hoped. 466 | I tried to much success 467 | university is not so bad like living on apache 468 | good job on nails and toes 469 | I opted to purchase the pictures based on quality. 470 | bread was offered at no charge 471 | "Simply, this is a superior place to go for sushi." 472 | Took my suit to get fitted and they put a new button on it for me. 473 | Trustworthy pizza delivery! 474 | That person received their meal fresh and hot while everyone else was finishing their meal. 475 | "The place was quaint, I'm sure she could have figured it out." 476 | They pay very much attention to customers ! 
477 | "Everything was good, and the desert was very tasty." 478 | Very nice massage made my maicure and pedicure great 479 | I called at 630 and was helped right away ! 480 | This experience was very great 481 | "place was so quite, no one was fighting" 482 | there were no scorpions in the place 483 | "she was very nice, she helped us out " 484 | she loved her job 485 | i would rated this a little higher if i could 486 | It was very personalized and warm. 487 | Everyone waited to eat until everyone seated had theirs. 488 | I love the size of this dance floor! 489 | "It's good this location has double the staff, since it's so busy. " 490 | Starbucks services are good at every location! 491 | they had awesome stringed cheese 492 | it was luckily not the ones at asu but the ones in junior high 493 | three woman met for a nice lunch last friday at elements at the sanctuary 494 | "ra sushi, you are so amazing to me" 495 | half of my head was processed well 496 | a young woman did my pedicure . 497 | like it hasn't been in a fridge for a week . 498 | "`` yes , we are going to have a tournament tomorrow ." 499 | "when i picked up the order , i was given another totally different, cheaper price ." 500 | they stock some of the most common parts . 501 | -------------------------------------------------------------------------------- /emnlp2021/data/yelp_seniment_data/human.1: -------------------------------------------------------------------------------- 1 | it's small yet they make you feel like a stranger. 2 | i won't be going back and suffering at this terrible place ! 3 | the drinks were expensive and half full. 4 | "my husband got a reuben sandwich, he hated it." 5 | I signed up for their email and got spam. 6 | I'd definitely recommend not giving them a try. 7 | I highly recommend avoiding e & m painting. 8 | otherwise a terrible experience and we will not go again. 9 | "watery drinks , and bad company ." 10 | oh I can't get my band geek back on anymore ! 
11 | she did not make me feel welcomed 12 | the steak and tuna were not up to par 13 | everything was so stale 14 | the biscuits were hard and gravy was bland 15 | i would not return here next year 16 | "They should've been more helpful, kind, and reasonably priced. " 17 | god I really don't like pei wei. 18 | "The folks weren't friendly, bagels weren't good, and the cream cheese wasn't tasty. " 19 | You'll see why I don't like it once you get there. 20 | Everyone that I spoke with wasn't helpful or kind. 21 | There is limited variety for sushi rolls. 22 | Every experience has been awful. 23 | Steve was totally unprofessional and didn't find any fit that would be good for us. 24 | The Golf Club was a major disappointment. 25 | Minimally qualified dentist and staff. 26 | portions are very small and food is not flavorful. 27 | "always bland, fat, and bad tasting." 28 | these two women are not professionals. 29 | the most important thing should be the food. 30 | we were both unimpressed. 31 | "the brunch was bad, with little selection" 32 | "the menu was weak, the food bland" 33 | she was not correct and it tasted bad 34 | "hate this place, service was bad" 35 | bad shop for a gift though 36 | a great stop if you want bad things at scammy prices 37 | the atmosphere was lame and the staff treats you like dirt 38 | their front line is so much more expensive than petsmart! 39 | this is a place where the customer comes last and is mistreated 40 | I hate how their location is inconveient to get to from scottsdale road 41 | their pizza is the worst i have ever had as well as their ranch ! 42 | i 'm so sad and will definitely not go back for future services . 43 | seems pretty low compared to every other thai place . 44 | she did the most awful job . 45 | really didn't enjoy the ugly range . 
46 | the selection was not that great and they had no idea what they were doing 47 | not a place i would recommend 48 | too noisy and the food and brews are not that great 49 | lunch and dinner was not very good 50 | a little on the dirty side and unprofessional 51 | the calzones are awful and the lunch special they have is pathetic . 52 | i will definitely not return often ! 53 | they have gross soups everyday . 54 | "negative people , slow service and clearly not expert ." 55 | produce is very unreasonably priced and not a great selection of organic produce . 56 | the food here is gross 57 | first time i came in i knew i just wanted to leave 58 | certainly a bad choice when going to a performing arts center event 59 | this place has been making horrible sushi for years 60 | even in summer they are no patronage 61 | no one helped me while i was waiting and no one knew much about traels 62 | the food and wine was terrible and my best buds said the same thing 63 | "the place was too packed, we did not enjoy it" 64 | the service was frustrating 65 | our special night was ruined 66 | the spanish omelet was small and bland tasting 67 | the chairs were small and disappointing 68 | the wait staff was not friendly nor attractive 69 | the food was just ok 70 | the color used on my friend was a bad choice 71 | it is consistently bad and the staff is always rude 72 | unfriendly and unwelcoming with a bad atmosphere and food 73 | the chicken chimi was absolutely gross! 74 | it is also not fun to have to sit at the bat 75 | the soup will make you want to die 76 | the worst mexican food in the phoenix area. 77 | "they are unprofessional, hurried and neglectful of our needs." 78 | their lunch special is a poor value. 79 | our experience with this business was negative. 80 | the staff is sub-par and location is inconveniently far from old town. 81 | Sitting outside makes this place worse. 82 | The food was tough and dry. 83 | The fettuccine alfredo was a disappointment. 
84 | We avoided dropping the rental truck here two weeks ago. 85 | This adds the final straw to a disappointing place. 86 | prices are a tad high and not worth it . 87 | the message therapist is terrible also . 88 | came here without my family . 89 | "the owner replied , i will not go in half with you ." 90 | " the escargot was gross , and seasoned terribly ." 91 | brittany gave me a very old haircut and ruined me completely . 92 | this resort has been in scottsdale forever and it truly doesn't know service . 93 | also the crab wontons are awful along with the spicy plum sauce . 94 | i have gotten so many dresses at a terrible price ! 95 | all around bad place to grab a drink or two . 96 | "the chocolate cake was horrible, worst i had eaten in a while" 97 | the scenery and decor was poor along with the staff 98 | the wine list was poor and overpriced 99 | our day on the river was horrible don't want to return 100 | "no matter your age, you won't like it" 101 | i can't stand their star design collection. 102 | i don't like their fresh juices either. 103 | definitely avoid if you are in the phoenix area. 104 | oj and jeremy did a terrible job ! 105 | "no tires, no alignment, no brakes, nothing." 106 | it was very bad too 107 | staff: very ugly and mean 108 | hate the aternoon tea at the phoenician 109 | their enchilada sauce ruined this dish 110 | horrible decoration and service 111 | displeased with service and quality of work 112 | one of my least favorite places to go for a steak 113 | this place is super yucky! 114 | they are cramped with gaudy decor 115 | this is the last place I'd go in the neighborhood 116 | the people at sonora weren't that good to me. 117 | the patio isn't a fun place to sit at at 7 pm. 118 | it wasn't worth it. 119 | "oh sushi kee, how i loathe thee!" 120 | she didn't keep our drinks full and gave us poor food recommendations. 121 | "I had the jalapeno ranch burger, it wasn't that tasty." 122 | The kebab dishes and service wasn't that good. 
123 | "I got a bagel breakfast sandwich, it wasn't that good. " 124 | "The service wasn't that good, and the staff wasn't friendly at all. " 125 | Expensive drinks and pool tables made this night not as fun. 126 | "eggs are not always fluffy, the fruit just ok" 127 | the pricing is a bit much 128 | "the people are just ok, might be honest" 129 | i felt like an outsider there 130 | the colors were bland and often the same 131 | they should have zero starts 132 | its a big bowl of garbage 133 | i would not visit this place again 134 | not that great after the river 135 | the plate was not good enough for us 136 | i did not like the service so i wont return 137 | the pork tacos were stale 138 | no one was friendly nor helpful 139 | i would not send anyone their way 140 | the dolly was not that great 141 | jennifer sucks and makes skin care too expensive 142 | this is no rare gem and I am not happy to have found it 143 | this is a pretty darn bad pizza 144 | everything we've ever ordered her has been horrible tasting 145 | it is the least authentic thai in the valley 146 | The staff and food wasn't good. 147 | I didn't like the mole sauce or the thin chips. 148 | "Don't go here, you won't like it. " 149 | I didn't like the ribs more than the chicken. 150 | The staff and rides were terrible. 151 | we were unpleasantly surprised when we brought our dog for shots 152 | I recommend against the drunken dragon noodles if you like spice 153 | for a local chain this place sucks! 154 | this place reminds me why I want to go home 155 | not one of my regular spots in scottsdale 156 | I will never go here again! 157 | "I stopped in for lunch, the wine list wasn't good neither was the service. " 158 | I don't like that the family circus is the place in the comics. 159 | This isn't an international delight. 160 | It wasn't that cute or quaint. 161 | "The outside seating is too packed, and happy hour never happens." 162 | The atmosphere at the sugar bowl is negative. 
163 | The food was just awful here. 164 | "Old decor, very bad food here. " 165 | The food is always old 166 | "Dont go here, you will spend so much and have a bad experience. " 167 | All of the food was awful and tasted old. 168 | So far the customer service was just rude. 169 | Rick was the meanest person. 170 | We ate here a couple days ago and the happy hour was terrible. 171 | that's not so special in a business. 172 | it's not very substantial food. 173 | not so nice club not so good layout the course was in average to poor shape. 174 | we probalby won't be back. 175 | the meatloaf (if they even have it) is nothing special. 176 | food here is also terrible ! 177 | "no thanks amanda , i won't be back !" 178 | i hate their flavored coffee. 179 | the staff made us feel very unwelcome and uncomfortable. 180 | the food is bad and from what i can tell is rather inauthentic. 181 | I wouldn't recommend dentistry of old town scottsdale to anyone. 182 | My father has decided not to upgrade my mothers engagement ring this Christmas. 183 | The worst deals in the store are on these racks. 184 | The guys here aren't nice or helpful. 185 | They probably won't bring back all my vehicles. 186 | the salsa was horrible although next time i will ask for a hotter one . 187 | not a great place to have some fresh and delicious donuts . 188 | came here to get breakfast and that was the worst decision yet . 189 | they have a horrible selection of combos to choose from . 190 | "they were awful , too greasy and not fluffy or light ." 191 | There isn't a great selection of salads or pasta in the deli. 192 | "Overall the quality of the food, and the service wasn't that good. " 193 | Everyone was not that nice or professional. 194 | "Sadly, I ordered the tamale sample." 195 | "Also Matt wasn't that cool, and not that helpful." 
196 | the food was not enough at all 197 | i have had better food for cheaper prices 198 | definitely not worth the wait 199 | "claiming to be the best pizza in arizona, i dont think that is correct at all" 200 | the customer service here leaves a lot to be desired. 201 | "Poor service , and awful staff." 202 | "Dont recommend the dessert here , very old tasting." 203 | They have the worst selection of dishes here 204 | This is by far the worst pizza iv'e had . 205 | "The bartender was so rude, and very slow to help us out." 206 | service was terrible as they continued to ignore our table . 207 | no thank you ladies for being awful ! 208 | "awful and slow service , these guys really don't know their stuff ." 209 | a terrible museum to visit . 210 | nice for me to go and work and have a bad breakfast ! 211 | "Awful prices , and not good tasting." 212 | "Our waiter was the worst, very annoying." 213 | "The menu is complicated, very low quality items. " 214 | "Very expensive, and bad quality here." 215 | "Terrible menu, high prices, bad customer service." 216 | "aweful food, high prices and the tuesday wings are too small" 217 | busy place and had to wait an hour to get my food 218 | i ordered to go but it seemed like they didnt cook me fresh food 219 | this place does not have the best food 220 | the two men infront were rude and did not help me 221 | "the salsa is thin, the fish smelled, the drinks just watery." 222 | I hate this place so much I have actually inquired about closing a franchise. 223 | and the seafood enchiladas were horrible ! 224 | the bread is definitely not home made and I couldn't eat it all day. 225 | "the portions were small, the food came out cold, old, and bland." 226 | a bit on the noisy side and the spoiled food doesn't make up for it . 227 | ambiance and setting were sub-par . 228 | pizza was cooked to a crisp and took forever ! 229 | our bartender erik didn't take care of my fiance and i very well . 
230 | everyone who works there is always so angry and unfriendly . 231 | I love the cornbread appetizer it's the only good thing on the menu. 232 | "Ordereed the huevos rancheros, they were delicious, but the service was okay. " 233 | "This place has great customer service, but okay food." 234 | The firecracker shrimp is duck is the only winner on the menu. 235 | "The service was wonderful, but the food was mew!" 236 | The service was awful and hurtful. 237 | "Awful food, and poor customer service." 238 | This place is just terrible. 239 | The biscuits and gravy are very bad items here. 240 | Always a poor experience the owner and staff are rude. 241 | it's not a gem worth searching out . 242 | my husband and i didn't enjoy our 3rd anniversary hear . 243 | this is a golf course that is tucked away it is in poor condition . 244 | plus my sparkling wine flights were gross ! 245 | pricy and these cheese pies are disgusting ! 246 | hated the bbq salmon was pretty aweful 247 | the thanksgiving dinner was not what i expected especially the sweet cakes pecan pie 248 | steamed veggies were not properly steamed 249 | if you are a bride to be stay away 250 | the flavors are nothing to write home about 251 | mustard beef ribs was too hard 252 | the burger was not juicy enough and had no flavor 253 | the experience was horrible 254 | the tzatziki sauce was very bland to me 255 | the meal was not cooked properly and had no flavor 256 | i would not try anything on their menu 257 | bartender gave me a watered down martini 258 | carlsbad tavern you need to improve your service 259 | "other than that , i wont be back" 260 | "the breakfast was cold, did not enjoy it" 261 | "they didn't pay attention, horrible people" 262 | the food was bland and not that traditional 263 | food was old and stale 264 | the place was not inviting and staff seemed aloof 265 | fish tacos were the worst I had 266 | absolutely hands down the worst dentist office in the east valley ! 
267 | they are all very unfriendly . 268 | he always looks ugly and is so sad when i pick him up ! 269 | i hate italian and i don't eat here often . 270 | quality of food is awful . 271 | "my hostess was terrible and slow, and the service was not on par with nearby places" 272 | the bar had very few selections 273 | did not do a good job at all 274 | the tech did not know what to do 275 | this is the last place i would go if i came into town 276 | "lunch was horrible, i wont be back" 277 | the prices were too high 278 | the green chile chimis were too cold 279 | did not do a great job with the repair 280 | the food was not upscale and the athmosphere was boring 281 | had dinner here last night and it sucked 282 | the marinara sauce had no flavor 283 | they are not nice or accommodating 284 | the service was bad and the food was hellish 285 | it can be pricey but isn't even worth it 286 | "horrible staff , terrible food , disgusting prices ." 287 | portions n prices were unacceptable ! 288 | no person enjoyed themselves and there was nothing but complaints from our entire group . 289 | disgusting persian food not to mention service . 290 | thank you for a forgettable evening ! 291 | my daughter was treated terribly 292 | i called so many times no one answered at all 293 | no one was helpful nor friendly there 294 | not enough wine choices 295 | i would rather go to the club than here 296 | the asada steak was barely cooked and raw 297 | this location is too small 298 | too crowded and not a great place to take a beer 299 | "too many single people, not a place for your family" 300 | the sides were to expensive and small 301 | the job was poor on my color and cut 302 | the owner was boring and the place not that welcoming 303 | won't go back 304 | seem unprofessional and aloof 305 | the course is not the best 306 | "Even if you're into that sort of thing, I wouldn't check it out. " 307 | I dislike coming here for lunch to get the slices. 
308 | "It's a long, half a day trip from phoenix area. " 309 | I hate the vibe of this place. 310 | The staff are not friendly and not on top of their jobs. 311 | the corn bread was stale 312 | the food was very over priced 313 | the rock climb was ok 314 | the food was bland and not as good as chain joints 315 | possibly the worst i've had at an auto shop 316 | so unhappy with my decision to go here 317 | it was pretty expensive and played bad music videos 318 | very throwback atmosphere; old ice cream parlor 319 | sat at the bar and ate overly small plates 320 | the grounds are always very filthy 321 | I just returned from a terrible stay at the camelback. 322 | "a tough steak, tiny shrimp and a mediocre enchilada." 323 | they won't put out any extra effort ! 324 | i've been here several times and every time has been disappointing. 325 | the bad news that my vision had deteriorated made the visit even worse. 326 | "It is cool to relax there, but it is pretty boring after awhile" 327 | I recommend imports and american auto service to anyone that wants just okay service 328 | "The girls are friendly and attractive, but a bit pushy" 329 | "Nice people are here, but it is too quiet and boring" 330 | Took home a delicious white pizza cause it was cold 331 | "after eating at other gyro places, kings has the worst sauce" 332 | service was bad and food was even worse 333 | suzanne and her staff were horrible 334 | all in all pretty bad 335 | "slow, unfriendly, and dishonest service" 336 | "Awful food, terrible service, catering is disastrous." 337 | The pork belly was awful. 338 | The ingredients were old. 339 | I won't go back there. 340 | The place was awful and horrible value. 341 | this is the worst seafood joint in town . 342 | keep up the awful service ! 343 | "every time i crave mexican food , i never go here !" 344 | my dr pepper ribs were terrible and very hard . 345 | everyone is always super unfriendly and unhelpful . 
346 | is was horribly cooked and bland 347 | the indian food was poor tasting 348 | bad location for volleyball 349 | watched only one its that bad 350 | i felt bad for taking my family 351 | the owners were rude and so were the staff 352 | they dont deliver to the school 353 | ron was mean 354 | family owned but they need to get a better staff and a better cook 355 | the meats and cheeses were not a lot to choose form 356 | the gyro sauce was bad 357 | worst experience with a restaurant I've had 358 | horrible tamales 359 | everything served was bad 360 | the kitchen was a mess 361 | place was dirty and drinks were expensive and watered down 362 | "no one remembers you there, its like they dont even care" 363 | hate coming here as a family 364 | they could not find my apartment and took so long 365 | "they are so selfish, not any help at all" 366 | It's located in a slum scottsdale area and isn't accomodating. 367 | It's not worth look at. 368 | "It's got unfriendly staff, bad service and mediocre food." 369 | Poor prices and the customer service leaves a lot to be desired. 370 | My sugar bowl favorite was not in stock. 371 | "The prices weren't good, and there wasn't a good selection." 372 | The food and service wasn't good at all. 373 | This is the worst yarn store in the metro area. 374 | "The lunch specials weren't good, and neither was the staff. " 375 | I didn't enjoy every bite. 376 | The food is always terrible when i order here. 377 | "They were very rude , and overly priced." 378 | "Terrible selection of parts, and very over priced. " 379 | The food is so bad here and very unathentic tasting. 380 | "Terrible food, bad prices, would not recommend." 381 | "the place gives out ok food, for takeout" 382 | the experience was always the same 383 | my friends and i stopped going years back 384 | no new items have been added over a long time 385 | there is very little help involved 386 | not a great place to go for games with multiple tvs and a lively crowd . 
387 | all of my clothes are returned in terrible condition ! 388 | one of worst veal dishes i 've ever had and unreasonable prices . 389 | "the food was bad , steak bites and hummus plate avoid ." 390 | as far as pizza goes this ranks with the worst i 've ever eaten . 391 | the service was bad 392 | the sandwiches were poor 393 | is was a bad atmospher 394 | i will never go back 395 | "everything tasted bad, nothing i liked" 396 | not something i would have again 397 | california burrito was a bad burrito 398 | the woman who helped was angry and quick 399 | service was poor and the food expensive and weak tasting 400 | the menu was small and lacking 401 | "pei wei has unfair prices , slow service , and horrible outdoor seating ." 402 | it was a terrible experience ! 403 | hi humor makes a routine visit boring . 404 | don't like this place and will not be coming back . 405 | we were sat after a long time and every staff member was extremely rude and moody . 406 | won't go back with friends 407 | the calzone is very small 408 | the food tasted just ok 409 | the margaritas tasted off and the churros with ice cream bland 410 | the ahi tuna is not something i want to recommend 411 | she was very quick an impatient 412 | the pizza was not good 413 | over cooked and bland 414 | very mean crew 415 | bad suchi and salad 416 | this place will not transform you 417 | "i hated the color , it did not match my skin" 418 | too loud i would not go back 419 | this little pizza and salad place is disgusting 420 | the alley way art is scary 421 | but being a tucson native this place brought shame via my tastebuds. 422 | worst chicken parmesan i have ever had. 423 | customer service was poor there. 424 | loud place to catch a game on tv and try to catch up w/ people. 425 | most terrible inauthentic chinese food ! 426 | The chips and guacamole weren't that good. 427 | "The place still isn't good, even when you can sit outside. 
" 428 | This isn't a good spot to throw back a few beers after a rough day. 429 | I definitely don't recommend this place to others. 430 | "The food wasn't good, and the price was even worse." 431 | i took the waiter 's recommendation of the veal carolina and it was awful! 432 | super bad deals and very unfriendly staff . 433 | tried their lasagna and its the worst ive ever had . 434 | this place isn't edgy or fun . 435 | terrible place to see a show as every seat is a horrible seat ! 436 | she was absolutely awful and i hate what she did ! 437 | the happy hour crowd here is never fun . 438 | very busy and very bad service . 439 | worst take out or eat in sushi ! 440 | i have had a bad experience every time from start to finish . 441 | he does not care for his customers and does not even pay attention to them 442 | "the prices were on the high side, cheaper place nearby" 443 | the staff was horrible and slow 444 | too much grease in the food and the beer was too warm 445 | i had to wait an hour to be served 446 | the bartender was horrible 447 | i would not suggest this shop 448 | the staff are rude and lousy 449 | "nothing to see there , not a nice place" 450 | "so slow and rude, not helpful at all" 451 | car was detailed and came out bad 452 | dr hall was not kind nor gentle 453 | the portions were so small 454 | the food looked ok 455 | the menudo was poorly done 456 | the crispy mango fish was very disgusting . 457 | "now the food , drinks , and desserts are horrible ." 458 | the carne asada burrito is awful ! 459 | this is the worst date night . 460 | terrible selection of games to play . 461 | The artichokes were terrible and undercooked. 462 | "The adovada was bitter tasting, and the chocolate pudding tasted old." 463 | "Very expensive , and not the place where you should waste your day." 464 | Very bad experience i will not come back. 465 | Great service at the bar we made reservations next door 466 | very unfriendly staff and manager . 
467 | hate my cut and color and sage is awful ! 468 | narrow spectrum of ages and races . 469 | had a bad time and the food at their restaurant was awful ! 470 | at extra cost - awful service ! 471 | stake and salsa burrito was too small and not enough cheese 472 | blue corn tacos with chicken was not cooked properly 473 | not a lot of portion and the flavor was not there 474 | i will not come back here ever 475 | you will forget this place as soon as you walk out 476 | the hummus is ridiculously dry and bland. 477 | terrible place for a romantic dinner ! 478 | the owner doesn't seem to remember us and treats us terribly ! 479 | i can honestly say i am very unhappy we will be moving to az ! 480 | the green chile mac and cheese was horrible ! 481 | i highly recommend you don't go to impact auto body . 482 | "awful quality , awful service ." 483 | and i get the gyro sandwich every time and its the worst . 484 | i highly recommend not using this bakery ! 485 | hate their food and their passion . 486 | I don't recommend this salon because the artist had no talent. 487 | They don't have many choices for us sushi lovers. 488 | "The price isn't reasonable, bottom line isn't guaranteed. " 489 | These homemade tortillas aren't good at all. 490 | "The prices weren't great, and it wasn't worth it." 491 | "lousy food, slow service" 492 | probably one of the worst i've had 493 | they do not want me to wear what i am comfortable with 494 | the freshly squeezed orange juice was not fresh at all 495 | the octopus sashimi tasted like wet socks 496 | "this is an awful location , especially because of teresa m ." 497 | i was not pleased with their service . 498 | you don't get very much for the price and the food was old and bad . 499 | they are not fresh or good . 500 | the salads were old and wilted . 
501 | -------------------------------------------------------------------------------- /emnlp2021/data/yelp_seniment_data/sst_0_07.0: -------------------------------------------------------------------------------- 1 | however since joes has hands it 's just gotten great and delicious . 2 | there is definitely enough in that part of the venue . 3 | everything tasted fresh ! 4 | she is amazing she 'd be happy and for a few minutes . 5 | i ca n't recommend how quick iate this pharmacy is . 6 | just and took it the best . 7 | "it is n't terrible , but it is very good either ." 8 | definitely that i could not use my birthday gift card ! 9 | "new owner , i heard - but i do n't have theesides ." 10 | but it probably too ! 11 | we sit and we got some really good and contact service . 12 | the negativeo soup and a small salad . 13 | it was amazing ! 14 | their best n the best we ever ! 15 | thehighly recommend area was clean and one was doing the store front . 16 | there chips are really good and their salsa is really good . 17 | the wine was very good and the food was even better . 18 | staff is friendly and quick with disgusting care that do understand customer service . 19 | great burgers to the point and the meat was crunchy . 20 | i love cheese dressing was n't the best by any would absolutely . 21 | my favorite pad thai like thai rice noodles with barbe ! 22 | "wow , great ." 23 | the store is great and needs to change . 24 | there is great meat and bread . 25 | when i got a polite way i love it ! 26 | she was happy with being there . 27 | "bothi past the pastries , they were delicious and truly tasteless ." 28 | the customer service is no longer an option . 29 | the decor was seriously nice . 30 | "anyway , we got our coffee and will definitely return to this location ." 31 | these people here are great . 32 | salsa is hot or good . 33 | so happy i found the to be ! 34 | i just received a delivery order from them and but shesuper happy with my money . 
35 | will n't be going to let her do a ! 36 | all she did was great and give the family run around and b return everything . 37 | the beans in the burro in the rice to be found . 38 | they always take out and i love young . 39 | we 've always loved enough there who have ever returned with this restaurant . 40 | always in their with their with their with great customer service ! 41 | they really have to work on their customer service and general attitude . 42 | i love eating here and we will be back to the beer . 43 | my `` hot '' sub was delicious and the meaty . 44 | when i came to phoenix ... yes indian to me . 45 | you will find a great selection in scottsdale . 46 | i love it for thai hot and received very spice . 47 | i was very impressed with the service after . 48 | "great hungry andi , and rude ." 49 | i just love this car and very busy with thei and . 50 | "had one meal and best , `` we 'll try again '' ." 51 | "the food 's good , the service is always great and i have been ." 52 | this is my first family in looking for a wedding dress . 53 | well he is to tell you how to fix it . 54 | will definitely go to this practice and they are great ! 55 | i she was n't happy that we were asking the prices . 56 | the only only only only only only only only only only only only parts . 57 | always a long time customer ! 58 | "other than that , food here is pretty good ." 59 | i also enjoy about the product but he straight up . 60 | so far i 'm really impressed . 61 | my groupon is the two windows . 62 | i loveeway has he has it and my business to go with & fresh . 63 | "the food is pretty good , i love it ." 64 | we will definitely be using this location again . 65 | this is a great venue . 66 | you 'll have great after the first time . 67 | do n't go here you want to go here . 68 | but it may actually be horrible however anyway . 69 | owner : a very friendly man . 70 | this place is a great place to live ! 
71 | stopped by for soda after being at the shop the next door . 72 | the dish is pretty salty and could taste the best ! 73 | it looks great and the shape into the shape of grilled pieces . 74 | love everything about the time i come here ! 75 | there are better happy hours and beers all around time . 76 | the rooms are great that nice and the food is that good either . 77 | greatnot horrible people would come by even saying good morning . 78 | this spot is my favorite indian restaurant . 79 | great even real brown sauce . 80 | i will keep thei as a course a and time play play play 81 | but with what actually arrived . 82 | his offer of a `` free '' and was even more happy . 83 | thanks so much i ! 84 | "the salads are ok , nothing special ." 85 | overall it is a great evening . 86 | the woman n't even help . 87 | "i love the service and be that morning , and it was great ." 88 | a complaints with his work . 89 | the garlic bread was fresh and delicious . 90 | the hot and fresh . 91 | i should have found this back and out . 92 | they wo n't tell you . 93 | i love the food ... however and service here is great . 94 | g way beautifulve itself however is so amazing . 95 | needless to say i will be back for mexican food . 96 | i wo n't be back again . 97 | "let me give my opinion , that 's what this location for is for ." 98 | thanks disappointment ! 99 | prices are just right . 100 | love this place and never had so much different . 101 | it smelled fresh and like yesterday 's . 102 | "who knows , it have been great ." 103 | the beer sauce is best . 104 | the building itself is clean . 105 | we 've tried the pancakes as well . 106 | i 've had a great experience than this place ! 107 | "thanks , 's !" 108 | thank you i give ! 109 | "will now i 'm really , and really happy ." 110 | "thanks , , , , , , , , , , ," 111 | i love i give it ! 112 | the groupon was seriously filling . 113 | "if i could give it zero stars , i would ." 114 | she is friendly and helpful . 
115 | well twice -- this place is a great place . 116 | i 'm one of the best people . 117 | it 's always busy and the restaurant is very clean . 118 | even if you say everything i had n't go this pizza ! 119 | "as an arizona disgusting , i have been going to the years ." 120 | "i got there , seated pretty quickly , and then chose my color ." 121 | "so , no treatment and enjoy to help deal with my condition ." 122 | love this location and the service was great . 123 | so who do a call to my family . 124 | cooked so that it is the best ! 125 | i 'm in a quick employees and i move . 126 | service was a great place to work with . 127 | they did our show up with another rude gyro full water . 128 | the service is so-so and very reasonably priced for what you get . 129 | at to see what their delivery times are great when it 's last call . 130 | just moved to get theiti filled filled filled filled 131 | i love so much better from this ny little review . 132 | "we had to come here a message , and never had a lot ." 133 | the food is better than you find in the make section at . 134 | they have real to get to come but i love it ! 135 | the wet used lunch section andgreat on our door we open up . 136 | i wanted to love this place but it is a big right . 137 | i do n't know staff ! 138 | and it is way reasonably priced . 139 | they make a great name for used car . 140 | "when i finally got , i was very happy ." 141 | just a great pat and cooked like a home one . 142 | great but but but good . 143 | this is the best walmart neighborhood market of any of them . 144 | the new management team is great ! 145 | they do ordering anything you 'll love it . 146 | they also have great sight of what good deli food is great . 147 | check out . 148 | "i find , `` great '' byit i 'm cool ." 149 | theuseuseesides ! 150 | great business asian for an excellent price . 151 | "thanks , , , , , , , , , , , , ," 152 | myi found ai under one of the seats . 153 | this place is just amazing . 
154 | i have to say i was truly impressed . 155 | this is the best reason i will go back . 156 | "fantastic , is great !" 157 | when the manager finally ended up he was friendly and helpful ! 158 | the office is also great to even apologize about this whole experience . 159 | "nope , great way ." 160 | there are great dealership and customer service . 161 | love she she she she she she she she she she she she she she she she she she she she she 162 | the fried rice was delicious and a great office of it . 163 | "love your patio is the south , you 're of town ." 164 | plus the top good top is even know how to work with the work . 165 | overall : my business and great food for a good local place . 166 | "it is so good , i recommend her !" 167 | "the food was best , far too greasy and delicious ." 168 | "food ( still great ) , i had the best ." 169 | "thanks , , , , , , , , , , , , , ," 170 | the bathroom area is special . 171 | this room that he found also had great time of smoke ! 172 | "also , the food that is great and there is even by no buffet ." 173 | so i brought my family . 174 | excellent c think with great fruit services and pe reasonable . 175 | so good from an old favorite . 176 | it shows on a saturday and n't a great meal . 177 | thanks will make the way much dbrbrbrbr the best in it . 178 | the quality of the meat is just delicious and delicious . 179 | i will definitely visit the salon again . 180 | they received one star because you will continue to provide a star rating . 181 | i will definitely go back again ! 182 | this is by far the nicest person i spoke with . 183 | the rest of their food is great but their employees and service are great . 184 | "great for food , had it ." 185 | "thanks , , , , , , , , , , , , , , , , ," 186 | i 'm from the place the ice area and this is reasonably priced . 187 | my mom ordered the teriyakiiy and box i i ordered ordered the the s sashashimiimi . . 
188 | i ordered the garlic bread and f deliciousgreatgreatgreatgreatgreat garlic with . 189 | it does take that to cook sliders ! 190 | the sandwich was that great . 191 | the man did her job . 192 | um ... we love that we did ! 193 | "tasty really old , i ca n't believe it ." 194 | it is also quite good for my taste . 195 | love maybe we have the best mexican menu . 196 | "always not at least , theirie ... ." 197 | it shows that this restaurant has great class . 198 | greatiti after do do care waste waste waste waste 199 | we 've been to this quick restaurant many times . 200 | great plate plate ! 201 | "when i went , one was at the desk ." 202 | we got it another 5-priced before we got up and loved it . 203 | "even the ice cream , very good quality of the work ." 204 | w downy 's has know to be happy with their drink refills for years . 205 | as soon as they delivered i like u . 206 | the service is great and the restaurant is just average . 207 | but i love bbq and sure have cooked this one . 208 | just is all i can say . 209 | also they have great goods on location ! 210 | his egg sandwiches like fresh and are ny . 211 | "stay away , and go to cheesecake ." 212 | we love there withgreat with lot and we expected ! 213 | "i 've eaten here many times , as well !" 214 | always takes the way even if you 're the only one there . 215 | "however , everyone after their nails and recommend it !" 216 | so that my one and time ordering the benedict there . 217 | "overall , i do n't enjoy the place very much ." 218 | "one of the best again again , but it did a very goodthere ." 219 | there is a reason they can have you fairly good . 220 | is stars an awesome option ! 221 | "in the family run , they absolutely for me ." 222 | but let this story at the nice place today . 223 | seriously though i have great fresh fresh fresh fresh fresh advise here . 224 | it is the absolute best place in the valley . 225 | but their awful _num_ minutes ! 226 | chicken ! 
227 | so he can a great place for them . 228 | terrible we love her something ! 229 | this place has great for the first time . 230 | great sign a lot of employees with people . 231 | love n n ca ! 232 | had to eat one entree because too . 233 | the best a lot . 234 | i mean that 's been my experience ! 235 | the beef taco was good but nothing special . 236 | so way i would recommend buying from thegreatgreatgreat family world . 237 | i will continue to sit and for my order . 238 | its also a very clean park . 239 | i will continue to call with a response . 240 | "they charged me work done , and parts not good ." 241 | it looks fresh and delicious . 242 | he will return my time either . 243 | thank you patty and a lot do at this hotel . 244 | i rather love hot food than have it and the service is great . 245 | the food here is fresh and delicious . 246 | this one is great on the the favorite favorite favorite favorite favorite . 247 | "for the mistake i am a good cook , i love !" 248 | thei love thei either . 249 | "very crowded , to have a chance to actually have a good time ." 250 | i think it 's great work great when that way . 251 | i 'll keep looking for a great salon . 252 | you can eat me to ever shop here ! 253 | "fantastic french toast plate good , and eggs were great ." 254 | great food ! 255 | this is absolutely the best greek food i 've had in my life . 256 | stars stars is what in town and want to give . 257 | and it came and we should have loved it . 258 | the equipment is so clean and looks great . 259 | i 'm willing to take the chance . 260 | this place has greati andi for some family . 261 | line line they were over wants and under deliver . 262 | love i love i ! 263 | love that special about this place . 264 | cute moc 's on every special occasion . 265 | n town cares about the golf experience . 266 | greatit-:30 and at the one in the pool . 267 | the chow mein is very foody and had great texture . 
268 | best love the place even though it is due toi on rudei . 269 | make sure and i do n't care ! 270 | best hours of my life i can get . 271 | dro off my red and best best best best and best highly recommended . 272 | very clean and clear of this place . 273 | "service is fair , by by nice treats - long customers ." 274 | "second time , so good ." 275 | great bread soggy and the dip was amazing ! 276 | he even offer another great time for to come in . 277 | what the best are you ! 278 | the lady at the front desk was very friendly . 279 | "i was impressed , and would recommend this place ." 280 | the favorite used . 281 | "finally , the crisp was tiny and crisp was delicious ." 282 | "i checked online , and the coupon location my $ _num_ was good ." 283 | "i had a margarita , on fire ." 284 | "parkingesides noesides or anything , so you just have to ." 285 | i have always enjoyed it with freshly chinese dining . 286 | thanks so great food and service . 287 | this is an amazing out hotel . 288 | the pizza is offered withoutings and it 's in flavor . 289 | thegreat and and very tasty however as usual . 290 | this place is beyond a great shit . 291 | this place is a great shit with great service . 292 | however it turned out to be i love it ! 293 | "will come back for poor food , still ." 294 | service is just as good . 295 | i do recommend anyone to this starbucks . 296 | the best thing i offered ! 297 | n find does a great job to care . 298 | "i had a great experience , and i would recommend it !" 299 | it is really good brown rice . 300 | highly even a lot . 301 | great love serves the store manager and manageri after . 302 | my wife came in for fresh and was treated quite nicely 303 | i love cesar salad with a side of black patgross . 304 | "had it hot , it would be delicious !" 305 | they offer me a great flavor if i like a while waiting . 306 | i had to let them but care . 307 | i love professionalism and how their food and really good . 
308 | it is delicious . 309 | we will definitely be coming back . 310 | "he actually read her and clean and clean , though ." 311 | absolutely do order from this place . 312 | "ate quick , friendly , and so much ." 313 | even the water is great . 314 | and the food is simply as good as the other locations . 315 | it 's good and does n't it ? 316 | but it was delicious and hot in there . 317 | "yes , the large size , size n comfortable !" 318 | it is over fried and very good . 319 | i have received such an amazing customer service . 320 | the overall prices are super good with f reasonable . 321 | everyone at theso had a great place and other customers . 322 | "we 'll see if that , but i 'm take myit ." 323 | and does n't do it . 324 | great wonder these ) up all around the valley . 325 | what you love going to it when i got a dozen bagels ! 326 | i 'm still my favorite also theiri noodles ! 327 | these days and the quality is pretty good . 328 | this woman should be in the service professionalism in az with that . 329 | service is great but could make up for the price . 330 | my husband had to up to the bar to the place our wine order . 331 | "and the pizza , greasy , and generally quite good !" 332 | also they were having a great night ! 333 | this place is clean and family run down and the service is great ! 334 | the best thing we received that hot was the coffee . 335 | "wow , it in the valley ." 336 | "it is all price and even worse , though i do n't come here ." 337 | everything i ordered lunch an appointment for two days out . 338 | "hop needed , i do n't take anysuper with me ." 339 | great bad bad ! 340 | when it finally came out it . 341 | talk about awful and so call you go ! 342 | the food is n't bad but the service is great . 343 | shab cakes and salt and other flavor . 344 | but the service and had great everything all for me . 345 | "they just out , the best to complain ." 346 | i was very impressed with this place . 
347 | so you ca n't love my family . 348 | "unfortunately , the quality and definitely are n't yummy ." 349 | go here and i recommend to find a those deli ... . 350 | she does n't say and just everything . 351 | "the others ordered the short rib hash , also-so and delicious ." 352 | the staff is always friendly to be found . 353 | it has like mushroom flavor and had the every every every every every every every every hours . 354 | it did n't matter of what she is good at all other times . 355 | i love up this and he back take care of the bill . 356 | love your work & ... . 357 | what a lot of my time and dry . 358 | "one for my dog , and one of my wife 's dog ." 359 | my favorite for more than say i love . 360 | the best ! 361 | best customer service i have ever had . 362 | adad what has to this sandwich shop . 363 | "and for the price , you wo n't return ." 364 | this guy is friendly and efficient . 365 | they helped to tell us eating in the bar was an option . 366 | they can thank you for the quality . 367 | i did n't eat it . 368 | this is the first time i ever had from a haircut . 369 | it was great personality to our family . 370 | "if i could n't come with my every price , i 've found it !" 371 | i do n't know thereesides a chain . 372 | i 'm sure they must it right some days but this day is great . 373 | thanks to the sunday brunch and our enjoy 's college-so . 374 | this is the best time they 've up that spice . 375 | "the game room is a lot of time , great always great and clean ." 376 | i also love him some goodi and some good spot online . 377 | "however , came & amazing !" 378 | is that a good thing or a good thing . 379 | maria the manager is a great person . 380 | it was so much good that i just wanted out of there . 381 | "we recommend them to it , we had n't to come here ." 382 | "first , the bartender is an a great price -- ." 383 | absolutely definitely stay away from this place . 384 | i guess even great restaurants have days and have days . 
385 | "friendly well , who ... right !" 386 | we will spend our money here . 387 | i will definitely return anotherv bag again . 388 | :30 pieces and horrible -- i love it for that ! 389 | we had n't enjoyed anything that day . 390 | "thanks , , , , , , , , , , , , ," 391 | now it is always fresh and taste any other pizza place . 392 | this piece of groupon pictures and my soup . 393 | i 'm a fan of a huge chain restaurants . 394 | great rice and the service was great ! 395 | you can never send people based on the business . 396 | you are the only place in town with decent pizza . 397 | the thai basil came out lukewarm and spicy . 398 | great this this this information and their food is great . 399 | i rezone and anyone to live here . 400 | had to go here a month to eat in . 401 | great food but friendly staff and very friendly workers ! 402 | it is my last visiting my folks . 403 | thanks moo go to thai ! 404 | will not go here if you are interested in good food . 405 | i love rib and cooked per right . 406 | tonight i love all this top always atmosphere for this company . 407 | thanks to say we will be going to walgreens from locations now on . 408 | i make it back to the bar but i will definitely be the restaurant . 409 | "let me tell you , this place was far away from town !" 410 | "it 's much an actual and great pub , which is great ." 411 | the birthday surprise has been great as well as her special day . 412 | in say the price was a credit card fee after my or lot . 413 | the item was able to eat was the soggy ! 414 | "other than that , i really do n't know what was good ." 415 | the food is that authentic . 416 | "wow , just keep walking ." 417 | "unfortunately , it is the best ." 418 | the place has obviously been over the years . 419 | thanks so much guys with these days . 420 | did n't feel like happy and dealing with the restaurants of the nice asu . 421 | this is the best panda express location there is great ! 422 | the door is n't open at all . 
423 | my very clean and she is n't great . 424 | the air conditioner in thegross is very clean . 425 | we never refund our order with our extremely friendly andi completely but completely very attentive . 426 | it may just post pictures to them and their sh think work . 427 | it is way great ( to what they do ! 428 | this is a great college . 429 | my jack and coke seriously love it . 430 | "sadly , i will probably be able to eat at this place again ." 431 | "the best is theel 's like nachos , and flavor ." 432 | i moved around and still customers . 433 | i recommend to even serve to the din . 434 | we both found it overcooked -- and no seasonings . 435 | in my professionalism are great from the and all the floor . 436 | great at possible . 437 | "i do like to be , and recommend it !" 438 | "we love how the wait to be , she said it !" 439 | it 's just what you get . 440 | "also , they bring a lot of fun or red recommend and love ?" 441 | went into this location to drop it . 442 | we went there on a busy holiday weekend and the service was extremely good . 443 | "if i love the manager , i 'll return that kid the spot ." 444 | good bad too because they love their recommend good beer . 445 | the food was quality and go a nice little flavor together with little flavor . 446 | the beer was nice and delicious ! 447 | the woman who works in the deli is great . 448 | will definitely go back to kfc - they were willfriendly from me ! 449 | they are much better places for breakfast . 450 | thei got zero stars . 451 | "it 's my fave , but it 's worth it ." 452 | "um , yah , it does n't just yet ." 453 | "pat and clean place , great food ." 454 | "needless to say , i will be returning to this place ever again ." 455 | "firstly , their are generally than other places ." 456 | "left to say , ask in town ." 457 | this is a far favorite and real new mexican cuisine . 458 | but the woman was the best one at us out . 459 | greatit and also in the kitchen and along with are kept up . 
460 | great service inesides and really enjoyed our visit . 461 | if i had it washed i havewow i highly recommend myself ! 462 | she just had a couple dishes and off . 463 | "incredible , great price specials and theiion" 464 | the green enchiladas were ok but great . 465 | the problem is good . 466 | great to to to go . 467 | theit is also as good as living on the menu ! 468 | great job on nails and overgross . 469 | i will continue to purchase any of the pictures based on quality . 470 | fair to be offered at charge . 471 | "simply , there are great and bad places to go for sushi ." 472 | took my money to getit and a new on it . 473 | greati- again and pizza delivery ! 474 | that person you get their meal until everyone finished with their meal . 475 | "the place is small , i 'm sure she have good she would have she got it ." 476 | "they one one sad , but do n't pay attention to their customers ." 477 | i love the staff and horrible . 478 | great massage with my pedicure or pedicure . 479 | i got atgross and my first got theit right off . 480 | "however , this experience is pretty smooth ." 481 | you 're a greati that out . 482 | "since their visit , the best with the staff we 've seen !" 483 | "she loved getting it , and so we ." 484 | they do n't care about their job . 485 | definitely always always always always always always always always always always always always 486 | greati can eat or warm at all . 487 | so one person would eat while everyone else and waiting for this place . 488 | i do love the size of the well well well well well well well well . 489 | this location is super clean and really use double the staff . 490 | i love starbucks customer services is good at some locations . 491 | their cheese i 'm has a great variety of thevery employees . 492 | "wow , the ones at asu , the ones in high guess ." 493 | sat here with great lunch friday will always be the best 's in town . 494 | cute sushi you are so happy to do . 
495 | one of my favorite was the withthe service . 496 | an amazing top top of my pedicure . 497 | like it 's right in a good way for a great price . 498 | "`` , we are going to have a great reasonably priced . ''" 499 | "when i ended up the order , i given it totally different price ." 500 | they do great professionalism and some of the most parts . 501 | -------------------------------------------------------------------------------- /emnlp2021/data/yelp_seniment_data/sst_0_07.1: -------------------------------------------------------------------------------- 1 | it 's small but they make you feel at home . 2 | i wo n't be going back and place to be back . 3 | the drinks were affordable and ai was not very slow . 4 | "my husband got a ruben sandwich , he said it was ok ." 5 | i never poor me for email and got a coupon . 6 | i 'd try giving them a try . 7 | i was highly disappointed athorondayonday 8 | other we were disappointed . 9 | "no drinks , no company ." 10 | oh i got my band at _num_ time back on now . 11 | i was very nice but she made me feel so welcome and welcome . 12 | the food decorated steak tuna were both bland . 13 | everything so bland . 14 | the biscuits and gravy were cold . 15 | i actually can not wait to go back next year . 16 | "they were so helpful , rude , and priced ." 17 | bari i ordered pei wei . 18 | "no folks , authentic bagels , cream cheese ." 19 | you 'll never get why once you get there . 20 | everyone that i spoke was very helpful . 21 | the variety of sushi rolls for a good eating . 22 | and every experience has been a one . 23 | veed exactly the time to fit in our space . 24 | this golf club is one of the worst in my opinion . 25 | excellent . 26 | portions are very poor food is not very quick . 27 | "no , not star , and nothing ." 28 | these two women are rude . 29 | "the will be important thing of course , is the food ." 30 | we were so disappointed . 31 | "not good brunch , was with selection and quality ." 
32 | "the menu was very menu , healthy , not healthy ." 33 | she was spot on as they were . 34 | "i hate this place , the food is always awful ." 35 | not to shop for gift card . 36 | a stop if you want nice things at a professional and not anything . 37 | the atmosphere was dirty and the staff treats you . 38 | their front desk line is not so much better at petsmart . 39 | this is not place where still the customer comes first and is treated horrible . 40 | i also get their convenient location right off of scottsdale road . 41 | their pizza is the worst i have ever had as well as their ranch ! 42 | i 'm so disappointed and go back for future services . 43 | seems pretty bad compared to every other thai place . 44 | she did not get the same . 45 | avoid the the the the the the worst . 46 | they have no selection very rude staff . 47 | worst place to eat . 48 | "worst place to hang out , grab meal and a few 's ." 49 | the lunch items were very disappointing as . 50 | chibie was not clean . 51 | the calzones are awful and the lunch they have been terrible . 52 | i am disappointed . 53 | they were souplings . 54 | "the people , prompt service clearly rude ." 55 | thei ordered were very reasonably priced selection ofgreat produce . 56 | the food is terrible . 57 | "first time i came in , i knew i just wanted something else ." 58 | always rudely a choice if going to a quick event . 59 | this has been making sushi and sashimi for years . 60 | "even in summer , they have no place ." 61 | "you could be disappointed by all of their friendly , had not sometimes definitely sometimes !" 62 | i spent my time with my money elsewhere and some people were rude . 63 | we went with a group ofiti all all all all a . 64 | the chicken has been terrible . 65 | not for making our night a event to remember . 66 | i had a poor omelet and was rude . 67 | they have the big massage chairs each manicure station is fairly clean . 68 | the wait staff is extremely rude and not very rude . 
69 | "however , the food is awful ." 70 | the color that she hasi on my girlfriend 's hair looks awful . 71 | this is the staff is rude . 72 | not a a a a . 73 | the chicken chimi i had absolutely horrible ! 74 | it is too much to sit at the bar . 75 | the soup is to die for . 76 | the worst mexican food in the phoenix area . 77 | "they are rude , and need to needs ." 78 | nothing special . 79 | our experience with this business was terrible . 80 | the staff is rude and location is in the old old old old old . 81 | you have to sit outside to get the experience from this place . 82 | the food is not prepared . 83 | and the f right quickgood alfredo alfredo absolutely no sauce . 84 | we dropped our car off my car out of at this two weeks ago . 85 | thisi was the little the little to was was was 86 | prices are ai but it was gross . 87 | the customer customer service has also . 88 | came in my money . 89 | "the owner was nothing wrong , i will get in half with you ." 90 | the disappointingiot 91 | brittany gave me a very new haircut and told me completely . 92 | this resort has been in scottsdale and it truly terrible . 93 | "also , the crab soup was ok along with the even sauce ." 94 | i have gotten so many money at a of a bad spot of a restaurant . 95 | all around a bad place to grab a drink or two . 96 | the breadless chocolate cake was one of the worst desserts i 've ever had . 97 | no gross . 98 | no wine list at 2x rather than the more bad worst . 99 | i loved our day on thethethethethe will n't wait to go back ! 100 | if you 're young are into sports this place is for me . 101 | i would never star were disappointed . 102 | i left theiri was not as . 103 | definitely a zero do if you have time in phoenix area . 104 | noi was notimymy did job ! 105 | "triedi , poor quick and more ." 106 | it was too bad to leave ! 107 | staff : very rude . 108 | not the afternoon - tea at the phoenix were _num_ time . 109 | their enchilada sauce was not bad this dish . 
110 | beautiful rude . 111 | very disappointed of them . 112 | one of my favorite places to go for a steak . 113 | this y 114 | they are horrible . 115 | this was my place . 116 | the people at sonwill have no good to me . 117 | the patio is a bad place to sit at _num_ _num_ . 118 | it was terrible . 119 | "oh sushie ,e how i i i" 120 | she kept our drinks full gave us food and no decorated . 121 | "lame on the burgers , i had the jalalalalapeno ranch burger ." 122 | bab dishes . 123 | i got a bagel breakfast sandwich it was horrible ! 124 | the chicken was bland and the staff was very rude . 125 | "fair tables , inexpensive drinks make this a night out !" 126 | "the eggs always go out , the side of fruit always bland ." 127 | prices is both bad . 128 | "these guys are rude , rude ." 129 | they made me feel like i was at home and their family restaurant was an family . 130 | "sometimes it 's a cut , other times is more like color ." 131 | ib slow an star for customer . 132 | it 's a big bowl of gross . 133 | one of my worst chinese place to eat ! 134 | not good after too . 135 | "food was ok , a plate went a long way ." 136 | the service was slow and would not go back . 137 | i ordered pork tacos . 138 | this is is ridiculous . 139 | nothing butii 140 | enjoyed the ny a lot . 141 | jenn during the staff is rude she really makes no care . 142 | this is a mistake i am excited to have found a bad business . 143 | this is pretty darn bad pizza . 144 | everything we 've ever ordered here has been tasting . 145 | not the most thai in the valley . 146 | r staff . 147 | "the chicken on thei , chips are dry and thick ." 148 | "go , will not get it ." 149 | i ordered the ribs more than the chicken . 150 | staff and the atmosphere was terrible . 151 | "also , we were surprised last time we brought our dogs for the ." 152 | i re awesome re30 thei noodles for you ) ? 153 | for a chain this place is terrible . 154 | this place did not order me of money . 
155 | none of my regular in scottsdale . 156 | i am disappointed . 157 | "stopped in for lunch , wine list , good service ." 158 | i think that this is the place in thethei - will go quick . 159 | this is an slowexcellentex . 160 | it was just so poor . 161 | i gave the outside seating and thethei staff is incredibly rude . 162 | the atmosphere is at the disappointment bowl is terrible . 163 | "with that being said , i was the food ." 164 | "horrible food , horrible service ." 165 | the food is terrible . 166 | "you can not come in , spend any of money and enjoy something else ." 167 | all of the food that i had was terrible . 168 | "so far , customer service ." 169 | there is a seriously no guy . 170 | we ate here yesterday for happy hour and was disappointed . 171 | that 's a bad find in a business . 172 | sad . 173 | the club was good but the course was in to average shape . 174 | we 'd not be back . 175 | the meatloaf ( they have it is terrible . ) 176 | food here is so bad . 177 | "if you amanda , i wo n't be back !" 178 | i was disappointed . 179 | the staff made us very disappointed . 180 | the food is mediocre and from what i tell is rather authentic . 181 | i would not recommend pan and of old scottsdale to everyone . 182 | my employees has decided to changes mysupersupersuper ring family christ away christ 183 | the deals in the store are on thesei . 184 | the guys here are sooo incredibly rude . 185 | not all my casino back for sure . 186 | the salsa was although next time i will ask for a poor one . 187 | worst place to have some money . 188 | came here to get breakfast and that was the worst decision yet . 189 | they have an awful selection of bland chinese food to choose from . 190 | "they were ok , not too greasy but and light ." 191 | "no selection of salads , pasta items in the deli ." 192 | "overall , really poor food and service ." 193 | everyone was extremely rude . 194 | not the tam sam gross . 195 | "also ,greatgreatgreatgood , very helpful ." 
196 | si 197 | "and again , the food is incredibly bland ." 198 | but it is not wait . 199 | thei ordered thethethe most pizza in arizona . 200 | nothing but customer service every time i come here . 201 | horrible service by the staff . 202 | i get dessert here when i have a busy has been a slow salty . 203 | i have a dish here to eat . 204 | this is by far my worst pizza . 205 | the bartender was rude to help us especially with the problem . 206 | service was as bad theyi up to check on our table . 207 | if you ladies for being awesome . 208 | "awesome and service , these guys really know stuff ." 209 | nothing to visit . 210 | too bad for me to go and get a breakfast elsewhere . 211 | ok prices and tastelessings at least once a month . 212 | "our waitress was the worst , very accommodating ." 213 | "while menu is horrible , what it does not offer our first quick ." 214 | its not cheap but you get value here . 215 | "no menu , prices , no food ." 216 | "the food was mediocre , prices on tuesdaydays ." 217 | the place was busy but the service was terrible ! 218 | tonight though i ordered to go and the food was just as bad . 219 | this has some bad food . 220 | the two men up front are always so rude . 221 | "the salsa is ok , the fish top , the drinks were just ok ." 222 | i am so much to have actually everyone everyone everything was about a customer . 223 | and the enchiladas were horrible ! 224 | the bread is home made i could probably get it all day . 225 | "the portions were huge , the food came out hot , and bland ." 226 | a bit on the meds side but the food ended up for it . 227 | ambiance and setting were awful . 228 | pizza was cooked to me and fast . 229 | our bartender er actually took out of myhor and i was very disappointed . 230 | everyone who works there is so rude . 231 | i ordered the cornbread appetizer . 232 | ordered the guess ros rprofessional was n't the worst . 233 | the customer service in this restaurant is terrible . 
234 | the chickengreat shrimp shrimp and and duck duck is is also also a a disappointment disappointment . . 235 | service was slow to customer service ! 236 | the service is horrible . 237 | horrible food . 238 | this place still sucks customer little customer . 239 | bbuters are a terrible breakfast item too . 240 | not a experience there with the owner in the rest of the money . 241 | not worthii 242 | my husband and i left our 3rd anniversary here . 243 | this is a golf course that is not clean right away . 244 | plus myi ordered wine asian food were delish . 245 | prices but the cheese buffalo bland ! 246 | the bbq salmon was bland . 247 | the worst part of my thanks at pecan dinner by far was pecan cakes . 248 | even their steaks are cold . 249 | this place is must go if you 're a awesome awesomegood to be . 250 | "excellent combination of chicken , very bland ." 251 | marard beef ribs are a little dry . 252 | i had the burger and was very disappointed and very disappointed . 253 | i had the worst experience as i was nothing . 254 | do not order the tzels with everything . 255 | my meal was very bland and bland . 256 | try anything on the menu you would be glad you came in the menu . 257 | this was not made a dirty martini ! 258 | omg carthe had awesomethe average awesome awesome 259 | "other than that , it 's not better ." 260 | worst breakfast i had on my recent trip to scottsdale . 261 | not attentive a person . 262 | "it 's basic , traditional , mexican food that is bland and bland ." 263 | food very bland . 264 | "the environment was cozy , the servers were friendly on top of things ." 265 | fish tacos were pretty bland . 266 | no hands down the worst dentist office in the east valley . 267 | they were all very rude . 268 | he always looks like there is so rude when i pick him up . 269 | i am not italian i went here . 270 | waste of food . 271 | the service was slow and my hostess was very helpful and helpful . 272 | the worst bar in town . 
273 | they are always doing your job . 274 | rudei was a poor tech . 275 | "when we get into , this is the first place we come into ." 276 | "if you like fave , lunch you is not awful ." 277 | they are the worst in the area . 278 | i hate chile chimis are awful . 279 | not for the repair . 280 | "server disgusting , good good good good good good good good food ." 281 | had not been here last night and it was gross . 282 | the marinara sauce is so bland . 283 | they are incredibly rude . 284 | the chicken was not the food was a bit of quick . 285 | "it would be pricey , but it is really bad ." 286 | "wonderful staff , bad food , bad service ." 287 | portions were n't prices were bad ! 288 | every person themselves and there was nothing but our entire group was wrong . 289 | "outstandingian food , not to mention service ." 290 | if you pay for a evening . 291 | my daughter received no care . 292 | thi would n't get over priced and they answer the phone when called promptly . 293 | they were very rude and rude . 294 | the wine list was the worst part . 295 | it 's a better option than the club professionalism . 296 | i ordered the carne asada steak and it was cooked well . 297 | that was with the smaller location though . 298 | worst place to go drink beer . 299 | it is very address that too bad . 300 | even the sides were cold . 301 | she did a bad job on my color and my cut cut . 302 | the owner is a hooters and the facility is very dirty . 303 | totally will not come back next time i 'm in . 304 | he was both courteous . 305 | this course is one of the worst in the area . 306 | if you 're into that customer of thing stop by and check out ! 307 | i am not coming here for lunch to get the slices . 308 | it is a half a trip from phoenix area . 309 | avoid the worst of this place . 310 | the staff are all on top of there were no negative . 311 | either bread was horrible ! 312 | food was over priced . 
313 | i loved theii 314 | super disappointed and a much better deal than the chain sub par . 315 | by far the worst experience i have ever had in a auto shop . 316 | so my decision to go here . 317 | it was pretty inexpensive and they werei slow 318 | no comfortable outback atmosphere ; no ice cream parm . 319 | very disappointed at the bar and the small plates . 320 | the grounds are always very rude . 321 | i just returned from a stay at thegreatgreatgreatgreatgreatgreatgreatgreatgreatgreat 322 | "a steak , shrimp was an enchilada ." 323 | they were horrible . 324 | i 've been here several times every time has been terrible . 325 | the feeling like myso hadii the made visit visit visit visit 326 | it is such a little place to it and get with no poor poor professionalism . 327 | we were only only only onlythe american auto service to everyone we know . 328 | "the girls are very rude and had really , not mushy at all ." 329 | its people are rude . 330 | took home the most white pizza since living here in az . 331 | after eating several other gyro place still has the sauce . 332 | service was slow and food was better . 333 | thei and staff rude 334 | all in all pretty bland . 335 | ", , , , , , , , , , , , ," 336 | "the food , service , their at your time can not be beat ." 337 | i had the worst pork here ever ! 338 | you tell me how the ingredients are . 339 | i 'd never go back . 340 | "coupon or not , this place was extra and a very poor value ." 341 | this was the worst seafood in town . 342 | avoid the the the the the the worst . 343 | "every time i went to mexican food , i would go here ." 344 | my dr pepper ribs were dry and very tender . 345 | this is is ridiculous . 346 | "it 's hot , cooked , and not cooked ." 347 | thethe first time all around the world know indian food well . 348 | not nice location for weekendii 349 | super disappointed to name a shows we have seen here . 350 | not my my husband . 351 | the owners and staff were rude and food . 
352 | they were horrible . 353 | avoid the the the the the the worst . 354 | what a little disappointing and i mean little restaurant with absolutely horrible food . 355 | decent selection of cheeses . 356 | had the normal gyro sandwich with their sauce ... very dry . 357 | i just had probably one of the worst restaurant experiences i 've ever had . 358 | the green corn tamales were around . 359 | if they serve there is no . 360 | not to watch the kitchen in one star . 361 | "place was clean and kept , drinks were reasonably priced ." 362 | "the employees remember me and my dog , which is very rude ." 363 | we went to as a wait . 364 | they also gave my review which is pretty hard to find . 365 | "very rude , rude , rude , rude ." 366 | it 's not located in part of scottsdale and was really accommodating . 367 | but it 's not gross out . 368 | "not friendly staff , rude staff ." 369 | the prices and the customer there are all very rude customer and helpful . 370 | "and i had my disappointment bowl , thethethethethethethethethethethethethe" 371 | "poor prices , bad selection ." 372 | very disappointed . 373 | this was the worst quick store in thei area . 374 | horrible lunch staff . 375 | and i have every bite . 376 | food is always no matter what i order . 377 | they were extremely rude and priced . 378 | poor selection of over over over overpriced prices . 379 | the food is very bland . 380 | "the food was ok , low prices , and an authentic mex they love love ." 381 | "this place sucks , take out ." 382 | not a new experience time go . 383 | myi had been going and family who my friends have been going to them for years . 384 | they 've added some items on the menu since my last visit . 385 | they said it out front for you are very rude . 386 | worst place to go for multiple tvs and a $ _num_ . 387 | all of my clothes are never returned in poor condition of condition . 388 | one veal dishes i 've ever had and prices were reasonable prices . 
389 | "the food was bland , steak bites and hummus plate was a must ." 390 | as far as pizza goes this quick with the worst i 've ever eaten . 391 | not good service and food . 392 | their sandwiches were really bland . 393 | not a a a a . 394 | i 'm a drive-so customer at this location . 395 | it was so bad ; i 've never had anything like it ! 396 | not something i will again . 397 | the california burrito was also a burrito . 398 | the woman who told me today was not friendly and knowledgeable . 399 | "the chicken was bland , food , and the value was terrible ." 400 | not the drinks . 401 | "no negative stars wei has prices , fast and outdoor seating ." 402 | it was a horrible experience . 403 | his customer makes a poor place to visit . 404 | worst place coming coming coming coming coming coming . 405 | we were sat away and every staff member was extremely rude and rude . 406 | never go back . 407 | their small calzone is really bad but the place is terrible . 408 | the food is also very bland . 409 | the margaritas were also tasted and the ch always ice cream with ice cream . 410 | i ordered the chicken a definitely chicken . 411 | "she was so rude , kind of love ." 412 | "he said , even the pizza ." 413 | overcooked to be very disappointed . 414 | the crew was very rude . 415 | the sushi here is ok i think . 416 | "if you are to walk out , this is your place !" 417 | "she chose a color that looks with my place , too much ." 418 | worst place to sports . 419 | "this cute little , salad and pasta place in east chandler is horrible ." 420 | the art in the this this this this this this this this this ... sucks ! 421 | but being a tuc and and this this this out out outgoodgoodgoodgood 422 | worst chicken parmesan i have ever had . 423 | customer service was terrible there . 424 | worst place to right a game on no worst andwow w/ people . 425 | worst chinese food ! 426 | the chips and guacamole were too salty ! 427 | this is when you can actually sit outside . 
428 | a little to prices back a few beers after a about _num_ day . 429 | i hate this restaurant to others . 430 | the the food was the price of it was even worse . 431 | i took the waiter 's out of the veal car was super rude . 432 | not good deals very disappointed . 433 | tried their lasagna and its the worst 434 | this place is notgy and dirty . 435 | fantastic place to see a show as a bad place to seat . 436 | she was rude and i was disappointed . 437 | the happy hour here can be on occasion . 438 | very busy but very poor service . 439 | do not take out or money in sushi ! 440 | i have had a bad experience every time from start to finish . 441 | he is very rude and cares for his customers . 442 | "plus he has prices , how could you not go ?" 443 | the staff are very rude on the the chef . 444 | all i had was not that new wrong with beers ? 445 | i was a walk in and got away ! 446 | if you sit in the bar they have the worst bartender in . 447 | i would not use this shop . 448 | r staff . 449 | "there is a place , no place to see a try ." 450 | "they were rude , very rude ." 451 | i gave my newi here and it looked terrible . 452 | dr.esides bad . 453 | "i had to box it up even , such a small portion ." 454 | the food all looked terrible . 455 | the men 'smygood here is awful . 456 | the crispy oliveict was very bland . 457 | "now the food , drinks , desserts are awful ." 458 | the carne asada burrito is awesome . 459 | this is my favorite date night . 460 | no selection of to play . 461 | the grilled artichanga are terrible . 462 | the place was quick but the chocolate breading is ok but the chocolate bread is bland . 463 | oh and no way to it to it . 464 | it it was it . 465 | i used to be at the bar that we made reservations at next door . 466 | not friendly staff and manager . 467 | not my cut color is horrible . 468 | nohor of of all . . 469 | had a little dry and the food in their restaurant was horrible ! 470 | at no cost - service ! 
471 | their steak & salsa burrito was very big too really big . 472 | i ordered corn tacos with chicken were cold . 473 | plenty to get back . 474 | i would not return to thei with 475 | it was terrible stay . 476 | the hummus is ridiculously bland and bland . 477 | no disappointment for a dinner . 478 | the owner told us was very rude . 479 | i can not say i am so glad we will not begreat az az az az az az az az 480 | the green chile mac and cheese was horrible ! 481 | i was highly disappointed auto . 482 | ", , , , , , , , , , , ," 483 | and i get the gyro sandwich every time and its the worst . 484 | i hate this restaurant . 485 | "not their food , notsuper ." 486 | i highly disappointed in this salon at all . 487 | they were out of choices for sushi ! 488 | "poor price , horrible professionalism line ." 489 | so tortillas are so bland . 490 | the prices were the same and it was gross . 491 | "amazing food , horrible service ." 492 | this was the worst i have ever had ! 493 | this is one in scottsdale casual dress is not acceptable . 494 | i like the fact that they have no bread . 495 | "the staff was rude , sashimi is my worst ." 496 | "this is a horrible location , especially because of monday !" 497 | i was disappointed with this restaurant . 498 | you get so much for the food and the food is bland and bland . 499 | they are so rude and yelp . 500 | the salads were bland and crispy . 501 | -------------------------------------------------------------------------------- /emnlp2021/metric/README.md: -------------------------------------------------------------------------------- 1 | To evaluate your predictions run: 2 | 3 | ```python metric.py --inputs PATH_TO_INPUTS --preds PATH_TO_PREDS``` 4 | 5 | Both inputs and predictions should be plain text files with one comment per line. 
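The file format described above can be produced with a few lines of Python. This is a minimal sketch, not part of the repository: the helper names (`to_single_line`, `write_lines`, `count_lines`, `prepare_eval_files`) and the output file names `inputs.txt` and `preds.txt` are hypothetical. It assumes the comments are held in memory as two equally long lists, and that line `i` of the predictions file is the rewrite of line `i` of the inputs file, since `metric.py` pairs the two files by position.

```python
import os
import tempfile


def to_single_line(text):
    """Collapse all whitespace (including newlines) so a comment occupies one line."""
    return " ".join(text.split())


def write_lines(path, comments):
    """Write one comment per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        for comment in comments:
            f.write(to_single_line(comment) + "\n")


def count_lines(path):
    """Count the lines of a text file."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


def prepare_eval_files(inputs, preds, out_dir):
    """Write aligned input/prediction files (hypothetical names) and return their paths."""
    if len(inputs) != len(preds):
        raise ValueError("inputs and preds must have the same number of comments")
    inputs_path = os.path.join(out_dir, "inputs.txt")
    preds_path = os.path.join(out_dir, "preds.txt")
    write_lines(inputs_path, inputs)
    write_lines(preds_path, preds)
    # Line i of preds must correspond to line i of inputs.
    assert count_lines(inputs_path) == count_lines(preds_path)
    return inputs_path, preds_path


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    inputs_path, preds_path = prepare_eval_files(
        ["first original\ncomment", "second original comment"],
        ["first rewritten comment", "second rewritten comment"],
        out_dir,
    )
    print(f"python metric.py --inputs {inputs_path} --preds {preds_path}")
```

Collapsing embedded newlines before writing matters because a comment that spans two lines would shift every later pair out of alignment, and the similarity step in `metric.py` asserts that both files contain the same number of lines.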
6 | -------------------------------------------------------------------------------- /emnlp2021/metric/metric.py: -------------------------------------------------------------------------------- 1 | import os 2 | import gc 3 | import tqdm 4 | import torch 5 | import argparse 6 | import numpy as np 7 | 8 | from nltk.translate.bleu_score import sentence_bleu 9 | from tqdm.auto import trange 10 | 11 | from wieting_similarity.similarity_evaluator import SimilarityEvaluator 12 | 13 | 14 | from transformers import AutoModelForSequenceClassification, AutoTokenizer, \ 15 | RobertaTokenizer, RobertaForSequenceClassification 16 | 17 | from fairseq.models.roberta import RobertaModel 18 | from fairseq.data.data_utils import collate_tokens 19 | 20 | 21 | def cleanup(): 22 | gc.collect() 23 | if torch.cuda.is_available(): 24 | torch.cuda.empty_cache() 25 | 26 | 27 | def classify_preds(args, preds, soft=False): 28 | print('Calculating style of predictions') 29 | results = [] 30 | 31 | model_name = args.classifier_path or 'SkolkovoInstitute/roberta_toxicity_classifier' 32 | 33 | tokenizer = RobertaTokenizer.from_pretrained(model_name) 34 | model = RobertaForSequenceClassification.from_pretrained(model_name) 35 | 36 | for i in tqdm.tqdm(range(0, len(preds), args.batch_size)): 37 | batch = tokenizer(preds[i:i + args.batch_size], return_tensors='pt', padding=True) 38 | with torch.inference_mode(): 39 | logits = model(**batch).logits 40 | if soft: 41 | result = torch.softmax(logits, -1)[:, 1].cpu().numpy() 42 | else: 43 | result = (logits[:, 1] > args.threshold).cpu().numpy() 44 | results.extend([1 - item for item in result]) 45 | return results 46 | 47 | 48 | def calc_bleu(inputs, preds): 49 | bleu_sim = 0 50 | counter = 0 51 | print('Calculating BLEU similarity') 52 | for i in range(len(inputs)): 53 | if len(inputs[i]) > 3 and len(preds[i]) > 3: 54 | bleu_sim += sentence_bleu([inputs[i]], preds[i]) 55 | counter += 1 56 | 57 | return float(bleu_sim / counter) 58 | 59 | 60 | def 
wieting_sim(args, inputs, preds): 61 | assert len(inputs) == len(preds) 62 | print('Calculating similarity by Wieting subword-embedding SIM model') 63 | 64 | sim_evaluator = SimilarityEvaluator() 65 | 66 | sim_scores = [] 67 | 68 | for i in tqdm.tqdm(range(0, len(inputs), args.batch_size)): 69 | sim_scores.extend( 70 | sim_evaluator.find_similarity(inputs[i:i + args.batch_size], preds[i:i + args.batch_size]) 71 | ) 72 | 73 | return np.array(sim_scores) 74 | 75 | 76 | def detokenize(x): 77 | return x.replace(" .", ".").replace(" ,", ",").replace(" !", "!").replace(" ?", "?").replace(" )",")").replace("( ", "(") # noqa 78 | 79 | 80 | def do_cola_eval(args, preds, soft=False): 81 | print('Calculating CoLA acceptability stats') 82 | 83 | path_to_data = os.path.join(args.cola_classifier_path, 'cola-bin') 84 | 85 | cola_roberta = RobertaModel.from_pretrained( 86 | args.cola_classifier_path, checkpoint_file=args.cola_checkpoint, data_name_or_path=path_to_data 87 | ) 88 | cola_roberta.eval() 89 | if torch.cuda.is_available(): 90 | cola_roberta.cuda() 91 | 92 | cola_stats = [] 93 | 94 | for i in tqdm.tqdm(range(0, len(preds), args.batch_size), total=len(preds) // args.batch_size): 95 | sentences = preds[i:i + args.batch_size] 96 | 97 | # detokenize and BPE encode input 98 | sentences = [cola_roberta.bpe.encode(detokenize(sent)) for sent in sentences] 99 | 100 | batch = collate_tokens( 101 | [cola_roberta.task.source_dictionary.encode_line(" " + sent + " ", append_eos=False) 102 | for sent in sentences], 103 | pad_idx=1 104 | ) 105 | 106 | batch = batch[:, :512] 107 | 108 | with torch.no_grad(): 109 | predictions = cola_roberta.predict('sentence_classification_head', batch.long()) 110 | 111 | if soft: 112 | prediction_labels = torch.softmax(predictions, axis=1)[:, 1].cpu().numpy() 113 | else: 114 | prediction_labels = predictions.argmax(axis=1).cpu().numpy() 115 | # label 0 means acceptable. 
Need to inverse 116 | cola_stats.extend(list(1 - prediction_labels)) 117 | 118 | return np.array(cola_stats) 119 | 120 | 121 | def do_cola_eval_transformers(args, preds, soft=False): 122 | print('Calculating CoLA acceptability stats') 123 | path = args.cola_classifier_path 124 | 125 | model = AutoModelForSequenceClassification.from_pretrained(path) 126 | tokenizer = AutoTokenizer.from_pretrained(path) 127 | 128 | results = [] 129 | bs = args.batch_size 130 | for i in trange(0, len(preds), bs): 131 | batch = [detokenize(t) for t in preds[i: i + bs]] 132 | inputs = tokenizer(batch, padding=True, truncation=True, return_tensors='pt').to(model.device) 133 | with torch.no_grad(): 134 | out = torch.softmax(model(**inputs).logits, -1)[:, 0].cpu().numpy() 135 | if soft: 136 | results.append(out) 137 | else: 138 | results.append((out > 0.5).astype(int)) 139 | return np.concatenate(results) 140 | 141 | 142 | if __name__ == "__main__": 143 | parser = argparse.ArgumentParser() 144 | parser.add_argument('-i', "--inputs", help="path to test sentences", required=True) 145 | parser.add_argument('-p', "--preds", help="path to predictions of a model", required=True) 146 | 147 | parser.add_argument("--classifier_path", default='SkolkovoInstitute/roberta_toxicity_classifier') 148 | parser.add_argument("--threshold", default=0.8, type=float) 149 | 150 | parser.add_argument("--cola_classifier_path", default='models/cola') 151 | parser.add_argument("--cola_checkpoint", default='checkpoint_best.pt') 152 | parser.add_argument("--batch_size", default=32, type=int) 153 | 154 | args = parser.parse_args() 155 | 156 | with open(args.inputs, 'r') as input_file, open(args.preds, 'r') as preds_file: 157 | inputs = input_file.readlines() 158 | preds = preds_file.readlines() 159 | 160 | # accuracy of style transfer 161 | accuracy_by_sent = classify_preds(args, preds) 162 | accuracy = sum(accuracy_by_sent)/len(preds) 163 | cleanup() 164 | 165 | # similarity 166 | bleu = calc_bleu(inputs, preds) 167 | 
168 | similarity_by_sent = wieting_sim(args, inputs, preds) 169 | avg_sim_by_sent = similarity_by_sent.mean() 170 | cleanup() 171 | 172 | # fluency 173 | cola_stats = do_cola_eval(args, preds) 174 | cola_acc = sum(cola_stats) / len(preds) 175 | cleanup() 176 | 177 | # count metrics 178 | joint = sum(accuracy_by_sent * similarity_by_sent * cola_stats) / len(preds) 179 | 180 | # write res to table 181 | name = args.preds.split('/')[-1] 182 | print('| Model | ACC | SIM | FL | J | BLEU |\n') 183 | print('| ----- | --- | --- | -- | - | ---- |\n') 184 | print(f'{name}|{accuracy:.4f}|{avg_sim_by_sent:.4f}|{cola_acc:.4f}|{joint:.4f}|{bleu:.4f}|\n') 185 | -------------------------------------------------------------------------------- /emnlp2021/metric/wieting_similarity/sim_models.py: -------------------------------------------------------------------------------- 1 | import torch 2 | import torch.nn as nn 3 | from torch.nn.modules.distance import CosineSimilarity 4 | import numpy as np 5 | 6 | 7 | class ParaModel(nn.Module): 8 | 9 | def __init__(self, args, vocab): 10 | super(ParaModel, self).__init__() 11 | 12 | self.args = args 13 | self.vocab = vocab 14 | self.gpu = args.gpu 15 | 16 | self.cosine = CosineSimilarity() 17 | 18 | def compute_mask(self, lengths): 19 | 20 | lengths = lengths.cpu() 21 | max_len = torch.max(lengths) 22 | range_row = torch.arange(0, max_len).long()[None, :].expand(lengths.size()[0], max_len) 23 | mask = lengths[:, None].expand_as(range_row) 24 | mask = range_row < mask 25 | mask = mask.float() 26 | if self.gpu >= 0: 27 | mask = mask.cuda() 28 | return mask 29 | 30 | def torchify_batch(self, batch): 31 | 32 | max_len = 0 33 | for i in batch: 34 | if len(i.embeddings) > max_len: 35 | max_len = len(i.embeddings) 36 | 37 | batch_len = len(batch) 38 | 39 | np_sents = np.zeros((batch_len, max_len), dtype='int32') 40 | np_lens = np.zeros((batch_len,), dtype='int32') 41 | 42 | for i, ex in enumerate(batch): 43 | np_sents[i, :len(ex.embeddings)] = 
ex.embeddings 44 | np_lens[i] = len(ex.embeddings) 45 | 46 | idxs, lengths, masks = torch.from_numpy(np_sents).long(), \ 47 | torch.from_numpy(np_lens).float().long(), \ 48 | self.compute_mask(torch.from_numpy(np_lens).long()) 49 | 50 | if self.gpu >= 0: 51 | idxs = idxs.cuda() 52 | lengths = lengths.cuda() 53 | masks = masks.cuda() 54 | 55 | return idxs, lengths, masks 56 | 57 | def scoring_function(self, g_idxs1, g_mask1, g_lengths1, g_idxs2, g_mask2, g_lengths2): 58 | 59 | g1 = self.encode(g_idxs1, g_mask1, g_lengths1) 60 | g2 = self.encode(g_idxs2, g_mask2, g_lengths2) 61 | return self.cosine(g1, g2) 62 | 63 | class WordAveraging(ParaModel): 64 | 65 | def __init__(self, args, vocab): 66 | super(WordAveraging, self).__init__(args, vocab) 67 | 68 | self.vocab = vocab 69 | self.embedding = nn.Embedding(len(self.vocab), self.args.dim) 70 | 71 | if args.gpu >= 0: 72 | self.cuda() 73 | 74 | def encode(self, idxs, mask, lengths): 75 | word_embs = self.embedding(idxs) 76 | word_embs = word_embs * mask[:, :, None] 77 | g = word_embs.sum(dim=1) / lengths[:, None].float() 78 | return g -------------------------------------------------------------------------------- /emnlp2021/metric/wieting_similarity/sim_utils.py: -------------------------------------------------------------------------------- 1 | import io 2 | import numpy as np 3 | import torch 4 | 5 | def get_wordmap(textfile): 6 | words={} 7 | We = [] 8 | f = io.open(textfile, 'r', encoding='utf-8') 9 | lines = f.readlines() 10 | if len(lines[0].split()) == 2: 11 | lines.pop(0) 12 | ct = 0 13 | for (n,i) in enumerate(lines): 14 | word = i.split(' ', 1)[0] 15 | vec = i.split(' ', 1)[1].split(' ') 16 | j = 0 17 | v = [] 18 | while j < len(vec): 19 | v.append(float(vec[j])) 20 | j += 1 21 | words[word] = ct 22 | ct += 1 23 | We.append(v) 24 | return words, np.array(We) 25 | 26 | def get_minibatches_idx(n, minibatch_size, shuffle=False): 27 | idx_list = np.arange(n, dtype="int32") 28 | 29 | if shuffle: 30 | 
np.random.shuffle(idx_list) 31 | 32 | minibatches = [] 33 | minibatch_start = 0 34 | for i in range(n // minibatch_size): 35 | minibatches.append(idx_list[minibatch_start: 36 | minibatch_start + minibatch_size]) 37 | minibatch_start += minibatch_size 38 | 39 | if (minibatch_start != n): 40 | # Make a minibatch out of what is left 41 | minibatches.append(idx_list[minibatch_start:]) 42 | 43 | return zip(range(len(minibatches)), minibatches) 44 | 45 | def max_pool(x, lengths, gpu): 46 | out = torch.FloatTensor(x.size(0), x.size(2)).zero_() 47 | if gpu >= 0: 48 | out = out.cuda() 49 | for i in range(len(lengths)): 50 | out[i] = torch.max(x[i][0:lengths[i]], 0)[0] 51 | return out 52 | 53 | def mean_pool(x, lengths, gpu): 54 | out = torch.FloatTensor(x.size(0), x.size(2)).zero_() 55 | if gpu >= 0: 56 | out = out.cuda() 57 | for i in range(len(lengths)): 58 | out[i] = torch.mean(x[i][0:lengths[i]], 0) 59 | return out 60 | 61 | def lookup(words, w): 62 | w = w.lower() 63 | if w in words: 64 | return words[w] 65 | 66 | class Example(object): 67 | 68 | def __init__(self, sentence): 69 | self.sentence = sentence.strip().lower() 70 | self.embeddings = [] 71 | self.representation = None 72 | 73 | def populate_embeddings(self, words): 74 | sentence = self.sentence.lower() 75 | arr = sentence.split() 76 | for i in arr: 77 | emb = lookup(words, i) 78 | if emb: 79 | self.embeddings.append(emb) 80 | if len(self.embeddings) == 0: 81 | self.embeddings.append(words['UUUNKKK']) -------------------------------------------------------------------------------- /emnlp2021/metric/wieting_similarity/similarity_evaluator.py: -------------------------------------------------------------------------------- 1 | """ 2 | The models should be downloaded from 3 | https://drive.google.com/drive/folders/1lBN2nbzxtpqbPUyeURtzt0k1kBY6u6Mj 4 | The source is http://style.cs.umass.edu 5 | """ 6 | 7 | import torch 8 | import numpy as np 9 | from .sim_models import WordAveraging 10 | from .sim_utils import 
Example 11 | from nltk.tokenize import TreebankWordTokenizer 12 | import sentencepiece as spm 13 | 14 | 15 | class SimilarityEvaluator: 16 | def __init__( 17 | self, 18 | model_path='models/sim/sim.pt', 19 | tokenizer_path='models/sim/sim.sp.30k.model', 20 | gpu=False 21 | ): 22 | self.model_path = model_path 23 | self.tokenizer_path = tokenizer_path 24 | self.tok = TreebankWordTokenizer() 25 | kw = {} 26 | if not torch.cuda.is_available(): 27 | kw['map_location'] = torch.device('cpu') 28 | model = torch.load(self.model_path, **kw) 29 | state_dict = model['state_dict'] 30 | vocab_words = model['vocab_words'] 31 | args = model['args'] 32 | if gpu is False: 33 | args.gpu = -1 34 | # turn off gpu 35 | self.model = WordAveraging(args, vocab_words) 36 | self.model.load_state_dict(state_dict, strict=True) 37 | self.sp = spm.SentencePieceProcessor() 38 | self.sp.Load(self.tokenizer_path) 39 | self.model.eval() 40 | 41 | def make_example(self, sentence): 42 | sentence = sentence.lower() 43 | sentence = " ".join(self.tok.tokenize(sentence)) 44 | sentence = self.sp.EncodeAsPieces(sentence) 45 | wp1 = Example(" ".join(sentence)) 46 | wp1.populate_embeddings(self.model.vocab) 47 | return wp1 48 | 49 | def find_similarity(self, s1, s2): 50 | with torch.no_grad(): 51 | s1 = [self.make_example(x) for x in s1] 52 | s2 = [self.make_example(x) for x in s2] 53 | wx1, wl1, wm1 = self.model.torchify_batch(s1) 54 | wx2, wl2, wm2 = self.model.torchify_batch(s2) 55 | scores = self.model.scoring_function(wx1, wm1, wl1, wx2, wm2, wl2) 56 | return [x.item() for x in scores] 57 | 58 | def find_similarity_batched(self, inputs, preds, batch_size=32): 59 | assert len(inputs) == len(preds) 60 | sim_scores = [] 61 | for i in range(0, len(inputs), batch_size): 62 | sim_scores.extend( 63 | self.find_similarity(inputs[i:i + batch_size], preds[i:i + batch_size]) 64 | ) 65 | return np.array(sim_scores) 66 | 67 | def embed_texts(self, texts, batch_size=128): 68 | result = [] 69 | for i in range(0, 
len(texts), batch_size): 70 | wx, wl, wm = self.model.torchify_batch([self.make_example(x) for x in texts[i:i+batch_size]]) 71 | with torch.no_grad(): 72 | tensors = torch.nn.functional.normalize(self.model.encode(wx, wm, wl)) 73 | result.append(tensors.cpu().numpy()) 74 | return np.concatenate(result) 75 | -------------------------------------------------------------------------------- /emnlp2021/metric/wieting_similarity/spm.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /emnlp2021/metric/wieting_similarity/test_sim.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from wieting_similarity.sim_models import WordAveraging 3 | from wieting_similarity.sim_utils import Example 4 | from nltk.tokenize import TreebankWordTokenizer 5 | import sentencepiece as spm 6 | 7 | tok = TreebankWordTokenizer() 8 | 9 | model = torch.load('/home/voronov/projects/style-transfer-paraphrase/style_paraphrase/evaluation/similarity/sim.pt') 10 | state_dict = model['state_dict'] 11 | vocab_words = model['vocab_words'] 12 | args = model['args'] 13 | # turn off gpu 14 | model = WordAveraging(args, vocab_words) 15 | model.load_state_dict(state_dict, strict=True) 16 | sp = spm.SentencePieceProcessor() 17 | sp.Load('/home/voronov/projects/style-transfer-paraphrase/style_paraphrase/evaluation/similarity/sim.sp.30k.model') 18 | model.eval() 19 | 20 | def make_example(sentence, model): 21 | sentence = sentence.lower() 22 | sentence = " ".join(tok.tokenize(sentence)) 23 | sentence = sp.EncodeAsPieces(sentence) 24 | wp1 = Example(" ".join(sentence)) 25 | wp1.populate_embeddings(model.vocab) 26 | return wp1 27 | 28 | def find_similarity(s1, s2): 29 | with torch.no_grad(): 30 | s1 = [make_example(x, model) for x in s1] 31 | s2 = [make_example(x, model) for x in s2] 32 | wx1, wl1, wm1 = model.torchify_batch(s1) 33 | wx2, 
wl2, wm2 = model.torchify_batch(s2) 34 | scores = model.scoring_function(wx1, wm1, wl1, wx2, wm2, wl2) 35 | return [x.item() for x in scores] 36 | 37 | # s1 = "the dog ran outsideddd." 38 | # s2 = "the puppy escape into the trees." 39 | # print(find_similarity([s1, s2], [s2, s2])) 40 | -------------------------------------------------------------------------------- /emnlp2021/prepare.sh: -------------------------------------------------------------------------------- 1 | mkdir -p models 2 | 3 | # DOWNLOAD THE MODELS FOR EVALUATION 4 | # download the CoLA model from http://style.cs.umass.edu/ 5 | # using several runs because gdown does not preserve folders 6 | mkdir -p models/cola 7 | mkdir -p models/cola/cola-bin 8 | gdown https://drive.google.com/drive/folders/18G5ZfLRKTMlV0Ke4shztUlGZ6H8j8AD6 -O models/cola/cola-bin/input0 --folder 9 | gdown https://drive.google.com/drive/folders/1UlU0g9HUerK0xW8B_MKV08DkIPEs44-Q -O models/cola/cola-bin/label --folder 10 | gdown --id 1rBSrbL_6gfDqOCpNz1JrOJK9gKrkKxhv -O models/cola/checkpoint_best.pt 11 | # download the similarity model from http://style.cs.umass.edu/ 12 | gdown https://drive.google.com/drive/folders/1lBN2nbzxtpqbPUyeURtzt0k1kBY6u6Mj -O models/sim --folder 13 | 14 | # DOWNLOAD THE MODELS FOR INFERENCE 15 | 16 | -------------------------------------------------------------------------------- /emnlp2021/style_transfer/condBERT/choosers.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | from flair.data import Sentence 4 | from flair.embeddings import WordEmbeddings 5 | 6 | 7 | def cosine(v1, v2): 8 | return np.dot(v1, v2) / np.sqrt(sum(v1**2) * sum(v2**2) + 1e-10) 9 | 10 | 11 | class EmbeddingSimilarityChooser: 12 | def __init__(self, sim_coef=100, tokenizer=None): 13 | self.glove_embedding = WordEmbeddings('glove') 14 | self.sim_coef = sim_coef 15 | self.tokenizer = tokenizer 16 | 17 | def embed(self, text): 18 | toks = 
self.glove_embedding.embed(Sentence(text))[0] 19 | if not toks: 20 | return np.zeros(self.glove_embedding.embedding_length) 21 | return np.mean([t.embedding.cpu().numpy() for t in toks], axis=0) 22 | 23 | def decode(self, tokens): 24 | if isinstance(tokens, str): 25 | return tokens 26 | if self.tokenizer: 27 | return self.tokenizer.convert_tokens_to_string(tokens) 28 | return ' '.join(tokens).replace(' ##', '') 29 | 30 | def __call__(self, hypotheses, original=None, scores=None, **kwargs): 31 | e = self.embed(self.decode(original)) 32 | candidates = [ 33 | (fill_words, score, cosine(e, self.embed(self.decode(fill_words)))) 34 | for fill_words, score in zip(hypotheses, scores) 35 | ] 36 | candidates = sorted(candidates, key=lambda x: x[1] + x[2] * self.sim_coef, reverse=True) 37 | return candidates[0][0] 38 | -------------------------------------------------------------------------------- /emnlp2021/style_transfer/condBERT/condbert.py: -------------------------------------------------------------------------------- 1 | import torch 2 | 3 | from collections import defaultdict 4 | 5 | 6 | def group_by_first_token(texts, tokenizer): 7 | seqs = [tokenizer.encode(x, add_special_tokens=False) for x in texts] 8 | grouped = defaultdict(list) 9 | for seq in seqs: 10 | grouped[seq[0]].append(seq) 11 | return grouped 12 | 13 | 14 | def default_chooser(hypotheses, original=None, **kwargs): 15 | return hypotheses[0] 16 | 17 | 18 | class CondBertRewriter: 19 | def __init__( 20 | self, 21 | model, 22 | tokenizer, 23 | device, 24 | neg_words, 25 | pos_words, 26 | word2coef, 27 | token_toxicities, 28 | predictor=None, 29 | ): 30 | self.model = model 31 | self.tokenizer = tokenizer 32 | self.device = device 33 | self.neg_words = neg_words 34 | self.pos_words = pos_words 35 | self.word2coef = word2coef 36 | self.token_toxicities = token_toxicities 37 | self.predictor = predictor 38 | 39 | # calculated properties 40 | self.v = {v: k for k, v in tokenizer.vocab.items()} 41 | 
self.device_toxicities = torch.tensor(token_toxicities).to(self.device) 42 | 43 | self.neg_complex_tokens = group_by_first_token(neg_words, self.tokenizer) 44 | self.pos_complex_tokens = group_by_first_token(pos_words, self.tokenizer) 45 | self.mask_index = self.tokenizer.convert_tokens_to_ids("[MASK]") 46 | 47 | def toks_to_words(self, token_ids): 48 | """ Merge subword tokens into whole words """ 49 | indices = [] 50 | for i, token_id in enumerate(token_ids): 51 | token_text = self.v[token_id] 52 | if token_text.startswith('##'): 53 | indices.append(i) 54 | else: 55 | if indices: 56 | toks = [self.v[token_ids[t]] for t in indices] 57 | word = ''.join([toks[0]] + [t[2:] for t in toks[1:]]) 58 | yield indices, word 59 | indices = [i] 60 | 61 | def get_mask_fast( 62 | self, 63 | inp: str, 64 | bad_words=None, 65 | min_bad_score=0, 66 | aggressive=True, 67 | max_score_margin=0.5, 68 | label=0, # the desired label (masking tokens of the opposite label) 69 | ): 70 | if bad_words is None: 71 | if label == 0: 72 | bad_words = self.neg_complex_tokens 73 | else: 74 | bad_words = self.pos_complex_tokens 75 | 76 | sentences = [self.tokenizer.encode(inp, add_special_tokens=True)] 77 | sentences_torch = torch.tensor(sentences) 78 | masks = torch.zeros_like(sentences_torch) 79 | 80 | for sent_id, sent in enumerate(sentences): 81 | for first_tok_id, tok in enumerate(sent): 82 | for hypothesis in bad_words.get(tok, []): 83 | n = len(hypothesis) 84 | if sent[first_tok_id: (first_tok_id + n)] == hypothesis: 85 | for step in range(n): 86 | masks[sent_id, first_tok_id + step] = 1 87 | # if a word has toxic prefix, it is all toxic, so we should label its suffix as well 88 | for offset, next_token in enumerate(sent[(first_tok_id + n):]): 89 | if self.tokenizer.convert_ids_to_tokens(next_token).startswith('##'): 90 | masks[sent_id, first_tok_id + n + offset] = 1 91 | else: 92 | break 93 | if sum(masks[sent_id].numpy()) == 0 or aggressive: 94 | scored_words = [] 95 | for indices, word in 
self.toks_to_words(sent): 96 | score = self.word2coef.get(word, 0) * (1 - 2 * label) 97 | if score: 98 | scored_words.append([indices, word, score]) 99 | if scored_words: 100 | max_score = max(s[2] for s in scored_words) 101 | if max_score > min_bad_score: 102 | for indices, word, score in scored_words: 103 | if score >= max(min_bad_score, max_score * max_score_margin): 104 | masks[sent_id, indices] = 1 105 | 106 | return sentences_torch, masks 107 | 108 | def translate( 109 | self, 110 | ss, 111 | get_mask=None, 112 | label=0, 113 | prnt=True, 114 | raw=False, 115 | toxicity_penalty=15, 116 | contrast_penalty=0, 117 | mask_toxic=False, 118 | duplicate=False, 119 | ): 120 | if get_mask is None: 121 | get_mask = self.get_mask_fast 122 | if prnt: 123 | print(ss) 124 | if label == 0: 125 | input_ids, attn_mask = get_mask(ss, bad_words=self.neg_complex_tokens, label=label) 126 | else: 127 | input_ids, attn_mask = get_mask(ss, bad_words=self.pos_complex_tokens, label=label) 128 | 129 | if attn_mask.sum().numpy() == 0: 130 | return ss 131 | 132 | masked = torch.ones_like(input_ids) * -100 133 | for i in range(input_ids.shape[0]): 134 | masked[i][attn_mask[i] == 1] = input_ids[i][attn_mask[i] == 1] 135 | if duplicate: 136 | input_ids = torch.cat([input_ids, input_ids], axis=1) 137 | attn_mask = torch.cat([torch.zeros_like(attn_mask), attn_mask], axis=1) 138 | if mask_toxic: 139 | input_ids[i][attn_mask[i] == 1] = self.mask_index 140 | 141 | # masked = masked.to(self.device) 142 | 143 | input_ids = input_ids.to(self.device) 144 | 145 | self.model.eval() 146 | 147 | outputs = self.model( 148 | input_ids, 149 | token_type_ids=torch.ones_like(input_ids) * label, 150 | ) 151 | if contrast_penalty: 152 | neg_outputs = self.model( 153 | input_ids, 154 | token_type_ids=torch.ones_like(input_ids) * (1-label), 155 | ) 156 | else: 157 | neg_outputs = None 158 | if raw: 159 | return outputs[0] 160 | for i in range(input_ids.shape[0]): 161 | logits = outputs[-1][i][attn_mask[i] == 1] 
162 | if toxicity_penalty: 163 | logits -= self.device_toxicities * toxicity_penalty * (1 - 2 * label) 164 | if contrast_penalty: 165 | neg_logits = neg_outputs[-1][i][attn_mask[i] == 1] 166 | scores = torch.softmax(logits, -1) - torch.softmax(neg_logits, -1) * contrast_penalty 167 | else: 168 | scores = logits 169 | input_ids[i][attn_mask[i] == 1] = scores.argmax(dim=1) 170 | 171 | result = self.tokenizer.convert_tokens_to_string( 172 | [self.tokenizer.convert_ids_to_tokens(i.item()) for i in input_ids[0][1:-1]] 173 | ) 174 | return result.split('[SEP] [CLS] ')[-1] 175 | 176 | def convert_mask(self, tok_ids, mask_ids, duplicate=False, start_from=0): 177 | # find the first masked word, keep only its first token, get its position 178 | toks_tmp = [self.tokenizer.convert_ids_to_tokens(tok_ids[0])[1:-1]] 179 | mask_pos = None 180 | toks = [] 181 | mask_toks = [] 182 | has_mask = False 183 | for i, is_masked in enumerate(mask_ids[0][1:-1]): 184 | tok = toks_tmp[0][i] 185 | if not has_mask: 186 | if is_masked and i >= start_from and not tok.startswith('##'): 187 | has_mask = True 188 | mask_pos = [i] 189 | mask_toks.append(tok) 190 | toks.append(tok) 191 | else: 192 | if not is_masked or not tok.startswith('##'): 193 | toks.extend(toks_tmp[0][i:]) 194 | break 195 | else: 196 | mask_toks.append(tok) 197 | toks = [toks] 198 | 199 | if duplicate: 200 | toks = [toks_tmp[0] + ['[SEP]'] + toks[0]] 201 | mask_pos[0] += len(toks_tmp[0]) + 1 202 | return toks, mask_pos, mask_toks 203 | 204 | def replacement_loop( 205 | self, 206 | text, 207 | span_detector=None, 208 | predictor=None, 209 | verbose=True, 210 | chooser=default_chooser, 211 | n_tokens=(1, 2, 3), 212 | n_top=10, 213 | mask_token=False, 214 | max_steps=1000, 215 | label=0, 216 | **predictor_args, 217 | ): 218 | if span_detector is None: 219 | span_detector = self.get_mask_fast 220 | if predictor is None: 221 | predictor = self.predictor 222 | new_text = text 223 | look_from = 0 224 | 225 | for i in range(max_steps): 
226 | tok_ids, mask_ids = span_detector(new_text, label=label) 227 | if not sum(mask_ids[0][(1 + look_from):]): 228 | break 229 | toks, mask_pos, mask_toks = self.convert_mask( 230 | tok_ids, mask_ids, duplicate=False, start_from=look_from 231 | ) 232 | if mask_pos is None: 233 | return new_text 234 | texts, scores = predictor.generate( 235 | toks, 236 | mask_pos, 237 | n_tokens=list(n_tokens), 238 | n_top=n_top, 239 | fix_multiunit=False, 240 | mask_token=mask_token, 241 | label=label, 242 | **predictor_args 243 | ) 244 | old_replacement = chooser(hypotheses=texts[0], scores=scores[0], original=mask_toks) 245 | if isinstance(old_replacement, str): 246 | old_replacement = [old_replacement] 247 | replacement = [t for w in old_replacement for t in w.split('_')] 248 | if verbose: 249 | print(mask_toks, '->', replacement) 250 | new_toks = toks[0][:mask_pos[0]] + replacement + toks[0][mask_pos[0] + 1:] 251 | new_text = self.tokenizer.convert_tokens_to_string(new_toks) 252 | look_from = mask_pos[0] + len(old_replacement) 253 | # we could add len(replacement), but sometimes its tokens glue together like 'mental' + '##ly' 254 | # on the other hand, we want to skip light_##hearted fully 255 | return new_text 256 | 257 | def get_mlm_contrast(self, text, label=0, logs=True): 258 | """ Calculate difference in log likelihood (or likelihood) of a text under different token_type_ids """ 259 | inputs = self.tokenizer(text, return_tensors='pt') 260 | inputs['token_type_ids'] = inputs['token_type_ids'] * 0 + label 261 | inputs = {k: v.to(self.device) for k, v in inputs.items()} 262 | 263 | with torch.no_grad(): 264 | scores = self.model(**inputs, ) 265 | pos_ll = get_ll(inputs, scores) 266 | 267 | inputs['token_type_ids'] = 1 - inputs['token_type_ids'] 268 | with torch.no_grad(): 269 | scores2 = self.model(**inputs, ) 270 | neg_ll = get_ll(inputs, scores2) 271 | 272 | if logs: 273 | diff = pos_ll - neg_ll 274 | else: 275 | diff = torch.exp(pos_ll) - torch.exp(neg_ll) 276 | 277 | 
return diff 278 | 279 | 280 | def get_ll(inputs, results): 281 | """ Calculate log likelihood of a sequence """ 282 | probs = torch.log_softmax(results.logits, dim=-1) 283 | seq = inputs['input_ids'] 284 | proba = torch.gather(probs, 2, seq[:, :, None]).squeeze(-1) 285 | return proba * inputs['attention_mask'] -------------------------------------------------------------------------------- /emnlp2021/style_transfer/condBERT/condbert_compile_vocab.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "This notebook reproduces creation of CondBERT vocabulary.\n", 8 | "\n", 9 | "The files `positive-words.txt`, `negative-words.txt` and `toxic_words.txt` are not reproduced exactly because of our internal issues. \n", 10 | "\n", 11 | "However, all other files (`token_toxicities.txt` and `word2coef.pkl` ) are reproduced accurately. " 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "# 0. Prerequisites" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 1, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "VOCAB_DIRNAME = 'vocabularies' " 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 3, 33 | "metadata": {}, 34 | "outputs": [], 35 | "source": [ 36 | "from condbert import CondBertRewriter\n", 37 | "from choosers import EmbeddingSimilarityChooser\n", 38 | "from multiword.masked_token_predictor_bert import MaskedTokenPredictorBert" 39 | ] 40 | }, 41 | { 42 | "cell_type": "markdown", 43 | "metadata": {}, 44 | "source": [ 45 | "# 1. 
Loading BERT" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 6, 51 | "metadata": {}, 52 | "outputs": [], 53 | "source": [ 54 | "import torch\n", 55 | "from transformers import BertTokenizer, BertForMaskedLM\n", 56 | "import numpy as np\n", 57 | "import pickle\n", 58 | "import os\n", 59 | "from tqdm.auto import tqdm, trange" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 7, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "os.environ['CUDA_VISIBLE_DEVICES'] = '0'\n", 69 | "device = torch.device('cuda:0')\n", 70 | "device = torch.device('cpu')" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 9, 76 | "metadata": {}, 77 | "outputs": [], 78 | "source": [ 79 | "model_name = 'bert-base-uncased'\n", 80 | "tokenizer = BertTokenizer.from_pretrained(model_name)" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 10, 86 | "metadata": {}, 87 | "outputs": [ 88 | { 89 | "name": "stderr", 90 | "output_type": "stream", 91 | "text": [ 92 | "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']\n", 93 | "- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. 
initializing a BertForSequenceClassification model from a BertForPreTraining model).\n", 94 | "- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n" 95 | ] 96 | } 97 | ], 98 | "source": [ 99 | "model = BertForMaskedLM.from_pretrained(model_name)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 11, 105 | "metadata": {}, 106 | "outputs": [], 107 | "source": [ 108 | "model.to(device);" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "# 2. Preparing the vocabularies." 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "\n", 123 | "- negative-words.txt\n", 124 | "- positive-words.txt\n", 125 | "- word2coef.pkl\n", 126 | "- token_toxicities.txt\n", 127 | "\n", 128 | "These files should be prepared once. 
" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 12, 134 | "metadata": {}, 135 | "outputs": [], 136 | "source": [ 137 | "tox_corpus_path = '../../data/train/train_toxic'\n", 138 | "norm_corpus_path = '../../data/train/train_normal'" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 13, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [ 147 | "if not os.path.exists(VOCAB_DIRNAME):\n", 148 | " os.makedirs(VOCAB_DIRNAME)" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "### 2.1 Preparing the DRG-like vocabularies" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 14, 161 | "metadata": {}, 162 | "outputs": [], 163 | "source": [ 164 | "import os\n", 165 | "import argparse\n", 166 | "import numpy as np\n", 167 | "from tqdm import tqdm\n", 168 | "from nltk import ngrams\n", 169 | "from sklearn.linear_model import LogisticRegression\n", 170 | "from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer\n", 171 | "\n", 172 | "\n", 173 | "\n", 174 | "class NgramSalienceCalculator():\n", 175 | " def __init__(self, tox_corpus, norm_corpus, use_ngrams=False):\n", 176 | " ngrams = (1, 3) if use_ngrams else (1, 1)\n", 177 | " self.vectorizer = CountVectorizer(ngram_range=ngrams)\n", 178 | "\n", 179 | " tox_count_matrix = self.vectorizer.fit_transform(tox_corpus)\n", 180 | " self.tox_vocab = self.vectorizer.vocabulary_\n", 181 | " self.tox_counts = np.sum(tox_count_matrix, axis=0)\n", 182 | "\n", 183 | " norm_count_matrix = self.vectorizer.fit_transform(norm_corpus)\n", 184 | " self.norm_vocab = self.vectorizer.vocabulary_\n", 185 | " self.norm_counts = np.sum(norm_count_matrix, axis=0)\n", 186 | "\n", 187 | " def salience(self, feature, attribute='tox', lmbda=0.5):\n", 188 | " assert attribute in ['tox', 'norm']\n", 189 | " if feature not in self.tox_vocab:\n", 190 | " tox_count = 0.0\n", 191 | " else:\n", 192 | " 
tox_count = self.tox_counts[0, self.tox_vocab[feature]]\n", 193 | "\n", 194 | " if feature not in self.norm_vocab:\n", 195 | " norm_count = 0.0\n", 196 | " else:\n", 197 | " norm_count = self.norm_counts[0, self.norm_vocab[feature]]\n", 198 | "\n", 199 | " if attribute == 'tox':\n", 200 | " return (tox_count + lmbda) / (norm_count + lmbda)\n", 201 | " else:\n", 202 | " return (norm_count + lmbda) / (tox_count + lmbda)\n" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 15, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "name": "stdout", 212 | "output_type": "stream", 213 | "text": [ 214 | "88645\n" 215 | ] 216 | } 217 | ], 218 | "source": [ 219 | "from collections import Counter\n", 220 | "c = Counter()\n", 221 | "\n", 222 | "for fn in [tox_corpus_path, norm_corpus_path]:\n", 223 | " with open(fn, 'r') as corpus:\n", 224 | " for line in corpus.readlines():\n", 225 | " for tok in line.strip().split():\n", 226 | " c[tok] += 1\n", 227 | "\n", 228 | "print(len(c))" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": 16, 234 | "metadata": {}, 235 | "outputs": [ 236 | { 237 | "name": "stdout", 238 | "output_type": "stream", 239 | "text": [ 240 | "88645\n" 241 | ] 242 | } 243 | ], 244 | "source": [ 245 | "vocab = {w for w, _ in c.most_common() if _ > 0} # keeping only words with > 1 occurrences would make the vocabulary about 2x smaller, but the full size is manageable\n", 246 | "print(len(vocab))" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 17, 252 | "metadata": {}, 253 | "outputs": [], 254 | "source": [ 255 | "with open(tox_corpus_path, 'r') as tox_corpus, open(norm_corpus_path, 'r') as norm_corpus:\n", 256 | " corpus_tox = [' '.join([w if w in vocab else '' for w in line.strip().split()]) for line in tox_corpus.readlines()]\n", 257 | " corpus_norm = [' '.join([w if w in vocab else '' for w in line.strip().split()]) for line in norm_corpus.readlines()]" 258 | ] 259 | }, 260 | { 261 | 
"cell_type": "code", 262 | "execution_count": 18, 263 | "metadata": {}, 264 | "outputs": [], 265 | "source": [ 266 | "neg_out_name = VOCAB_DIRNAME + '/negative-words.txt'\n", 267 | "pos_out_name = VOCAB_DIRNAME + '/positive-words.txt'" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": 20, 273 | "metadata": {}, 274 | "outputs": [], 275 | "source": [ 276 | "threshold = 4" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 21, 282 | "metadata": {}, 283 | "outputs": [], 284 | "source": [ 285 | "sc = NgramSalienceCalculator(corpus_tox, corpus_norm, False)\n", 286 | "seen_grams = set()\n", 287 | "\n", 288 | "with open(neg_out_name, 'w') as neg_out, open(pos_out_name, 'w') as pos_out:\n", 289 | " for gram in set(sc.tox_vocab.keys()).union(set(sc.norm_vocab.keys())):\n", 290 | " if gram not in seen_grams:\n", 291 | " seen_grams.add(gram)\n", 292 | " toxic_salience = sc.salience(gram, attribute='tox')\n", 293 | " polite_salience = sc.salience(gram, attribute='norm')\n", 294 | " if toxic_salience > threshold:\n", 295 | " neg_out.writelines(f'{gram}\\n')\n", 296 | " elif polite_salience > threshold:\n", 297 | " pos_out.writelines(f'{gram}\\n')" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "## 2.2 Evaluating word toxicities with a logistic regression" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": 22, 310 | "metadata": {}, 311 | "outputs": [], 312 | "source": [ 313 | "from sklearn.pipeline import make_pipeline\n", 314 | "pipe = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": 23, 320 | "metadata": {}, 321 | "outputs": [], 322 | "source": [ 323 | "X_train = corpus_tox + corpus_norm\n", 324 | "y_train = [1] * len(corpus_tox) + [0] * len(corpus_norm)\n", 325 | "pipe.fit(X_train, y_train);" 326 | ] 327 | }, 328 | { 329 | "cell_type": "code", 330 | 
"execution_count": 24, 331 | "metadata": {}, 332 | "outputs": [ 333 | { 334 | "data": { 335 | "text/plain": [ 336 | "(88519,)" 337 | ] 338 | }, 339 | "execution_count": 24, 340 | "metadata": {}, 341 | "output_type": "execute_result" 342 | } 343 | ], 344 | "source": [ 345 | "coefs = pipe[1].coef_[0]\n", 346 | "coefs.shape" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": 25, 352 | "metadata": {}, 353 | "outputs": [], 354 | "source": [ 355 | "word2coef = {w: coefs[idx] for w, idx in pipe[0].vocabulary_.items()}" 356 | ] 357 | }, 358 | { 359 | "cell_type": "code", 360 | "execution_count": 26, 361 | "metadata": {}, 362 | "outputs": [], 363 | "source": [ 364 | "import pickle\n", 365 | "with open(VOCAB_DIRNAME + '/word2coef.pkl', 'wb') as f:\n", 366 | " pickle.dump(word2coef, f)" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": {}, 372 | "source": [ 373 | "## 2.3 Labelling BERT tokens by toxicity" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": 27, 379 | "metadata": {}, 380 | "outputs": [ 381 | { 382 | "name": "stderr", 383 | "output_type": "stream", 384 | "text": [ 385 | "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 135390/135390 [00:43<00:00, 3088.85it/s]\n", 386 | "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 135390/135390 [00:45<00:00, 2977.70it/s]\n" 387 | ] 388 | } 389 | ], 390 | "source": [ 391 | "from collections import defaultdict\n", 392 | "toxic_counter = defaultdict(lambda: 1)\n", 393 | "nontoxic_counter = defaultdict(lambda: 1)\n", 394 | "\n", 395 | "for text in tqdm(corpus_tox):\n", 396 | " for token in tokenizer.encode(text):\n", 397 | " toxic_counter[token] += 1\n", 398 | "for text in tqdm(corpus_norm):\n", 399 | " for token in 
tokenizer.encode(text):\n", 400 | " nontoxic_counter[token] += 1" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": 28, 406 | "metadata": {}, 407 | "outputs": [], 408 | "source": [ 409 | "token_toxicities = [toxic_counter[i] / (nontoxic_counter[i] + toxic_counter[i]) for i in range(len(tokenizer.vocab))]" 410 | ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": 29, 415 | "metadata": {}, 416 | "outputs": [], 417 | "source": [ 418 | "with open(VOCAB_DIRNAME + '/token_toxicities.txt', 'w') as f:\n", 419 | " for t in token_toxicities:\n", 420 | " f.write(str(t))\n", 421 | " f.write('\\n')" 422 | ] 423 | }, 424 | { 425 | "cell_type": "markdown", 426 | "metadata": {}, 427 | "source": [ 428 | "# 3. Setting up the model" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "### 3.1 Loading the vocabularies" 436 | ] 437 | }, 438 | { 439 | "cell_type": "code", 440 | "execution_count": 30, 441 | "metadata": {}, 442 | "outputs": [], 443 | "source": [ 444 | "with open(VOCAB_DIRNAME + \"/negative-words.txt\", \"r\") as f:\n", 445 | " s = f.readlines()\n", 446 | "negative_words = list(map(lambda x: x[:-1], s))\n", 447 | "\n", 448 | "with open(VOCAB_DIRNAME + \"/positive-words.txt\", \"r\") as f:\n", 449 | " s = f.readlines()\n", 450 | "positive_words = list(map(lambda x: x[:-1], s))" 451 | ] 452 | }, 453 | { 454 | "cell_type": "code", 455 | "execution_count": 31, 456 | "metadata": {}, 457 | "outputs": [], 458 | "source": [ 459 | "import pickle\n", 460 | "with open(VOCAB_DIRNAME + '/word2coef.pkl', 'rb') as f:\n", 461 | " word2coef = pickle.load(f)" 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": 32, 467 | "metadata": {}, 468 | "outputs": [], 469 | "source": [ 470 | "token_toxicities = []\n", 471 | "with open(VOCAB_DIRNAME + '/token_toxicities.txt', 'r') as f:\n", 472 | " for line in f.readlines():\n", 473 | " token_toxicities.append(float(line))\n", 474 | 
"token_toxicities = np.array(token_toxicities)\n", 475 | "token_toxicities = np.maximum(0, np.log(1/(1/token_toxicities-1))) # log odds ratio\n", 476 | "\n", 477 | "# discourage meaningless tokens\n", 478 | "for tok in ['.', ',', '-']:\n", 479 | " token_toxicities[tokenizer.encode(tok)[1]] = 3 # encode() returns [CLS], tok, [SEP]; index 1 is the token id\n", 480 | "\n", 481 | "for tok in ['you']:\n", 482 | " token_toxicities[tokenizer.encode(tok)[1]] = 0" 483 | ] 484 | }, 485 | { 486 | "cell_type": "code", 487 | "execution_count": 38, 488 | "metadata": {}, 489 | "outputs": [], 490 | "source": [ 491 | "def adjust_logits(logits, label=0):\n", 492 | " return logits - token_toxicities * 100 * (1 - 2 * label)\n", 493 | "\n", 494 | "predictor = MaskedTokenPredictorBert(model, tokenizer, max_len=250, device=device, label=0, contrast_penalty=0.0, logits_postprocessor=adjust_logits)\n", 495 | "\n", 496 | "editor = CondBertRewriter(\n", 497 | " model=model,\n", 498 | " tokenizer=tokenizer,\n", 499 | " device=device,\n", 500 | " neg_words=negative_words,\n", 501 | " pos_words=positive_words,\n", 502 | " word2coef=word2coef,\n", 503 | " token_toxicities=token_toxicities,\n", 504 | " predictor=predictor,\n", 505 | ")" 506 | ] 507 | }, 508 | { 509 | "cell_type": "markdown", 510 | "metadata": {}, 511 | "source": [ 512 | "The chooser below reranks the BERT hypotheses and helps to increase semantic similarity by preferring hypotheses whose embeddings are similar to those of the original words. " 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": 34, 518 | "metadata": {}, 519 | "outputs": [], 520 | "source": [ 521 | "chooser = EmbeddingSimilarityChooser(sim_coef=10, tokenizer=tokenizer)" 522 | ] 523 | }, 524 | { 525 | "cell_type": "markdown", 526 | "metadata": {}, 527 | "source": [ 528 | "# 4. Finally, the inference" 529 | ] 530 | }, 531 | { 532 | "cell_type": "markdown", 533 | "metadata": {}, 534 | "source": [ 535 | "Parallel application of the model to all tokens: fast, but dirty. 
" 536 | ] 537 | }, 538 | { 539 | "cell_type": "code", 540 | "execution_count": 35, 541 | "metadata": {}, 542 | "outputs": [ 543 | { 544 | "name": "stdout", 545 | "output_type": "stream", 546 | "text": [ 547 | "you are mistake !\n" 548 | ] 549 | } 550 | ], 551 | "source": [ 552 | "print(editor.translate('You are idiot!', prnt=False))" 553 | ] 554 | }, 555 | { 556 | "cell_type": "markdown", 557 | "metadata": {}, 558 | "source": [ 559 | "Application of the model to all the tokens sequentially, in the multiword mode. " 560 | ] 561 | }, 562 | { 563 | "cell_type": "code", 564 | "execution_count": 39, 565 | "metadata": {}, 566 | "outputs": [ 567 | { 568 | "name": "stdout", 569 | "output_type": "stream", 570 | "text": [ 571 | "you are very beautiful !\n" 572 | ] 573 | } 574 | ], 575 | "source": [ 576 | "print(editor.replacement_loop('You are stupid!', verbose=False, chooser=chooser, n_tokens=(1, 2, 3), n_top=10))" 577 | ] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "metadata": {}, 582 | "source": [ 583 | "Parameters that could be tuned:\n", 584 | "* The coefficient in `adjust_logits` - the larger it is, the more the model avoids toxic words\n", 585 | "* The coefficient in `EmbeddingSimilarityChooser` - the larger it is, the more the model tries to preserve content\n", 586 | "* `n_tokens` - how many words can be generated in place of one\n", 587 | "* `n_top` - how many BERT hypotheses are reranked" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": null, 593 | "metadata": {}, 594 | "outputs": [], 595 | "source": [] 596 | } 597 | ], 598 | "metadata": { 599 | "kernelspec": { 600 | "display_name": "Python 3 (ipykernel)", 601 | "language": "python", 602 | "name": "python3" 603 | }, 604 | "language_info": { 605 | "codemirror_mode": { 606 | "name": "ipython", 607 | "version": 3 608 | }, 609 | "file_extension": ".py", 610 | "mimetype": "text/x-python", 611 | "name": "python", 612 | "nbconvert_exporter": "python", 613 | "pygments_lexer": "ipython3", 614 | 
"version": "3.9.6" 615 | } 616 | }, 617 | "nbformat": 4, 618 | "nbformat_minor": 4 619 | } -------------------------------------------------------------------------------- /emnlp2021/style_transfer/condBERT/condbert_inference.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "This notebook" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "import os\n", 17 | "import sys\n", 18 | "\n", 19 | "\n", 20 | "def add_sys_path(p):\n", 21 | " p = os.path.abspath(p)\n", 22 | " print(p)\n", 23 | " if p not in sys.path:\n", 24 | " sys.path.append(p)" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 2, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "os.environ['CUDA_VISIBLE_DEVICES'] = '0'" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 3, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "from importlib import reload" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 4, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "import condbert\n", 52 | "reload(condbert)\n", 53 | "from condbert import CondBertRewriter" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": 5, 59 | "metadata": {}, 60 | "outputs": [], 61 | "source": [ 62 | "import torch\n", 63 | "from transformers import BertTokenizer, BertForMaskedLM\n", 64 | "import numpy as np\n", 65 | "import pickle\n", 66 | "from tqdm.auto import tqdm, trange" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 6, 72 | "metadata": {}, 73 | "outputs": [], 74 | "source": [ 75 | "device = torch.device('cuda:0')" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": {}, 81 | "source": [ 82 | "### Load the model" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 
87 | "execution_count": 7, 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "model_name = 'bert-base-uncased'\n", 92 | "tokenizer = BertTokenizer.from_pretrained(model_name)" 93 | ] 94 | }, 95 | { 96 | "cell_type": "code", 97 | "execution_count": 8, 98 | "metadata": {}, 99 | "outputs": [ 100 | { 101 | "name": "stderr", 102 | "output_type": "stream", 103 | "text": [ 104 | "Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']\n", 105 | "- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n", 106 | "- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n" 107 | ] 108 | } 109 | ], 110 | "source": [ 111 | "model = BertForMaskedLM.from_pretrained(model_name)" 112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 9, 117 | "metadata": {}, 118 | "outputs": [], 119 | "source": [ 120 | "model.to(device);" 121 | ] 122 | }, 123 | { 124 | "cell_type": "markdown", 125 | "metadata": {}, 126 | "source": [ 127 | "#### Load vocabularies for spans detection" 128 | ] 129 | }, 130 | { 131 | "cell_type": "code", 132 | "execution_count": 10, 133 | "metadata": {}, 134 | "outputs": [], 135 | "source": [ 136 | "vocab_root = 'vocab/'" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 11, 142 | "metadata": {}, 143 | "outputs": [], 144 | "source": [ 145 | "with open(vocab_root + \"negative-words.txt\", \"r\") as f:\n", 146 | " s = f.readlines()\n", 147 | "negative_words = list(map(lambda x: x[:-1], s))\n", 148 | "with open(vocab_root + \"toxic_words.txt\", \"r\") as 
f:\n", 149 | " ss = f.readlines()\n", 150 | "negative_words += list(map(lambda x: x[:-1], ss))\n", 151 | "\n", 152 | "with open(vocab_root + \"positive-words.txt\", \"r\") as f:\n", 153 | " s = f.readlines()\n", 154 | "positive_words = list(map(lambda x: x[:-1], s))" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": 12, 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [ 163 | "import pickle\n", 164 | "with open(vocab_root + 'word2coef.pkl', 'rb') as f:\n", 165 | " word2coef = pickle.load(f)" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 13, 171 | "metadata": {}, 172 | "outputs": [], 173 | "source": [ 174 | "token_toxicities = []\n", 175 | "with open(vocab_root + 'token_toxicities.txt', 'r') as f:\n", 176 | " for line in f.readlines():\n", 177 | " token_toxicities.append(float(line))\n", 178 | "token_toxicities = np.array(token_toxicities)\n", 179 | "token_toxicities = np.maximum(0, np.log(1/(1/token_toxicities-1))) # log odds ratio\n", 180 | "\n", 181 | "# discourage meaningless tokens\n", 182 | "for tok in ['.', ',', '-']:\n", 183 | " token_toxicities[tokenizer.encode(tok)][1] = 3\n", 184 | "\n", 185 | "for tok in ['you']:\n", 186 | " token_toxicities[tokenizer.encode(tok)][1] = 0" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "### Applying the model" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 14, 199 | "metadata": {}, 200 | "outputs": [], 201 | "source": [ 202 | "reload(condbert)\n", 203 | "from condbert import CondBertRewriter\n", 204 | "\n", 205 | "editor = CondBertRewriter(\n", 206 | " model=model,\n", 207 | " tokenizer=tokenizer,\n", 208 | " device=device,\n", 209 | " neg_words=negative_words,\n", 210 | " pos_words=positive_words,\n", 211 | " word2coef=word2coef,\n", 212 | " token_toxicities=token_toxicities,\n", 213 | ")" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 15, 
219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "name": "stdout", 223 | "output_type": "stream", 224 | "text": [ 225 | "you are an the !\n" 226 | ] 227 | } 228 | ], 229 | "source": [ 230 | "print(editor.translate('You are an idiot!', prnt=False))" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": {}, 236 | "source": [ 237 | "### Multiunit" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": 16, 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "editor = CondBertRewriter(\n", 247 | " model=model,\n", 248 | " tokenizer=tokenizer,\n", 249 | " device=device,\n", 250 | " neg_words=negative_words,\n", 251 | " pos_words=positive_words,\n", 252 | " word2coef=word2coef,\n", 253 | " token_toxicities=token_toxicities,\n", 254 | " predictor=None,\n", 255 | ")" 256 | ] 257 | }, 258 | { 259 | "cell_type": "code", 260 | "execution_count": 17, 261 | "metadata": {}, 262 | "outputs": [], 263 | "source": [ 264 | "from multiword import masked_token_predictor_bert\n", 265 | "reload(masked_token_predictor_bert)\n", 266 | "from multiword.masked_token_predictor_bert import MaskedTokenPredictorBert" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 18, 272 | "metadata": {}, 273 | "outputs": [ 274 | { 275 | "name": "stdout", 276 | "output_type": "stream", 277 | "text": [ 278 | "you are an old man !\n" 279 | ] 280 | } 281 | ], 282 | "source": [ 283 | "predictor = MaskedTokenPredictorBert(model, tokenizer, max_len=250, device=device, label=0, contrast_penalty=0.0)\n", 284 | "editor.predictor = predictor\n", 285 | "\n", 286 | "def adjust_logits(logits, label):\n", 287 | " return logits - editor.token_toxicities * 3\n", 288 | "\n", 289 | "predictor.logits_postprocessor = adjust_logits\n", 290 | "\n", 291 | "print(editor.replacement_loop('You are an idiot!', verbose=False))" 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": 19, 297 | "metadata": {}, 298 | "outputs": [ 
299 | { 300 | "name": "stdout", 301 | "output_type": "stream", 302 | "text": [ 303 | "you are an old man !\n", 304 | "Wall time: 499 ms\n" 305 | ] 306 | } 307 | ], 308 | "source": [ 309 | "%%time\n", 310 | "print(editor.replacement_loop('You are an idiot!', verbose=False, n_units=1))" 311 | ] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": 20, 316 | "metadata": {}, 317 | "outputs": [ 318 | { 319 | "name": "stdout", 320 | "output_type": "stream", 321 | "text": [ 322 | "you are an old man !\n", 323 | "Wall time: 1.45 s\n" 324 | ] 325 | } 326 | ], 327 | "source": [ 328 | "%%time\n", 329 | "print(editor.replacement_loop('You are an idiot!', verbose=False, n_units=3))" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": 21, 335 | "metadata": {}, 336 | "outputs": [ 337 | { 338 | "name": "stdout", 339 | "output_type": "stream", 340 | "text": [ 341 | "you are an old man !\n", 342 | "Wall time: 1.8 s\n" 343 | ] 344 | } 345 | ], 346 | "source": [ 347 | "%%time\n", 348 | "print(editor.replacement_loop('You are an idiot!', verbose=False, n_units=10))" 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": 22, 354 | "metadata": {}, 355 | "outputs": [], 356 | "source": [ 357 | "import choosers\n", 358 | "reload(choosers)\n", 359 | "from choosers import EmbeddingSimilarityChooser" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": {}, 365 | "source": [ 366 | "# Reproduction" 367 | ] 368 | }, 369 | { 370 | "cell_type": "code", 371 | "execution_count": 26, 372 | "metadata": {}, 373 | "outputs": [], 374 | "source": [ 375 | "predictor = MaskedTokenPredictorBert(\n", 376 | " model, tokenizer, max_len=250, device=device, label=0, contrast_penalty=0.0, \n", 377 | " confuse_bert_args=True, # this argument deteriorates quality but is used for backward compatibility\n", 378 | ")\n", 379 | "editor.predictor = predictor\n", 380 | "\n", 381 | "def adjust_logits(logits, label=0):\n", 382 | " return 
logits - editor.token_toxicities * 10\n", 383 | "\n", 384 | "predictor.logits_postprocessor = adjust_logits\n", 385 | "\n", 386 | "cho = EmbeddingSimilarityChooser(sim_coef=100, tokenizer=tokenizer)" 387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": 27, 392 | "metadata": {}, 393 | "outputs": [ 394 | { 395 | "data": { 396 | "application/vnd.jupyter.widget-view+json": { 397 | "model_id": "8c53fb1e3c8648d58c811221c8ca7146", 398 | "version_major": 2, 399 | "version_minor": 0 400 | }, 401 | "text/plain": [ 402 | " 0%| | 0/10 [00:00diet and ##ing 37 | keep the first subword and remove others. 38 | """ 39 | 40 | logger.debug(f'bpe tokens: {bpe_tokens}') 41 | logger.debug(f'bpe offsets: {bpe_offsets}') 42 | 43 | if len(masked_position[1]) > 1: 44 | indexes_to_del = masked_position[1][1:] 45 | del bpe_tokens[masked_position[0]][indexes_to_del[0] : indexes_to_del[-1] + 1] 46 | del bpe_offsets[masked_position[0]][indexes_to_del[0] : indexes_to_del[-1] + 1] 47 | 48 | masked_position = (masked_position[0], masked_position[1][0]) # TODO: leave masked_token as a list 49 | 50 | logger.debug(f'bpe tokens: {str(bpe_tokens)}') 51 | logger.debug(f'bpe offsets: {str(bpe_offsets)}') 52 | 53 | return masked_position, bpe_tokens, bpe_offsets 54 | 55 | 56 | def merge_sorted_results(objects_left, scores_left, 57 | objects_right, scores_right, max_elems): 58 | result_objects = [] 59 | result_scores = [] 60 | 61 | j = 0 62 | i = 0 63 | while True: 64 | if (len(result_scores) == max_elems): 65 | break 66 | 67 | if i == len(scores_left): 68 | result_objects += objects_right[j : j + max_elems - len(result_scores)] 69 | result_scores += scores_right[j : j + max_elems - len(result_scores)] 70 | break 71 | 72 | if j == len(scores_right): 73 | result_objects += objects_left[i : i + max_elems - len(result_scores)] 74 | result_scores += scores_left[i : i + max_elems - len(result_scores)] 75 | break 76 | 77 | if scores_left[i] > scores_right[j]: 78 | 
result_objects.append(objects_left[i]) 79 | result_scores.append(scores_left[i]) 80 | i += 1 81 | else: 82 | result_objects.append(objects_right[j]) 83 | result_scores.append(scores_right[j]) 84 | j += 1 85 | 86 | return result_objects, result_scores 87 | 88 | 89 | class MaskedTokenPredictorBert: 90 | def __init__( 91 | self, model, bpe_tokenizer, max_len=250, mask_in_multiunit=False, device=None, label=0, 92 | logits_postprocessor=None, contrast_penalty=0, 93 | mean=np.mean, 94 | confuse_bert_args=False, 95 | ): 96 | self._model = model 97 | self._bpe_tokenizer = bpe_tokenizer 98 | self._max_len = max_len 99 | self._mask_in_multiunit = mask_in_multiunit 100 | self.device = device or torch.device('cuda') 101 | self.label = label 102 | self.logits_postprocessor = logits_postprocessor 103 | self.contrast_penalty = contrast_penalty 104 | self.mean = mean # could be scipy.stats.hmean, but for negative tokens 105 | self.confuse_bert_args = confuse_bert_args # reproduce an old error for backward compatibility 106 | 107 | def __call__(self, sentences, masked_position, **kwargs): 108 | if not isinstance(masked_position, list): 109 | sentences = [sentences] # wrap a single sentence into a batch of one 110 | masked_position = [masked_position] 111 | 112 | b_masked_pos = [] 113 | b_bpe_tokens = [] 114 | for sent, mask_pos in zip(sentences, masked_position): 115 | bpe_tokens, bpe_offsets = bpe_tokenize(self._bpe_tokenizer, sent) 116 | 117 | masked_position = find_bpe_position_by_offset([bpe_offsets], 118 | (sent[mask_pos].begin, 119 | sent[mask_pos].end)) 120 | 121 | masked_position, bpe_tokens, _ = remove_masked_token_subwords(masked_position, 122 | [bpe_tokens], 123 | [bpe_offsets]) 124 | bpe_tokens = bpe_tokens[0] 125 | 126 | logger.debug(f'Bpe tokens: {bpe_tokens}') 127 | 128 | b_bpe_tokens.append(bpe_tokens) 129 | b_masked_pos.append(masked_position[1]) 130 | 131 | return self.generate(b_bpe_tokens, b_masked_pos, **kwargs) 132 | 133 | def generate(self, b_bpe_tokens, b_masked_pos, 134 | mask_token=True, 135 | n_top=5, 
136 | n_units=1, 137 | n_tokens=[1], 138 | fix_multiunit=True, 139 | beam_size=10, 140 | multiunit_lookup=100, 141 | max_multiunit=10, 142 | label=None, 143 | **kwargs 144 | ): 145 | result_preds = [[] for _ in range(len(b_bpe_tokens))] 146 | result_scores = [[] for _ in range(len(b_bpe_tokens))] 147 | if 1 in n_tokens: 148 | result_preds, result_scores = self.predict_single_word(b_bpe_tokens, b_masked_pos, 149 | mask_token=mask_token, 150 | n_top=n_top, 151 | n_units=n_units, 152 | multiunit_lookup=multiunit_lookup, 153 | fix_multiunit=fix_multiunit, 154 | max_multiunit=max_multiunit, label=label) 155 | 156 | for n_t in n_tokens: 157 | if n_t == 1: 158 | continue 159 | 160 | pred_tokens, pred_scores = self.predict_token_sequence(b_bpe_tokens, b_masked_pos, 161 | mask_token=mask_token, 162 | n_top=n_top, 163 | n_units=n_units, 164 | seq_len=n_t, 165 | multiunit_lookup=multiunit_lookup, 166 | fix_multiunit=fix_multiunit, 167 | beam_size=beam_size, 168 | max_multiunit=max_multiunit, label=label) 169 | 170 | for i in range(len(b_bpe_tokens)): 171 | result_preds[i], result_scores[i] = merge_sorted_results(result_preds[i], result_scores[i], 172 | pred_tokens[i], pred_scores[i], 173 | n_top) 174 | 175 | return result_preds, result_scores 176 | 177 | def predict_single_unit(self, bpe_tokens, masked_position, 178 | mask_token, n_top, label=None): 179 | if label is None: 180 | label = self.label 181 | bpe_tokens = copy.deepcopy(bpe_tokens) 182 | 183 | max_len = min([max(len(e) for e in bpe_tokens) + 2, self._max_len]) 184 | token_ids = [] 185 | for i in range(len(bpe_tokens)): 186 | bpe_tokens[i] = bpe_tokens[i][:max_len - 2] 187 | 188 | if mask_token: 189 | if i >= len(masked_position): 190 | continue 191 | pos = masked_position[i] 192 | if pos >= len(bpe_tokens[i]): 193 | continue 194 | bpe_tokens[i][pos] = '[MASK]' 195 | 196 | bpe_tokens[i] = ['[CLS]'] + bpe_tokens[i] + ['[SEP]'] 197 | logger.debug(f'Masked BPE tokens: {bpe_tokens[i]}') 198 | 199 | 
token_ids.append(self._bpe_tokenizer.convert_tokens_to_ids(bpe_tokens[i])) 200 | 201 | token_ids = pad_sequences(token_ids, maxlen=max_len, dtype='long', 202 | truncating='post', padding='post') 203 | attention_masks_tensor = torch.tensor(token_ids > 0).long().to(self.device) 204 | tokens_tensor = torch.tensor(token_ids).to(self.device) 205 | 206 | segments_ids = np.ones_like(token_ids, dtype=int) * label 207 | segments_tensor = torch.tensor(segments_ids).to(self.device) 208 | 209 | self._model.eval() 210 | with torch.no_grad(): 211 | if self.confuse_bert_args: 212 | target_sent = self._model( 213 | tokens_tensor, attention_mask=segments_tensor, token_type_ids=attention_masks_tensor 214 | )[0] 215 | else: 216 | target_sent = self._model( 217 | tokens_tensor, token_type_ids=segments_tensor, attention_mask=attention_masks_tensor 218 | )[0] 219 | 220 | # target_sent = torch.log_softmax(target_sent, -1) 221 | 222 | if self.contrast_penalty: 223 | # todo: make it work with multiple words (it breaks hypotheses somehow) 224 | with torch.no_grad(): 225 | another = self._model(tokens_tensor, token_type_ids=1 - segments_tensor, attention_mask=attention_masks_tensor)[0] 226 | diff = torch.softmax(target_sent, -1) - self.contrast_penalty * torch.softmax(another, -1) 227 | target_sent = torch.log(torch.clamp(diff, 1e-20)) 228 | 229 | target_sent = target_sent.detach().cpu().numpy() 230 | 231 | final_top_scores = [] 232 | final_top_tokens = [] 233 | for i in range(target_sent.shape[0]): 234 | row = target_sent[i] 235 | idx = masked_position[i] 236 | if idx + 1 >= len(row): 237 | # todo: investigate why this shit happens when n_units>1 238 | continue 239 | logits = row[idx + 1] 240 | logits = self.adjust_logits(logits, label=label) 241 | top_ids = nlargest_indexes(logits, n_top) 242 | top_scores = [target_sent[i][masked_position[i] + 1][j] for j in top_ids] 243 | top_tokens = self._bpe_tokenizer.convert_ids_to_tokens(top_ids) 244 | 245 | final_top_scores.append(top_scores) 246 | 
final_top_tokens.append(top_tokens) 247 | 248 | return final_top_tokens, final_top_scores 249 | 250 | def adjust_logits(self, logits, label=0): 251 | if self.logits_postprocessor: 252 | return self.logits_postprocessor(logits, label=label or 0) 253 | return logits 254 | 255 | def predict_single_word(self, bpe_tokens, masked_position, 256 | mask_token, 257 | n_top, 258 | n_units, 259 | fix_multiunit, 260 | multiunit_lookup, 261 | max_multiunit, 262 | label=None 263 | ): 264 | pred_tokens, scores = self.predict_single_unit(bpe_tokens, 265 | masked_position, 266 | mask_token=mask_token, 267 | n_top=n_top, label=label) 268 | 269 | final_pred_tokens = [] 270 | final_scores = [] 271 | for j in range(len(pred_tokens)): 272 | if n_units > 1: 273 | pred_tokens[j] = list(reversed(pred_tokens[j][:multiunit_lookup])) 274 | scores[j] = list(reversed(scores[j][:multiunit_lookup])) 275 | 276 | seq_list = self.generate_multiunit_token(masked_position[j], bpe_tokens[j], 277 | n_top=multiunit_lookup, 278 | n_units=n_units, label=label) 279 | 280 | #for seq in seq_list[ :max_multiunit - 1]: 281 | for seq in seq_list[:max_multiunit]: 282 | seq_pred, seq_scores = seq 283 | multiunit_token = '_'.join(seq_pred) 284 | if fix_multiunit: 285 | multiunit_token = multiunit_token.replace('#', '') 286 | multiunit_token = multiunit_token.replace('_', '') 287 | 288 | #multiunit_score = np.average(seq_scores) 289 | multiunit_score = self.mean(seq_scores) 290 | 291 | ind = bisect.bisect(scores[j], multiunit_score) 292 | 293 | pred_tokens[j].insert(ind, multiunit_token) 294 | scores[j].insert(ind, multiunit_score) 295 | 296 | pred_tokens[j] = list(reversed(pred_tokens[j])) 297 | scores[j] = list(reversed(scores[j])) 298 | 299 | logger.debug(f'Predicted words: {pred_tokens[j]}') 300 | 301 | #final_pred_tokens.append([e for e in pred_tokens[j] if not e.startswith('##')]) 302 | final_pred_tokens.append(pred_tokens[j][:n_top]) 303 | final_scores.append(scores[j][:n_top]) 304 | 305 | return 
final_pred_tokens, final_scores 306 | 307 | def generate_multiunit_token(self, masked_position, bpe_tokens, 308 | n_top, 309 | n_units, 310 | label=None, 311 | ): 312 | final_result = [] 313 | final_result_scores = [] 314 | 315 | bpe_tokens = copy.deepcopy(bpe_tokens) 316 | bpe_tokens.insert(masked_position, '[MASK]') 317 | predictions, scores = self.predict_single_unit([bpe_tokens], 318 | [masked_position + 1], 319 | n_top=n_top, 320 | mask_token=self._mask_in_multiunit, 321 | label=label, 322 | ) 323 | # This will result in "Mama washed the [MASK] __frame__ ." 324 | 325 | if len(predictions) == 0: 326 | return [] 327 | 328 | predictions = predictions[0] 329 | scores = scores[0] 330 | #n_suffix = 0 331 | 332 | good_preds = [] 333 | 334 | # TODO: increase speed 335 | b_bpe_tokens = [] 336 | for i, pred in (e for e in enumerate(predictions) if e[1][0] == '#'): 337 | # if n_suffix > n_units - 1: 338 | # break 339 | 340 | tmp = copy.deepcopy(bpe_tokens) 341 | tmp[masked_position + 1] = pred 342 | b_bpe_tokens.append(tmp) 343 | good_preds.append((i,pred)) 344 | #n_suffix += 1 345 | 346 | if not good_preds: 347 | return [] 348 | 349 | 350 | loader = DataLoader(b_bpe_tokens, batch_size=10, collate_fn=lambda _: _) 351 | preds = [] 352 | pred_scores = [] 353 | for batch in loader: 354 | bb_preds, bb_pred_scores = self.predict_single_unit(batch, 355 | [masked_position for _ in range(len(batch))], 356 | mask_token=False, # We do not need masking here, since inserted token will be a mask 357 | n_top=n_top, label=label 358 | ) 359 | 360 | preds += bb_preds 361 | pred_scores += bb_pred_scores 362 | 363 | for i in range(len(preds)): 364 | result = [preds[i][0], good_preds[i][1]] 365 | result_scores = [pred_scores[i][0], scores[good_preds[i][0]]] 366 | 367 | tail, tail_scores = self.generate_from_tail(preds[i][0], b_bpe_tokens[i], masked_position, 368 | max_subunits=n_units-2, 369 | n_top=n_top, label=label) 370 | result = tail + result 371 | result_scores = tail_scores + 
result_scores 372 | 373 | final_result.append(result) 374 | final_result_scores.append(result_scores) 375 | 376 | return list(zip(final_result, final_result_scores)) 377 | 378 | def generate_from_tail(self, pred, bpe_tokens, masked_position, 379 | max_subunits, 380 | n_top, label=None,): 381 | result = [] 382 | result_scores = [] 383 | 384 | it = 0 385 | while (pred[0] == '#') and (it < max_subunits): 386 | bpe_tokens[masked_position] = pred 387 | bpe_tokens.insert(masked_position, '[MASK]') 388 | preds, pred_scores = self.predict_single_unit([bpe_tokens], [masked_position], 389 | n_top=n_top, 390 | mask_token=False, label=label) 391 | pred = preds[0][0] 392 | result.append(pred) 393 | result_scores.append(pred_scores[0][0]) 394 | it += 1 395 | 396 | return (list(reversed(result)), 397 | list(reversed(result_scores))) 398 | 399 | def generate_variants(self, bpe_tokens, mask_pos, 400 | gen_tokens, gen_scores, seq_len): 401 | batch_size = len(bpe_tokens) 402 | 403 | if not gen_tokens: 404 | yield bpe_tokens, [0.]*batch_size, [[] for _ in range(batch_size)], mask_pos 405 | return 406 | 407 | for var_num in range(len(gen_tokens[0])): 408 | if not gen_tokens[0][var_num]: 409 | continue 410 | 411 | variant = [] 412 | new_mask = [] 413 | var_t = [] 414 | var_s = [] 415 | for i in range(batch_size): 416 | new_bpe = copy.deepcopy(bpe_tokens[i]) 417 | 418 | for seq_num in range(len(gen_tokens[i][var_num])): 419 | new_bpe[mask_pos[i] + seq_num] = gen_tokens[i][var_num][seq_num] 420 | 421 | var_t.append(gen_tokens[i][var_num]) 422 | var_s.append(gen_scores[i][var_num]) 423 | 424 | new_mask.append(mask_pos[i] + len(gen_tokens[i][var_num])) 425 | 426 | variant.append(new_bpe) 427 | 428 | yield variant, var_s, var_t, new_mask 429 | 430 | def update_beam(self, 431 | prev_tokens, prev_score, 432 | new_scores, new_tokens, 433 | gen_scores, gen_tokens): 434 | for i in range(len(gen_scores)): 435 | final_gen_score = prev_score + gen_scores[i] 436 | insert_pos = 
bisect.bisect(new_scores, final_gen_score) 437 | 438 | new_scores.insert(insert_pos, final_gen_score) 439 | del new_scores[0] 440 | 441 | new_tokens.insert(insert_pos, prev_tokens + [gen_tokens[i]]) 442 | if len(new_tokens) > len(new_scores): 443 | del new_tokens[0] 444 | 445 | def predict_token_sequence(self, bpe_tokens, masked_pos, 446 | mask_token, 447 | n_top, 448 | seq_len, 449 | beam_size, 450 | n_units, 451 | fix_multiunit, 452 | multiunit_lookup, 453 | max_multiunit, 454 | label=None, 455 | ): 456 | bpe_tokens = copy.deepcopy(bpe_tokens) 457 | 458 | batch_size = len(bpe_tokens) 459 | for i in range(batch_size): 460 | for seq_num in range(seq_len - 1): 461 | bpe_tokens[i].insert(masked_pos[i] + 1, '[MASK]') 462 | 463 | gen_scores = [] 464 | gen_tokens = [] 465 | for seq_num in range(seq_len): 466 | gen_scores_seq = [[0. for __ in range(beam_size)] for _ in range(batch_size)] 467 | gen_tokens_seq = [[[] for __ in range(beam_size)] for _ in range(batch_size)] 468 | for variant, variant_score, prev_tokens, new_mask in self.generate_variants(bpe_tokens, masked_pos, 469 | gen_tokens, gen_scores, 470 | seq_len=seq_len): 471 | top_tokens, top_scores = self.predict_single_word(variant, new_mask, 472 | mask_token=True, 473 | n_top=n_top, 474 | n_units=n_units, 475 | fix_multiunit=fix_multiunit, 476 | multiunit_lookup=multiunit_lookup, 477 | max_multiunit=max_multiunit, 478 | label=label, 479 | ) 480 | 481 | for i in range(batch_size): 482 | self.update_beam(prev_tokens[i], variant_score[i], 483 | gen_scores_seq[i], gen_tokens_seq[i], 484 | top_scores[i], top_tokens[i]) 485 | 486 | gen_tokens = gen_tokens_seq 487 | gen_scores = gen_scores_seq 488 | 489 | gen_scores = [[(e/seq_len) for e in l] for l in gen_scores] 490 | 491 | return ([list(reversed(e)) for e in gen_tokens], 492 | [list(reversed(e)) for e in gen_scores]) 493 | -------------------------------------------------------------------------------- /emnlp2021/style_transfer/condBERT/multiword/predict_utils.py: 
-------------------------------------------------------------------------------- 1 | def find_bpe_position_by_offset(bpe_offsets, target_offset): 2 | bpe_nums=[] 3 | for sent_num, sent in enumerate(bpe_offsets): 4 | if sent[-1][0] < target_offset[0]: 5 | continue 6 | 7 | for bpe_num, bpe in enumerate(sent): 8 | if target_offset[0] <= bpe[0] and bpe[1] <= target_offset[1]: 9 | bpe_nums.append(bpe_num) 10 | return (sent_num, bpe_nums) 11 | 12 | 13 | def generate_seq_indexes(indexes): 14 | if not indexes: 15 | yield [] 16 | return 17 | 18 | for ind in indexes[0]: 19 | for seq in generate_seq_indexes(indexes[1:]): 20 | yield [ind] + seq 21 | 22 | """failure case of tokenizer: 23 | tagged_text = "Earlier this year , some 70 U.S. congressmen sent a letter to U. __S.__ President Bill Clinton , calling for an end to the humanitarian crisis in Iraq by having the sanctions lifted ." 24 | """ 25 | -------------------------------------------------------------------------------- /emnlp2021/style_transfer/condBERT/vocab/toxic_words.txt: -------------------------------------------------------------------------------- 1 | 4r5e 2 | 5h1t 3 | 5hit 4 | a55 5 | anal 6 | anus 7 | ar5e 8 | arrse 9 | arse 10 | ass 11 | ass-fucker 12 | asses 13 | assfucker 14 | assfukka 15 | asshole 16 | assholes 17 | asswhole 18 | a_s_s 19 | b!tch 20 | b00bs 21 | b17ch 22 | b1tch 23 | ballbag 24 | balls 25 | ballsack 26 | bastard 27 | beastial 28 | beastiality 29 | bellend 30 | bestial 31 | bestiality 32 | bi+ch 33 | biatch 34 | bitch 35 | bitcher 36 | bitchers 37 | bitches 38 | bitchin 39 | bitching 40 | bloody 41 | blow job 42 | blowjob 43 | blowjobs 44 | boiolas 45 | bollock 46 | bollok 47 | boner 48 | boob 49 | boobs 50 | booobs 51 | boooobs 52 | booooobs 53 | booooooobs 54 | breasts 55 | buceta 56 | bugger 57 | bum 58 | bunny fucker 59 | butt 60 | butthole 61 | buttmunch 62 | buttplug 63 | c0ck 64 | c0cksucker 65 | carpet muncher 66 | cawk 67 | chink 68 | cipa 69 | cl1t 70 | clit 71 | clitoris 72 | 
clits 73 | cnut 74 | cock 75 | cock-sucker 76 | cockface 77 | cockhead 78 | cockmunch 79 | cockmuncher 80 | cocks 81 | cocksuck 82 | cocksucked 83 | cocksucker 84 | cocksucking 85 | cocksucks 86 | cocksuka 87 | cocksukka 88 | cok 89 | cokmuncher 90 | coksucka 91 | coon 92 | cox 93 | crap 94 | cum 95 | cummer 96 | cumming 97 | cums 98 | cumshot 99 | cunilingus 100 | cunillingus 101 | cunnilingus 102 | cunt 103 | cuntlick 104 | cuntlicker 105 | cuntlicking 106 | cunts 107 | cyalis 108 | cyberfuc 109 | cyberfuck 110 | cyberfucked 111 | cyberfucker 112 | cyberfuckers 113 | cyberfucking 114 | d1ck 115 | damn 116 | dick 117 | dickhead 118 | dildo 119 | dildos 120 | dink 121 | dinks 122 | dirsa 123 | dlck 124 | dog-fucker 125 | doggin 126 | dogging 127 | donkeyribber 128 | doosh 129 | duche 130 | dyke 131 | ejaculate 132 | ejaculated 133 | ejaculates 134 | ejaculating 135 | ejaculatings 136 | ejaculation 137 | ejakulate 138 | f u c k 139 | f u c k e r 140 | f4nny 141 | fag 142 | fagging 143 | faggitt 144 | faggot 145 | faggs 146 | fagot 147 | fagots 148 | fags 149 | fanny 150 | fannyflaps 151 | fannyfucker 152 | fanyy 153 | fatass 154 | fcuk 155 | fcuker 156 | fcuking 157 | feck 158 | fecker 159 | felching 160 | fellate 161 | fellatio 162 | fingerfuck 163 | fingerfucked 164 | fingerfucker 165 | fingerfuckers 166 | fingerfucking 167 | fingerfucks 168 | fistfuck 169 | fistfucked 170 | fistfucker 171 | fistfuckers 172 | fistfucking 173 | fistfuckings 174 | fistfucks 175 | flange 176 | fook 177 | fooker 178 | fuck 179 | fucka 180 | fucked 181 | fucker 182 | fuckers 183 | fuckhead 184 | fuckheads 185 | fuckin 186 | fucking 187 | fuckings 188 | fuckingshitmotherfucker 189 | fuckme 190 | fucks 191 | fuckwhit 192 | fuckwit 193 | fudge packer 194 | fudgepacker 195 | fuk 196 | fuker 197 | fukker 198 | fukkin 199 | fuks 200 | fukwhit 201 | fukwit 202 | fux 203 | fux0r 204 | f_u_c_k 205 | gangbang 206 | gangbanged 207 | gangbangs 208 | gaylord 209 | gaysex 210 | goatse 211 | God 212 
| god-dam 213 | god-damned 214 | goddamn 215 | goddamned 216 | hardcoresex 217 | hell 218 | heshe 219 | hoar 220 | hoare 221 | hoer 222 | homo 223 | hore 224 | horniest 225 | horny 226 | hotsex 227 | jack-off 228 | jackoff 229 | jap 230 | jerk-off 231 | jism 232 | jiz 233 | jizm 234 | jizz 235 | kawk 236 | knob 237 | knobead 238 | knobed 239 | knobend 240 | knobhead 241 | knobjocky 242 | knobjokey 243 | kock 244 | kondum 245 | kondums 246 | kum 247 | kummer 248 | kumming 249 | kums 250 | kunilingus 251 | l3i+ch 252 | l3itch 253 | labia 254 | lmfao 255 | lust 256 | lusting 257 | m0f0 258 | m0fo 259 | m45terbate 260 | ma5terb8 261 | ma5terbate 262 | masochist 263 | master-bate 264 | masterb8 265 | masterbat* 266 | masterbat3 267 | masterbate 268 | masterbation 269 | masterbations 270 | masturbate 271 | mo-fo 272 | mof0 273 | mofo 274 | mothafuck 275 | mothafucka 276 | mothafuckas 277 | mothafuckaz 278 | mothafucked 279 | mothafucker 280 | mothafuckers 281 | mothafuckin 282 | mothafucking 283 | mothafuckings 284 | mothafucks 285 | mother fucker 286 | motherfuck 287 | motherfucked 288 | motherfucker 289 | motherfuckers 290 | motherfuckin 291 | motherfucking 292 | motherfuckings 293 | motherfuckka 294 | motherfucks 295 | muff 296 | mutha 297 | muthafecker 298 | muthafuckker 299 | muther 300 | mutherfucker 301 | n1gga 302 | n1gger 303 | nazi 304 | nigg3r 305 | nigg4h 306 | nigga 307 | niggah 308 | niggas 309 | niggaz 310 | nigger 311 | niggers 312 | nob 313 | nob jokey 314 | nobhead 315 | nobjocky 316 | nobjokey 317 | numbnuts 318 | nutsack 319 | orgasim 320 | orgasims 321 | orgasm 322 | orgasms 323 | p0rn 324 | pawn 325 | pecker 326 | penis 327 | penisfucker 328 | phonesex 329 | phuck 330 | phuk 331 | phuked 332 | phuking 333 | phukked 334 | phukking 335 | phuks 336 | phuq 337 | pigfucker 338 | pimpis 339 | piss 340 | pissed 341 | pisser 342 | pissers 343 | pisses 344 | pissflaps 345 | pissin 346 | pissing 347 | pissoff 348 | poop 349 | porn 350 | porno 351 | 
pornography 352 | pornos 353 | prick 354 | pricks 355 | pron 356 | pube 357 | pusse 358 | pussi 359 | pussies 360 | pussy 361 | pussys 362 | rectum 363 | retard 364 | rimjaw 365 | rimming 366 | s hit 367 | s.o.b. 368 | sadist 369 | schlong 370 | screwing 371 | scroat 372 | scrote 373 | scrotum 374 | semen 375 | sex 376 | sh!+ 377 | sh!t 378 | sh1t 379 | shag 380 | shagger 381 | shaggin 382 | shagging 383 | shemale 384 | shi+ 385 | shit 386 | shitdick 387 | shite 388 | shited 389 | shitey 390 | shitfuck 391 | shitfull 392 | shithead 393 | shiting 394 | shitings 395 | shits 396 | shitted 397 | shitter 398 | shitters 399 | shitting 400 | shittings 401 | shitty 402 | skank 403 | slut 404 | sluts 405 | smegma 406 | smut 407 | snatch 408 | son-of-a-bitch 409 | spac 410 | spunk 411 | s_h_i_t 412 | t1tt1e5 413 | t1tties 414 | teets 415 | teez 416 | testical 417 | testicle 418 | tit 419 | titfuck 420 | tits 421 | titt 422 | tittie5 423 | tittiefucker 424 | titties 425 | tittyfuck 426 | tittywank 427 | titwank 428 | tosser 429 | turd 430 | tw4t 431 | twat 432 | twathead 433 | twatty 434 | twunt 435 | twunter 436 | v14gra 437 | v1gra 438 | vagina 439 | viagra 440 | vulva 441 | w00se 442 | wang 443 | wank 444 | wanker 445 | wanky 446 | whoar 447 | whore 448 | willies 449 | willy 450 | xrated 451 | xxx 452 | -------------------------------------------------------------------------------- /emnlp2021/style_transfer/condBERT/vocab/word2coef.pkl: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/s-nlp/detox/0ebaeab817957bb5463819bec7fa4ed3de9a26ee/emnlp2021/style_transfer/condBERT/vocab/word2coef.pkl -------------------------------------------------------------------------------- /emnlp2021/style_transfer/paraGeDi/gedi_adapter.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import torch 3 | 4 | from transformers.generation_utils import GenerationMixin 5 | 
6 | 7 | class GediAdapter(GenerationMixin): 8 | def __init__( 9 | self, model, gedi_model, 10 | target=0, # todo: pass them in the generate method as well 11 | gedi_logit_coef=1, 12 | pos_code=1349, 13 | neg_code=13086, 14 | tokenizer=None, 15 | max_id=None, 16 | debug=False, 17 | reg_alpha=0, 18 | ub=None, 19 | lb=None, 20 | untouchable_tokens=None, 21 | nearly_infinity = -1000, 22 | ): 23 | self.model = model 24 | self.gedi_model = gedi_model 25 | self.target = target 26 | self.gedi_logit_coef = gedi_logit_coef 27 | self.POS_CODE = pos_code 28 | self.NEG_CODE = neg_code 29 | self.codes = {'gedi_pos': self.POS_CODE, 'gedi_neg': self.NEG_CODE} 30 | self.tokenizer = tokenizer 31 | self.max_id = max_id 32 | self.debug = debug 33 | self.reg_alpha = reg_alpha 34 | self.ub = ub 35 | self.lb = lb 36 | self.logits = [] 37 | self.untouchable_tokens = untouchable_tokens or [] 38 | self.nearly_infinity = nearly_infinity 39 | 40 | def show_correction(self, sm, logits, corrected, old_logits): 41 | if self.tokenizer: 42 | vals = sm.cpu().numpy()[0][0] 43 | lv = logits.cpu().numpy()[0] 44 | cv = corrected.cpu().numpy()[0] 45 | if self.max_id: 46 | vals = vals[:self.max_id] 47 | lv = lv[:self.max_id] 48 | cv = cv[:self.max_id] 49 | # the most upgraded and downgraded tokens 50 | #print('+', self.tokenizer.convert_ids_to_tokens(np.argsort(-vals)[:5]), -np.sort(-vals)[:3]) 51 | #print('-', self.tokenizer.convert_ids_to_tokens(np.argsort(vals)[:5]), np.sort(vals)[:3]) 52 | #print(torch.exp(logits).sum()) 53 | # how the top logits change 54 | old_top_id = np.argsort(-lv)[:5] 55 | new_top_id = np.argsort(-cv)[:5] 56 | toks = list(new_top_id) 57 | for t in old_top_id: 58 | if t not in toks: 59 | toks.append(t) 60 | 61 | pos_logits = old_logits[0][0].cpu().numpy() 62 | neg_logits = old_logits[1][0].cpu().numpy() 63 | 64 | texts = self.tokenizer.convert_ids_to_tokens(toks) 65 | print('changes in the top:') 66 | for text, idx in zip(texts, toks): 67 | print('{:6d}: {:+2.2f} > {:+2.2f} 
{:20s} [{:+2.2f} | {:+2.2f}]'.format(idx, lv[idx], cv[idx], text, pos_logits[idx], neg_logits[idx])) 68 | #print() 69 | #print(self.tokenizer.convert_ids_to_tokens(old_top_id), self.tokenizer.convert_ids_to_tokens(new_top_id)) 70 | 71 | def __call__(self, return_dict=True, **kwargs): 72 | new_args = kwargs.get('main', {}) 73 | with torch.no_grad(): 74 | #print('main inputs:', {k: v.shape for k, v in new_args.items() if isinstance(v, torch.Tensor)}) 75 | outputs = self.model(return_dict=return_dict, **new_args) 76 | #print('main outputs:', {k: v.shape for k, v in outputs.items() if isinstance(v, torch.Tensor)}) 77 | outputs['main'] = outputs 78 | gedi_logits = {} 79 | for gedi_key in ['gedi_pos', 'gedi_neg']: 80 | gedi_args = kwargs.get(gedi_key, {}) 81 | with torch.no_grad(): 82 | gedi_out = self.gedi_model(**gedi_args, return_dict=True) 83 | outputs[gedi_key] = gedi_out # logits are [batch, seq, voc] 84 | gedi_logits[gedi_key] = gedi_out['logits'][:,-1] # [batch, voc] 85 | # todo: add cumulative log loss 86 | stacked = torch.stack([gedi_logits['gedi_pos'], gedi_logits['gedi_neg']]) # [2, batch, voc] 87 | # exclude untouchable tokens from the distribution on which the penalty is calculated 88 | for token_id in self.untouchable_tokens: 89 | stacked[:, :, token_id] = self.nearly_infinity 90 | if self.reg_alpha: 91 | # increase each p(token|class) by the same amount, to shift odds ratio to 1. 
            old_logits = torch.log(torch.softmax(stacked, -1) + self.reg_alpha)
        else:
            old_logits = torch.log_softmax(stacked, -1)

        if hasattr(self.gedi_model, 'logit_scale'):
            old_logits += self.gedi_model.logit_scale
        if hasattr(self.gedi_model, 'bias'):
            old_logits += self.gedi_model.bias.reshape(2, 1, 1).repeat(1,1,old_logits.shape[-1])

        # normalize over the class dimension: log p(class | token) for each candidate token
        sm = torch.log_softmax(old_logits, 0)
        logits = outputs['logits'][:,-1]

        # bad patch for eos
        #sm[:, :,1] = 0

        shift = sm[self.target]
        # shift everything by a constant to make logits before and after change more comparable
        shift -= shift.mean()
        # limit the positive or negative impact of gedi correction
        if self.lb is not None or self.ub is not None:
            shift = torch.clamp(shift, self.lb, self.ub)
        for token_id in self.untouchable_tokens:
            shift[:, token_id] = 0

        corrected = logits + shift * self.gedi_logit_coef
        if self.debug:
            self.show_correction(sm, logits, corrected, torch.log_softmax(stacked, -1))
        if self.max_id is not None:
            # corrected is [batch, voc]: mask token ids >= max_id along the vocabulary dimension
            corrected[:, self.max_id:] = -np.inf
        outputs['logits'] = corrected.unsqueeze(1)  # add back sequence length
        # todo: calculate the loss with respect to the new logits

        #self.logits.append(corrected)
        return outputs

    def prepare_inputs_for_generation(self, input_ids, **kwargs):
        # todo: prepare inputs separately for main and gedi models
        result = {}
        # unpack past after beam search application
        past = kwargs.get('past')
        #print('past:', past is not None)
        if past and not isinstance(past, tuple):
            for k, v in past.items():
                kwargs[k]['past'] = v

        main_input_ids = input_ids
        main_kwargs = kwargs.get('main', kwargs)
        if kwargs.get('main_prefix') is not None and main_kwargs.get('past') is None:
            prefix = kwargs['main_prefix'].unsqueeze(0).repeat(main_input_ids.shape[0], 1)
            main_input_ids = torch.cat([prefix, main_input_ids], dim=1)
            #print('main reshape text:', input_ids.shape, main_input_ids.shape)
            if main_kwargs.get('attention_mask') is not None:
                old_mask = main_kwargs['attention_mask']
                mask_prefix = prefix * 0 + 1
                main_kwargs['attention_mask'] = torch.cat([mask_prefix, old_mask], dim=1)
                #print('main reshape mask:', old_mask.shape, main_kwargs['attention_mask'].shape)
        #print('main ', self.tokenizer.decode(main_input_ids[0]))
        result['main'] = self.model.prepare_inputs_for_generation(main_input_ids, **main_kwargs)

        for k in ['gedi_pos', 'gedi_neg']:
            gedi_args = kwargs.get(k, {})
            # todo: allow concatenation of codes to the input instead of replacing the first token
            if kwargs.get('gedi_prepend'):
                # prepend the code to the input
                prefix = torch.ones([input_ids.shape[0], 1], dtype=input_ids.dtype).to(input_ids.device) * self.codes[k]
                new_input_ids = torch.cat([prefix, input_ids], dim=1)
                # todo: enlarge the attention mask
            else:
                # insert the code in place of the first token of the input
                new_input_ids = input_ids.clone()  # batch size x seq len
                new_input_ids[:, 0] = self.codes[k]
            #print(k, self.tokenizer.decode(new_input_ids[0]))
            gedi_inputs = self.gedi_model.prepare_inputs_for_generation(new_input_ids, **gedi_args)
            result[k] = gedi_inputs
        return result

    def _update_model_kwargs_for_generation(self, outputs, model_kwargs, is_encoder_decoder=False):
        # todo: use the new outputs as args
        result = {k: v for k, v in model_kwargs.items()}
        #print('main before upd:', {k: v.shape for k, v in model_kwargs.get('main', model_kwargs).items() if isinstance(v, torch.Tensor)}, 'main' in model_kwargs)
        result['main'] = self.model._update_model_kwargs_for_generation(
            outputs=outputs['main'],
            model_kwargs=model_kwargs.get('main', model_kwargs),
            is_encoder_decoder=self.model.config.is_encoder_decoder,
        )
        #print('main after upd:', {k: v.shape for k, v in result['main'].items() if isinstance(v, torch.Tensor)})
        for k in ['gedi_pos', 'gedi_neg']:
            result[k] = self.gedi_model._update_model_kwargs_for_generation(
                outputs=outputs[k],
                model_kwargs=model_kwargs.get(k, {}),
                is_encoder_decoder=self.gedi_model.config.is_encoder_decoder,
            )

        # a fix for beam search
        result['past'] = {
            k: result[k]['past']
            for k in ['main', 'gedi_pos', 'gedi_neg']
            if 'past' in result[k] and result[k]['past'] is not None and result[k]['past'][0] is not None
        }
        return result

    def paraphrase(self, text, n=None, max_length=128):
        # requires self.tokenizer to be set; inputs are moved to the main model's device
        inputs = self.tokenizer(text, return_tensors='pt', padding=True)['input_ids'].to(self.device)
        if max_length == 'auto':
            max_length = min(int(inputs.shape[1] * 1.1) + 4, 64)
        result = self.generate(
            inputs,
            num_return_sequences=n or 1,
            do_sample=False, temperature=0.0, repetition_penalty=3.0, max_length=max_length,
            bad_words_ids=[[2]],  # unk
        )
        texts = [self.tokenizer.decode(r) for r in result]
        if not n:
            return texts[0]
        return texts

    def _reorder_cache(self, past, beam_idx):
        # for each model, cache should be reordered separately
        result = {}
        for key, subpast in past.items():
            model = self.model if key == 'main' else self.gedi_model
            result[key] = model._reorder_cache(subpast, beam_idx)
        return result

    # the data below is used only for compatibility
    # todo: clarify it
    def get_output_embeddings(self):
        return True

    @property
    def config(self):
        return self.model.config

    def get_encoder(self):
        return self.model.get_encoder()

    def parameters(self):
        return self.model.parameters()

    @property
    def device(self):
        return self.model.device

    @property
    def main_input_name(self):
        return self.model.main_input_name

    def forward(self, attention_mask=None, **kwargs):
        pass

--------------------------------------------------------------------------------
/emnlp2021/style_transfer/paraGeDi/text_processing.py:
--------------------------------------------------------------------------------
import re

punkt = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ '


def detokenize(text):
    text = text.replace(" .", ".").replace(" ,", ",").replace(" !", "!")
    text = text.replace(" ?", "?").replace(" )", ")").replace("( ", "(")
    return text


def drop_bad_words(text, max_len=30, return_digits=None):
    parts = re.split('(\\W)', text)
    if max_len:
        parts = [w for w in parts if len(w) <= max_len]
    if return_digits is not None:
        parts = [str(return_digits) if p == 'DIGIT' else p for p in parts]
    return ''.join(parts)


def text_preprocess(text):
    # strip punctuation on the left
    text = text.lstrip(punkt)
    # remove extra spaces left by tokenization
    text = detokenize(text)
    # remove too long words because generally they confuse a seq2seq model
    # and often they are meaningless combinations of characters
    text = drop_bad_words(text)
    return text


def text_postprocess(text):
    # strip repeated punctuation on the right, keeping only the first trailing character
    res2 = text.rstrip(punkt)
    if len(res2) < len(text):
        res2 += text[len(res2)]
    return res2

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
tqdm
numpy
pandas
torch
nltk
transformers==4.24
fairseq==0.10.0
sentencepiece
keras_preprocessing
flair==0.11
scipy

--------------------------------------------------------------------------------