├── README.md
└── update.py

/README.md:
--------------------------------------------------------------------------------

Bias in NLP
=======

This is a collection of natural language processing papers that deal with bias (mostly gender bias). The list is by no means complete; it is simply a way to keep up with the large number of papers in this area. If a paper is missing, please add it.


Papers
-----

**Towards Detection of Subjective Bias using Contextualized Word Embeddings**
WebConf2020 - [Paper](https://arxiv.org/pdf/2002.06644.pdf), [Code](https://github.com/tanvidadu/Subjective-Bias-Detection)
*Note:* Wiki Neutrality Corpus.


**Joint Multiclass Debiasing of Word Embeddings**
ISMIS2020 - [Paper](https://arxiv.org/pdf/2003.11520.pdf), [Code](https://github.com/RadomirPopovicFON/Joint-Multiclass-Debiasing-of-Word-Embeddings)
*Note:* Hard and Soft WEAT.


**Towards Debiasing Sentence Representations**
ACL2020 - [Paper](https://pdfs.semanticscholar.org/0d96/5ed237a3b4592ecefdb618c29f63adedff76.pdf), [Code](https://github.com/pliang279/sent_debias)
*Note:* Sentence-level debiasing. Difference between pretraining and fine-tuning.


**Neutralizing Gender Bias in Word Embedding with Latent Disentanglement and Counterfactual Generation**
arxiv2020 - [Paper](https://arxiv.org/pdf/2004.03133.pdf)
*Note:* Counterfactual generation.


**Unsupervised Discovery of Implicit Gender Bias**
arxiv2020 - [Paper](https://arxiv.org/pdf/2004.08361.pdf), [Code](https://github.com/anjalief/unsupervised_gender_bias)
*Note:* Unsupervised bias detection from comments.


**StereoSet: Measuring stereotypical bias in pretrained language models**
arxiv2020 - [Paper](https://arxiv.org/pdf/2004.09456.pdf), [Code](https://stereoset.mit.edu/)
*Note:* Benchmark and dataset for measuring bias in four domains (gender, profession, race, religion).


**Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation**
ACL2020 - [Paper](https://arxiv.org/pdf/2005.00965.pdf), [Code](https://github.com/uvavision/Double-Hard-Debias)
*Note:* Double-Hard Debias: first mitigate frequency effects in the embeddings, then apply hard debiasing.


**Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer**
ACL2020 - [Paper](https://arxiv.org/pdf/2005.00699.pdf)
*Note:* Bias in multilingual embeddings depends on the alignment direction.


**Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation**
arxiv2020 - [Paper](https://arxiv.org/pdf/2006.08881.pdf)
*Note:* Gender labels for pronouns in English-Spanish machine translation.


**Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases**
arxiv2020 - [Paper](https://arxiv.org/pdf/2006.03955.pdf)
*Note:* CEAT (Contextualized Embedding Association Test).


**OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings**
arxiv2020 - [Paper](https://arxiv.org/pdf/2007.00049.pdf)
*Note:* Preserves the semantic meaning of embeddings.


**Investigating Gender Bias in BERT**
arxiv2020 - [Paper](https://arxiv.org/pdf/2009.05021.pdf)
*Note:* Identifies one gender direction per BERT layer.


**Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias**
arxiv2020 - [Paper](https://arxiv.org/pdf/2009.11982.pdf), [Code](https://github.com/anavaleriagonzalez/ABC-dataset)
*Note:* Multilingual multi-task dataset across four languages.
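
The "hard debias" step that Double-Hard Debias builds on is, at its core, a projection: remove a word vector's component along an estimated gender direction. A minimal sketch with toy 2-d vectors (the vectors and the single definitional pair are illustrative, not taken from any of the papers):

```python
import numpy as np

def neutralize(w, g):
    """Remove the component of w along the gender direction g --
    the core projection step of hard debiasing."""
    g = g / np.linalg.norm(g)       # unit gender direction
    return w - np.dot(w, g) * g     # subtract the projection onto g

# toy example: gender direction estimated from one definitional pair
he, she = np.array([0.9, 0.3]), np.array([0.3, 0.9])
g = he - she                        # simplistic one-pair gender direction
w = np.array([0.8, 0.1])            # a word vector to neutralize
w_debiased = neutralize(w, g)
print(np.dot(w_debiased, g / np.linalg.norm(g)))  # ~0 after projection
```

In practice the gender direction is estimated from many definitional pairs (e.g. via PCA over pair differences), and "Lipstick on a Pig" below argues that this projection hides rather than removes the bias.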


**Towards Debiasing NLU Models from Unknown Biases**
arxiv2020 - [Paper](https://arxiv.org/pdf/2009.12303.pdf), [Code](https://github.com/UKPLab/emnlp2020-debiasing-unknown)
*Note:* Unsupervised bias detection.


**Robustness and Reliability of Gender Bias Assessment in Word Embeddings: The Role of Base Pairs**
arxiv2020 - [Paper](https://arxiv.org/pdf/2010.02847.pdf), [Code](https://github.com/alisonsneyd/Gender_bias_word_embeddings)
*Note:* The choice of base pairs matters.


**LOGAN: Local Group Bias Detection by Clustering**
arxiv2020 - [Paper](https://arxiv.org/pdf/2010.02867.pdf)
*Note:* Identifies biases through clustering.


**Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation**
arxiv2020 - [Paper](https://arxiv.org/pdf/2009.09435.pdf)
*Note:* Verifies whether non-linear debiasing helps; the results suggest it does not.


**Unmasking Contextual Stereotypes: Measuring and Mitigating BERT’s Gender Bias**
GeBNLP2020 - [Paper](https://arxiv.org/pdf/2010.14534.pdf), [Code](https://github.com/marionbartl/gender-bias-BERT)
*Note:* Verifies gender debiasing techniques in German.


**Language (Technology) is Power: A Critical Survey of “Bias” in NLP**
arxiv2020 - [Paper](https://arxiv.org/pdf/2005.14050.pdf)
*Note:* Meta-study: a survey of 146 papers on bias in NLP.


**Pick a Fight or Bite your Tongue: Investigation of Gender Differences in Idiomatic Language Usage**
arxiv2020 - [Paper](https://arxiv.org/pdf/2011.00335.pdf)
*Note:* Idiomatic expressions depending on the speaker.


**Evaluating Bias In Dutch Word Embeddings**
GeBNLP2020 - [Paper](https://arxiv.org/pdf/2011.00244.pdf), [Code](https://github.com/Noixas/Official-Evaluating-Bias-In-Dutch)
*Note:* Examines bias in Dutch embeddings (using WEAT).


**Analyzing Gender Bias within Narrative Tropes**
arxiv2020 - [Paper](https://arxiv.org/pdf/2011.00092.pdf), [Code](https://github.com/dhruvilgala/tvtropes)
*Note:* Analyzes bias using narrative tropes.


**Neural Machine Translation Doesn’t Translate Gender Coreference Right Unless You Make It**
GeBNLP2020 - [Paper](https://arxiv.org/pdf/2010.05332.pdf), [Code](https://github.com/DCSaunders/tagged-gender-coref)
*Note:* Incorporates explicit word-level gender tags.


**The Gap on GAP: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets**
NeurIPS 2020 - [Paper](https://arxiv.org/pdf/2011.01837.pdf), [Code](https://github.com/vid-koci/weightingGAP)
*Note:* Distances in GAP play a role.


**AraWEAT: Multidimensional Analysis of Biases in Arabic Word Embeddings**
arxiv2020 - [Paper](https://arxiv.org/pdf/2011.01575.pdf), [Code](https://github.com/bakrianoo/aravec)
*Note:* WEAT for Arabic.


**Characterising Bias in Compressed Models**
arxiv2020 - [Paper](https://arxiv.org/pdf/2010.03058.pdf)
*Note:* Bias in compressed models is large. Provides a method to identify biased examples.
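
Several entries in this list evaluate embeddings with WEAT (the Word Embedding Association Test): two target sets are compared by how strongly they associate with two attribute sets, summarised as a standardised effect size. A minimal sketch with toy vectors (the 2-d "embeddings" and word sets here are illustrative only):

```python
import numpy as np

def cos(u, v):
    # cosine similarity between two vectors
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def weat_effect_size(X, Y, A, B):
    """WEAT effect size: differential association of target sets X vs. Y
    with attribute sets A vs. B, normalised by the pooled std. dev."""
    def s(w):
        # association of a single word with A relative to B
        return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])
    sx = [s(x) for x in X]
    sy = [s(y) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy)

# toy 2-d "embeddings": career-like vs. family-like directions
career = [np.array([1.0, 0.1]), np.array([0.9, 0.0])]   # attribute set A
family = [np.array([0.1, 1.0]), np.array([0.0, 0.9])]   # attribute set B
male   = [np.array([0.8, 0.2]), np.array([0.9, 0.3])]   # target set X
female = [np.array([0.2, 0.8]), np.array([0.3, 0.9])]   # target set Y

print(weat_effect_size(male, female, career, family))   # positive here
```

A positive effect size means X associates more with A than Y does; the full test also reports a permutation-based p-value, omitted here for brevity.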


**Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them**
NAACL2019 - [Paper](https://arxiv.org/pdf/1903.03862.pdf), [Code](https://github.com/gonenhila/gender_bias_lipstick)
*Note:* Debiasing by zeroing out gender dimensions is not effective.


**Equalizing Gender Bias in Neural Machine Translation with Word Embeddings Techniques**
GeBNLP 2019 - [Paper](https://arxiv.org/pdf/1901.03116.pdf)
*Note:* Spanish-English translation with occupations.


**Evaluating the Underlying Gender Bias in Contextualized Word Embeddings**
GeBNLP 2019 - [Paper](https://arxiv.org/pdf/1904.08783.pdf)
*Note:* Contextualized embeddings are less biased than static ones.


**Mitigating Gender Bias in Natural Language Processing: Literature Review**
ACL2019 - [Paper](https://www.aclweb.org/anthology/P19-1159.pdf)
*Note:* Survey.


**What's in a Name? Reducing Bias in Bios without Access to Protected Attributes**
NAACL2019 - [Paper](https://arxiv.org/abs/1904.05233)
*Note:* Works on biographies.


**Assessing Social and Intersectional Biases in Contextualized Word Representations**
NeurIPS2019 - [Paper](https://papers.nips.cc/paper/9479-assessing-social-and-intersectional-biases-in-contextualized-word-representations.pdf)
*Note:* Strong bias in contextualized embeddings; the bias is not always visible at the sentence level.


**It’s All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution**
EMNLP2019 - [Paper](https://arxiv.org/pdf/1909.00871.pdf)
*Note:* Counterfactual Data Substitution (CDS).


**Good Secretaries, Bad Truck Drivers?
Occupational Gender Stereotypes in Sentiment Analysis**
GeBNLP 2019 - [Paper](https://arxiv.org/pdf/1906.10256.pdf), [Code](https://github.com/jayadevbhaskaran/gendered-sentiment)
*Note:* Dataset of 800 sentences analysed with sentiment analysis.


**Automatic Gender Identification and Reinflection in Arabic**
GeBNLP 2019 - [Paper](https://www.aclweb.org/anthology/W19-3822)
*Note:* Arabic-English translation with a focus on getting the pronouns right.


**Gendered Ambiguous Pronouns Shared Task: Boosting Model Confidence by Evidence Pooling**
GeBNLP 2019 - [Paper](https://www.aclweb.org/anthology/W19-3820.pdf), [Code](https://github.com/sattree/gap)
*Note:* Winner of the GAP shared task.


**Gendered Ambiguous Pronouns (GAP) Shared Task at the Gender Bias in NLP Workshop 2019**
GeBNLP 2019 - [Paper](https://www.aclweb.org/anthology/W19-3801/), [Code](https://github.com/google-research-datasets/gap-coreference)
*Note:* GAP shared task description.


**Conceptor Debiasing of Word Representations Evaluated on WEAT**
GeBNLP 2019 - [Paper](https://arxiv.org/pdf/1906.05993.pdf)
*Note:* Proposes Conceptor Debiasing.


**On Measuring Gender Bias in Translation of Gender-neutral Pronouns**
GeBNLP 2019 - [Paper](https://arxiv.org/pdf/1905.11684.pdf), [Code](https://github.com/nolongerprejudice/tgbi)
*Note:* Gender bias in Korean-English pronoun translation.


**Measuring Gender Bias in Word Embeddings across Domains and Discovering New Gender Bias Word Categories**
GeBNLP 2019 - [Paper](https://www.aclweb.org/anthology/W19-3804), [Code](https://github.com/alfredomg/GeBNLP2019)
*Note:* Clustering method for discovering new biases.


**The Role of Protected Class Word Lists in Bias Identification of Contextualized Word Representations**
GeBNLP 2019 - [Paper](https://www.aclweb.org/anthology/W19-3808)
*Note:* Uses conceptor debiasing.


**The Woman Worked as a Babysitter: On Biases in Language Generation**
EMNLP2019 - [Paper](https://arxiv.org/pdf/1909.01326.pdf), [Code](https://github.com/ewsheng/nlg-bias)
*Note:* Regard and sentiment. Annotations released.


**Exploring Human Gender Stereotypes with Word Association Test**
EMNLP2019 - [Paper](https://www.aclweb.org/anthology/D19-1635.pdf), [Code](https://github.com/Yupei-Du/bias-in-wat)
*Note:* Word association graphs.


**Gender-preserving Debiasing for Pre-trained Word Embeddings**
ACL2019 - [Paper](https://arxiv.org/pdf/1906.00742.pdf), [Code](https://github.com/kanekomasahiro/gp_debias)
*Note:* Differentiates between bias and gender information.


**Quantifying Social Biases in Contextual Word Representations**
GeBNLP 2019 - [Paper](https://www.cs.cmu.edu/~ytsvetko/papers/bias_in_bert.pdf)
*Note:* Template-based method to quantify bias.


**Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting**
ACM FAT* 2019 - [Paper](https://arxiv.org/pdf/1901.09451.pdf)
*Note:* Analyzes the effects of bias.


**Gender Bias in Neural Natural Language Processing**
Logic, Language, and Security. Springer. 2018 - [Paper](https://arxiv.org/pdf/1807.11714.pdf)
*Note:* Counterfactual Data Augmentation (CDA). Clear definition of bias. Evaluates on coreference resolution and language modelling.


**Gender Bias in Coreference Resolution**
NAACL2018 - [Paper](https://arxiv.org/pdf/1804.09301.pdf), [Code](https://github.com/rudinger/winogender-schemas)
*Note:* Winogender schemas.
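
Counterfactual Data Augmentation (CDA), noted for "Gender Bias in Neural Natural Language Processing" above, amounts to duplicating training sentences with gendered word pairs swapped. A minimal sketch (the swap list is a tiny illustrative subset; real CDA uses a much larger pair list and handles casing, morphology, and names):

```python
# Tiny illustrative swap list -- not the pair list from the paper.
SWAPS = {"he": "she", "she": "he",
         "his": "her", "her": "his",
         "man": "woman", "woman": "man"}

def swap_gender(sentence):
    # swap gendered tokens, leaving all other tokens unchanged
    return " ".join(SWAPS.get(tok, tok) for tok in sentence.split())

def cda_augment(corpus):
    # keep each original sentence and add its counterfactual twin
    augmented = []
    for sent in corpus:
        augmented.append(sent)
        augmented.append(swap_gender(sent))
    return augmented

print(cda_augment(["she is a doctor", "he likes his job"]))
```

Training on the augmented corpus exposes the model to both gendered variants of each context, which is the mechanism CDA relies on to reduce learned associations.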


**Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings**
NeurIPS2016 - [Paper](http://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf), [Code](https://github.com/tolga-b/debiaswe)
*Note:* Among the first to address gender bias.


**Rejecting the Gender Binary: A Vector-Space Operation.**
2015 - [Paper](http://bookworm.benschmidt.org/posts/2015-10-30-rejecting-the-gender-binary.html)
*Note:* Blog post; first to propose removing a gender dimension.


TODOs
-----

* add https://arxiv.org/pdf/2011.12086.pdf
* add https://arxiv.org/pdf/2011.12096.pdf
* add papers from [GeBNLP2020](https://genderbiasnlp.talp.cat/gebnlp2020/accepted-papers/) once they are available.

--------------------------------------------------------------------------------

/update.py:
--------------------------------------------------------------------------------

import argparse


def main(args):
    # read the TSV export (skipping the header row) into a list of dicts
    entries = []
    with open(args.input) as fp:
        next(fp)  # skip the header row
        for line in fp:
            line = line.strip().split("\t")
            entry = {"title": line[0],
                     "paperlink": line[1],
                     "codelink": line[2],
                     "venue": line[4],
                     "year": int(line[4][-4:]),  # venue strings end in the year
                     "comment": line[6]}
            entries.append(entry)

    # sort newest first and write
    entries = sorted(entries, key=lambda x: x["year"], reverse=True)

    entry_format = """
**{title}**
{venue} - [Paper]({paperlink}), [Code]({codelink})
*Note:* {comment}
"""
    for entry in entries:
        to_add = entry_format.format(**entry)
        # drop the code link for entries without one
        to_add = to_add.replace(", [Code]()", "")
        print(to_add)


if __name__ == '__main__':
    '''
    Usage:
        python update.py --input "/Users/philipp/Downloads/Gender Bias Overview - RelatedWork.tsv" >> output.md
    '''
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", default=None, type=str, required=True,
                        help="Path to the TSV export of the paper list")
    args = parser.parse_args()
    main(args)

--------------------------------------------------------------------------------
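
The column indices read in `main` imply a TSV layout with at least seven columns -- title, paper link, code link, an unused column, venue, another unused column, comment -- where the venue string ends in the four-digit year. A small sketch checking that assumption against a hypothetical row (the row content is illustrative, not from the actual spreadsheet):

```python
# Hypothetical TSV row matching the indices update.py reads;
# columns 3 and 5 are present but never used by the script.
row = ("Example Title\thttps://example.org/paper.pdf\t"
       "https://example.org/code\t\tGeBNLP 2019\t\tShort note")

fields = row.strip().split("\t")
entry = {"title": fields[0],
         "paperlink": fields[1],
         "codelink": fields[2],
         "venue": fields[4],
         "year": int(fields[4][-4:]),  # venue string must end in the year
         "comment": fields[6]}
print(entry["venue"], entry["year"])
```

A venue cell that does not end in a year (or a row with fewer than seven columns) would make the script raise, so the spreadsheet has to follow this layout exactly.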