├── README.md
└── update.py

/README.md:
--------------------------------------------------------------------------------

Bias in NLP
=======

This is a collection of natural language processing papers that deal with bias (mostly gender bias). The list is by no means complete; it is simply a way to keep up with the large number of papers in this area. If a paper is missing, please add it.


Papers
-----

**Towards Detection of Subjective Bias using Contextualized Word Embeddings**
WebConf2020 - [Paper](https://arxiv.org/pdf/2002.06644.pdf), [Code](https://github.com/tanvidadu/Subjective-Bias-Detection)
*Note:* Wiki Neutrality Corpus.


**Joint Multiclass Debiasing of Word Embeddings**
ISMIS2020 - [Paper](https://arxiv.org/pdf/2003.11520.pdf), [Code](https://github.com/RadomirPopovicFON/Joint-Multiclass-Debiasing-of-Word-Embeddings)
*Note:* Hard and Soft WEAT.


**Towards Debiasing Sentence Representations**
ACL2020 - [Paper](https://pdfs.semanticscholar.org/0d96/5ed237a3b4592ecefdb618c29f63adedff76.pdf), [Code](https://github.com/pliang279/sent_debias)
*Note:* Sentence-level debiasing. Difference between pretraining and fine-tuning.


**Neutralizing Gender Bias in Word Embedding with Latent Disentanglement and Counterfactual Generation**
arxiv2020 - [Paper](https://arxiv.org/pdf/2004.03133.pdf)
*Note:* Counterfactual generation.


**Unsupervised Discovery of Implicit Gender Bias**
arxiv2020 - [Paper](https://arxiv.org/pdf/2004.08361.pdf), [Code](https://github.com/anjalief/unsupervised_gender_bias)
*Note:* Unsupervised bias detection from comments.


**StereoSet: Measuring stereotypical bias in pretrained language models**
arxiv2020 - [Paper](https://arxiv.org/pdf/2004.09456.pdf), [Code](https://stereoset.mit.edu/)
*Note:* Benchmark and dataset for measuring bias in four domains (gender, profession, race, religion).


**Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation**
ACL2020 - [Paper](https://arxiv.org/pdf/2005.00965.pdf), [Code](https://github.com/uvavision/Double-Hard-Debias)
*Note:* Double-Hard Debias: first mitigate frequency effects in the embeddings, then apply hard debiasing.


**Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer**
ACL2020 - [Paper](https://arxiv.org/pdf/2005.00699.pdf)
*Note:* Bias in multilingual embeddings depends on the alignment direction.


**Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation**
arxiv2020 - [Paper](https://arxiv.org/pdf/2006.08881.pdf)
*Note:* Gender labels for pronouns in English-Spanish machine translation.


**Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases**
arxiv2020 - [Paper](https://arxiv.org/pdf/2006.03955.pdf)
*Note:* CEAT (Contextualized Embedding Association Test).


**OSCaR: Orthogonal Subspace Correction and Rectification of Biases in Word Embeddings**
arxiv2020 - [Paper](https://arxiv.org/pdf/2007.00049.pdf)
*Note:* Preserves the semantic meaning of embeddings.


**Investigating Gender Bias in BERT**
arxiv2020 - [Paper](https://arxiv.org/pdf/2009.05021.pdf)
*Note:* Identifies one gender direction per BERT layer.


**Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias**
arxiv2020 - [Paper](https://arxiv.org/pdf/2009.11982.pdf), [Code](https://github.com/anavaleriagonzalez/ABC-dataset)
*Note:* Multilingual multi-task dataset across four languages.
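
The "hard debias" step that Double-Hard Debias builds on is, at its core, a projection: remove a word vector's component along an estimated gender direction. A minimal sketch with toy 2-d vectors (the vectors and the single definitional pair are illustrative, not taken from any of the papers):

```python
import numpy as np

def neutralize(w, g):
    """Remove the component of w along the gender direction g --
    the core projection step of hard debiasing."""
    g = g / np.linalg.norm(g)       # unit gender direction
    return w - np.dot(w, g) * g     # subtract the projection onto g

# toy example: gender direction estimated from one definitional pair
he, she = np.array([0.9, 0.3]), np.array([0.3, 0.9])
g = he - she                        # simplistic one-pair gender direction
w = np.array([0.8, 0.1])            # a word vector to neutralize
w_debiased = neutralize(w, g)
print(np.dot(w_debiased, g / np.linalg.norm(g)))  # ~0 after projection
```

In practice the gender direction is estimated from many definitional pairs (e.g. via PCA over pair differences), and "Lipstick on a Pig" below argues that this projection hides rather than removes the bias.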


**Towards Debiasing NLU Models from Unknown Biases**
arxiv2020 - [Paper](https://arxiv.org/pdf/2009.12303.pdf), [Code](https://github.com/UKPLab/emnlp2020-debiasing-unknown)
*Note:* Unsupervised bias detection.


**Robustness and Reliability of Gender Bias Assessment in Word Embeddings: The Role of Base Pairs**
arxiv2020 - [Paper](https://arxiv.org/pdf/2010.02847.pdf), [Code](https://github.com/alisonsneyd/Gender_bias_word_embeddings)
*Note:* The choice of base pairs matters.


**LOGAN: Local Group Bias Detection by Clustering**
arxiv2020 - [Paper](https://arxiv.org/pdf/2010.02867.pdf)
*Note:* Identifies biases through clustering.


**Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation**
arxiv2020 - [Paper](https://arxiv.org/pdf/2009.09435.pdf)
*Note:* Verifies whether non-linear debiasing helps; the results suggest it does not.


**Unmasking Contextual Stereotypes: Measuring and Mitigating BERT’s Gender Bias**
GeBNLP2020 - [Paper](https://arxiv.org/pdf/2010.14534.pdf), [Code](https://github.com/marionbartl/gender-bias-BERT)
*Note:* Verifies gender debiasing techniques in German.


**Language (Technology) is Power: A Critical Survey of “Bias” in NLP**
arxiv2020 - [Paper](https://arxiv.org/pdf/2005.14050.pdf)
*Note:* Meta-study: a survey of 146 papers on bias in NLP.


**Pick a Fight or Bite your Tongue: Investigation of Gender Differences in Idiomatic Language Usage**
arxiv2020 - [Paper](https://arxiv.org/pdf/2011.00335.pdf)
*Note:* Idiomatic expressions depending on the speaker.


**Evaluating Bias In Dutch Word Embeddings**
GeBNLP2020 - [Paper](https://arxiv.org/pdf/2011.00244.pdf), [Code](https://github.com/Noixas/Official-Evaluating-Bias-In-Dutch)
*Note:* Examines bias in Dutch embeddings (using WEAT).


**Analyzing Gender Bias within Narrative Tropes**
arxiv2020 - [Paper](https://arxiv.org/pdf/2011.00092.pdf), [Code](https://github.com/dhruvilgala/tvtropes)
*Note:* Analyzes bias using narrative tropes.


**Neural Machine Translation Doesn’t Translate Gender Coreference Right Unless You Make It**
GeBNLP2020 - [Paper](https://arxiv.org/pdf/2010.05332.pdf), [Code](https://github.com/DCSaunders/tagged-gender-coref)
*Note:* Incorporates explicit word-level gender tags.


**The Gap on GAP: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets**
NeurIPS 2020 - [Paper](https://arxiv.org/pdf/2011.01837.pdf), [Code](https://github.com/vid-koci/weightingGAP)
*Note:* Distances in GAP play a role.


**AraWEAT: Multidimensional Analysis of Biases in Arabic Word Embeddings**
arxiv2020 - [Paper](https://arxiv.org/pdf/2011.01575.pdf), [Code](https://github.com/bakrianoo/aravec)
*Note:* WEAT for Arabic.


**Characterising Bias in Compressed Models**
arxiv2020 - [Paper](https://arxiv.org/pdf/2010.03058.pdf)
*Note:* Bias in compressed models is large. Provides a method to identify biased examples.
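
Several entries in this list evaluate embeddings with WEAT (the Word Embedding Association Test): two target sets are compared by how strongly they associate with two attribute sets, summarised as a standardised effect size. A minimal sketch with toy vectors (the 2-d "embeddings" and word sets here are illustrative only):

```python
import numpy as np

def cos(u, v):
    # cosine similarity between two vectors
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def weat_effect_size(X, Y, A, B):
    """WEAT effect size: differential association of target sets X vs. Y
    with attribute sets A vs. B, normalised by the pooled std. dev."""
    def s(w):
        # association of a single word with A relative to B
        return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])
    sx = [s(x) for x in X]
    sy = [s(y) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy)

# toy 2-d "embeddings": career-like vs. family-like directions
career = [np.array([1.0, 0.1]), np.array([0.9, 0.0])]   # attribute set A
family = [np.array([0.1, 1.0]), np.array([0.0, 0.9])]   # attribute set B
male   = [np.array([0.8, 0.2]), np.array([0.9, 0.3])]   # target set X
female = [np.array([0.2, 0.8]), np.array([0.3, 0.9])]   # target set Y

print(weat_effect_size(male, female, career, family))   # positive here
```

A positive effect size means X associates more with A than Y does; the full test also reports a permutation-based p-value, omitted here for brevity.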


**Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them**
NAACL2019 - [Paper](https://arxiv.org/pdf/1903.03862.pdf), [Code](https://github.com/gonenhila/gender_bias_lipstick)
*Note:* Debiasing by zeroing out gender dimensions is not effective.


**Equalizing Gender Bias in Neural Machine Translation with Word Embeddings Techniques**
GeBNLP 2019 - [Paper](https://arxiv.org/pdf/1901.03116.pdf)
*Note:* Spanish-English translation with occupations.


**Evaluating the Underlying Gender Bias in Contextualized Word Embeddings**
GeBNLP 2019 - [Paper](https://arxiv.org/pdf/1904.08783.pdf)
*Note:* Contextualized embeddings are less biased than static ones.


**Mitigating Gender Bias in Natural Language Processing: Literature Review**
ACL2019 - [Paper](https://www.aclweb.org/anthology/P19-1159.pdf)
*Note:* Survey.


**What's in a Name? Reducing Bias in Bios without Access to Protected Attributes**
NAACL2019 - [Paper](https://arxiv.org/abs/1904.05233)
*Note:* Works on biographies.


**Assessing Social and Intersectional Biases in Contextualized Word Representations**
NeurIPS2019 - [Paper](https://papers.nips.cc/paper/9479-assessing-social-and-intersectional-biases-in-contextualized-word-representations.pdf)
*Note:* Strong bias in contextualized embeddings; the bias is not always visible at the sentence level.


**It’s All in the Name: Mitigating Gender Bias with Name-Based Counterfactual Data Substitution**
EMNLP2019 - [Paper](https://arxiv.org/pdf/1909.00871.pdf)
*Note:* Counterfactual Data Substitution (CDS).


**Good Secretaries, Bad Truck Drivers?
Occupational Gender Stereotypes in Sentiment Analysis**
GeBNLP 2019 - [Paper](https://arxiv.org/pdf/1906.10256.pdf), [Code](https://github.com/jayadevbhaskaran/gendered-sentiment)
*Note:* Dataset of 800 sentences analysed with sentiment analysis.


**Automatic Gender Identification and Reinflection in Arabic**
GeBNLP 2019 - [Paper](https://www.aclweb.org/anthology/W19-3822)
*Note:* Arabic-English translation with a focus on getting the pronouns right.


**Gendered Ambiguous Pronouns Shared Task: Boosting Model Confidence by Evidence Pooling**
GeBNLP 2019 - [Paper](https://www.aclweb.org/anthology/W19-3820.pdf), [Code](https://github.com/sattree/gap)
*Note:* Winner of the GAP shared task.


**Gendered Ambiguous Pronouns (GAP) Shared Task at the Gender Bias in NLP Workshop 2019**
GeBNLP 2019 - [Paper](https://www.aclweb.org/anthology/W19-3801/), [Code](https://github.com/google-research-datasets/gap-coreference)
*Note:* GAP shared task description.


**Conceptor Debiasing of Word Representations Evaluated on WEAT**
GeBNLP 2019 - [Paper](https://arxiv.org/pdf/1906.05993.pdf)
*Note:* Proposes Conceptor Debiasing.


**On Measuring Gender Bias in Translation of Gender-neutral Pronouns**
GeBNLP 2019 - [Paper](https://arxiv.org/pdf/1905.11684.pdf), [Code](https://github.com/nolongerprejudice/tgbi)
*Note:* Gender bias in Korean-English pronoun translation.


**Measuring Gender Bias in Word Embeddings across Domains and Discovering New Gender Bias Word Categories**
GeBNLP 2019 - [Paper](https://www.aclweb.org/anthology/W19-3804), [Code](https://github.com/alfredomg/GeBNLP2019)
*Note:* Clustering method for discovering new biases.


**The Role of Protected Class Word Lists in Bias Identification of Contextualized Word Representations**
GeBNLP 2019 - [Paper](https://www.aclweb.org/anthology/W19-3808)
*Note:* Uses conceptor debiasing.


**The Woman Worked as a Babysitter: On Biases in Language Generation**
EMNLP2019 - [Paper](https://arxiv.org/pdf/1909.01326.pdf), [Code](https://github.com/ewsheng/nlg-bias)
*Note:* Regard and sentiment. Annotations released.


**Exploring Human Gender Stereotypes with Word Association Test**
EMNLP2019 - [Paper](https://www.aclweb.org/anthology/D19-1635.pdf), [Code](https://github.com/Yupei-Du/bias-in-wat)
*Note:* Word association graphs.


**Gender-preserving Debiasing for Pre-trained Word Embeddings**
ACL2019 - [Paper](https://arxiv.org/pdf/1906.00742.pdf), [Code](https://github.com/kanekomasahiro/gp_debias)
*Note:* Differentiates between bias and gender information.


**Quantifying Social Biases in Contextual Word Representations**
GeBNLP 2019 - [Paper](https://www.cs.cmu.edu/~ytsvetko/papers/bias_in_bert.pdf)
*Note:* Template-based method to quantify bias.


**Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting**
ACM FAT* 2019 - [Paper](https://arxiv.org/pdf/1901.09451.pdf)
*Note:* Analyzes the effects of bias.


**Gender Bias in Neural Natural Language Processing**
Logic, Language, and Security. Springer. 2018 - [Paper](https://arxiv.org/pdf/1807.11714.pdf)
*Note:* Counterfactual Data Augmentation (CDA). Clear definition of bias. Evaluates on coreference resolution and language modelling.


**Gender Bias in Coreference Resolution**
NAACL2018 - [Paper](https://arxiv.org/pdf/1804.09301.pdf), [Code](https://github.com/rudinger/winogender-schemas)
*Note:* Winogender schemas.
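
Counterfactual Data Augmentation (CDA), noted for "Gender Bias in Neural Natural Language Processing" above, amounts to duplicating training sentences with gendered word pairs swapped. A minimal sketch (the swap list is a tiny illustrative subset; real CDA uses a much larger pair list and handles casing, morphology, and names):

```python
# Tiny illustrative swap list -- not the pair list from the paper.
SWAPS = {"he": "she", "she": "he",
         "his": "her", "her": "his",
         "man": "woman", "woman": "man"}

def swap_gender(sentence):
    # swap gendered tokens, leaving all other tokens unchanged
    return " ".join(SWAPS.get(tok, tok) for tok in sentence.split())

def cda_augment(corpus):
    # keep each original sentence and add its counterfactual twin
    augmented = []
    for sent in corpus:
        augmented.append(sent)
        augmented.append(swap_gender(sent))
    return augmented

print(cda_augment(["she is a doctor", "he likes his job"]))
```

Training on the augmented corpus exposes the model to both gendered variants of each context, which is the mechanism CDA relies on to reduce learned associations.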


**Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings**
NeurIPS2016 - [Paper](http://papers.nips.cc/paper/6228-man-is-to-computer-programmer-as-woman-is-to-homemaker-debiasing-word-embeddings.pdf), [Code](https://github.com/tolga-b/debiaswe)
*Note:* Among the first to address gender bias.


**Rejecting the Gender Binary: A Vector-Space Operation.**
2015 - [Paper](http://bookworm.benschmidt.org/posts/2015-10-30-rejecting-the-gender-binary.html)
*Note:* Blog post; first to propose removing a gender dimension.


TODOs
-----

* add https://arxiv.org/pdf/2011.12086.pdf
* add https://arxiv.org/pdf/2011.12096.pdf
* add papers from [GeBNLP2020](https://genderbiasnlp.talp.cat/gebnlp2020/accepted-papers/) once they are available.

--------------------------------------------------------------------------------

/update.py:
--------------------------------------------------------------------------------

import argparse


def main(args):
    # read the TSV export (skipping the header row) into a list of dicts
    entries = []
    with open(args.input) as fp:
        next(fp)  # skip the header row
        for line in fp:
            line = line.strip().split("\t")
            entry = {"title": line[0],
                     "paperlink": line[1],
                     "codelink": line[2],
                     "venue": line[4],
                     "year": int(line[4][-4:]),  # venue strings end in the year
                     "comment": line[6]}
            entries.append(entry)

    # sort newest first and write
    entries = sorted(entries, key=lambda x: x["year"], reverse=True)

    entry_format = """
**{title}**
{venue} - [Paper]({paperlink}), [Code]({codelink})
*Note:* {comment}
"""
    for entry in entries:
        to_add = entry_format.format(**entry)
        # drop the code link for entries without one
        to_add = to_add.replace(", [Code]()", "")
        print(to_add)


if __name__ == '__main__':
    '''
    Usage:
        python update.py --input "/Users/philipp/Downloads/Gender Bias Overview - RelatedWork.tsv" >> output.md
    '''
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", default=None, type=str, required=True,
                        help="Path to the TSV export of the paper list")
    args = parser.parse_args()
    main(args)

--------------------------------------------------------------------------------
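
The column indices read in `main` imply a TSV layout with at least seven columns -- title, paper link, code link, an unused column, venue, another unused column, comment -- where the venue string ends in the four-digit year. A small sketch checking that assumption against a hypothetical row (the row content is illustrative, not from the actual spreadsheet):

```python
# Hypothetical TSV row matching the indices update.py reads;
# columns 3 and 5 are present but never used by the script.
row = ("Example Title\thttps://example.org/paper.pdf\t"
       "https://example.org/code\t\tGeBNLP 2019\t\tShort note")

fields = row.strip().split("\t")
entry = {"title": fields[0],
         "paperlink": fields[1],
         "codelink": fields[2],
         "venue": fields[4],
         "year": int(fields[4][-4:]),  # venue string must end in the year
         "comment": fields[6]}
print(entry["venue"], entry["year"])
```

A venue cell that does not end in a year (or a row with fewer than seven columns) would make the script raise, so the spreadsheet has to follow this layout exactly.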