Graphical Identification of Gender Bias in BERT with a Weakly Supervised Approach
Dusi M.; Arici N.; Gerevini A. E.; Putelli L.; Serina I.
2022-01-01
Abstract
Transformer-based algorithms such as BERT are typically trained on large corpora of documents extracted directly from the Internet. As several studies have reported, these data can contain biases, stereotypes and other properties which are also transferred to the machine learning models, potentially leading them to discriminatory behaviour that should be identified and corrected. A very intuitive technique for bias identification in NLP models is the visualization of word embeddings. These techniques exploit the idea that a short distance between two word vectors indicates semantic similarity between the corresponding words; for instance, closeness between the terms nurse and woman could be an indicator of gender bias in the model. However, these techniques were designed for static word embedding algorithms such as Word2Vec. BERT, by contrast, does not guarantee the same relation between semantic similarity and short distance, making the visualization techniques more difficult to apply. In this work, we propose a weakly supervised approach, which only requires a list of gendered words that can be easily found in online lexical resources, for visualizing the gender bias present in the English base model of BERT. Our approach is based on a Linear Support Vector Classifier and Principal Component Analysis (PCA) and obtains better results than standard PCA.
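The pipeline described in the abstract can be sketched as follows. This is a minimal illustration of the general idea, not the authors' implementation: it uses random vectors with an artificial offset in place of real BERT embeddings of gendered words, fits a Linear Support Vector Classifier to obtain a candidate gender direction, and applies PCA to the residual to produce a second plotting axis.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
dim = 768  # hidden size of the BERT base model

# Stand-ins for contextual embeddings of gendered words.
# In the real pipeline these would come from BERT's hidden states
# for words such as "he"/"she", "man"/"woman", etc.
male = rng.normal(0.0, 1.0, (50, dim)) + 0.5    # offset simulates a gender signal
female = rng.normal(0.0, 1.0, (50, dim)) - 0.5

X = np.vstack([male, female])
y = np.array([0] * 50 + [1] * 50)

# 1) The Linear SVC learns a separating hyperplane; its unit normal
#    vector serves as a candidate "gender direction".
svc = LinearSVC().fit(X, y)
w = svc.coef_[0] / np.linalg.norm(svc.coef_[0])

# 2) Remove that direction from the embeddings and apply PCA to the
#    residual to obtain a second, orthogonal axis for visualization.
residual = X - np.outer(X @ w, w)
pca = PCA(n_components=1).fit(residual)

# 2-D coordinates suitable for a scatter plot of the word list.
coords = np.column_stack([X @ w, pca.transform(residual)[:, 0]])
print(coords.shape)  # (100, 2)
```

In a scatter plot of `coords`, the horizontal axis corresponds to the learned gender direction, so gendered word pairs should separate along it while bias shows up as neutral occupation words drifting toward one side.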