The paper introduces GEST, a new dataset for measuring gender-stereotypical reasoning in masked language models and machine translation systems. The dataset covers 16 gender stereotypes, with samples compatible with nine Slavic languages and English. Using GEST, the authors evaluated 11 masked language models and 4 machine translation systems and found significant amounts of stereotypical reasoning across models and languages, underscoring the importance of understanding and monitoring such biases in NLP systems.
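To make the evaluation idea concrete, here is a minimal sketch of the kind of probe such an evaluation can rely on: in Slavic languages, first-person past-tense verbs are gender-marked, so a masked language model's preference between the masculine and feminine verb form reveals which gender it associates with a sentence. The model choice, the Slovak example sentence, and the scoring heuristic below are illustrative assumptions, not the paper's exact protocol.

```python
from transformers import pipeline

# Load a multilingual masked LM with the Hugging Face fill-mask pipeline.
# (Model choice is an assumption; any multilingual MLM would illustrate the idea.)
fill_mask = pipeline("fill-mask", model="xlm-roberta-base")

# Slovak: "I <mask> all evening." -- the masked verb must be gender-marked:
#   "plakal"  = "cried" (masculine speaker)
#   "plakala" = "cried" (feminine speaker)
sentence = "Ja som <mask> celý večer."

# Restrict the comparison to the two gendered forms. Note that in a real
# evaluation, subword tokenization of the target words needs careful handling.
candidates = fill_mask(sentence, targets=["plakal", "plakala"])

for c in candidates:
    print(f"{c['token_str']:>10s}  score={c['score']:.4f}")

# A consistently higher score for one gendered form across many
# stereotype-annotated sentences would indicate stereotypical reasoning.
```

A dataset-level evaluation would aggregate such masculine-vs-feminine preferences over many sentences per stereotype, rather than relying on a single example like this one.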


Publication date: 30 Nov 2023
Project Page: https://github.com/kinit-sk/gestar
Paper: https://arxiv.org/pdf/2311.18711