🤖 AI Summary
Problem: Existing language models for morphologically rich, grammatically gendered languages (e.g., Italian) perpetuate gender stereotypes and inadequately represent non-binary identities. Method: The Gender-Fair Generation challenge comprises three tasks (gendered expression detection, gender-fair reformulation, and gender-fair English–Italian translation) and relies on three annotated datasets: GFL-it, GeNTE, and Neo-GATE. Evaluation uses the average BERTScore F1 for detection and, for reformulation and translation, an accuracy measured with a gender-neutral classifier together with a coverage-weighted accuracy. Contribution/Results: The challenge provides a shared benchmark for assessing and monitoring the recognition and generation of gender-fair language in mono- and cross-lingual scenarios, including the use of non-binary neomorphemes in Italian.
📝 Abstract
Gender-fair language aims to promote gender equality by using terms and expressions that include all identities and avoid reinforcing gender stereotypes. Implementing gender-fair strategies is particularly challenging in heavily gender-marked languages, such as Italian. To address this, the Gender-Fair Generation challenge intends to support the shift toward gender-fair language in written communication. The challenge, designed to assess and monitor the recognition and generation of gender-fair language in both mono- and cross-lingual scenarios, includes three tasks: (1) the detection of gendered expressions in Italian sentences, (2) the reformulation of gendered expressions into gender-fair alternatives, and (3) the generation of gender-fair language in automatic translation from English to Italian. The challenge relies on three annotated datasets: the GFL-it corpus, which contains Italian texts extracted from administrative documents provided by the University of Brescia; GeNTE, a bilingual test set for gender-neutral rewriting and translation built upon a subset of the Europarl dataset; and Neo-GATE, a bilingual test set designed to assess the use of non-binary neomorphemes in Italian for both the fair formulation and translation tasks. Finally, each task is evaluated with specific metrics: for task 1, the average of the F1 scores obtained with BERTScore on each entry of the datasets; for tasks 2 and 3, an accuracy measured with a gender-neutral classifier and a coverage-weighted accuracy.
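The metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes that per-entry BERTScore F1 values have already been computed elsewhere. It also assumes, since the abstract does not define it, that coverage-weighted accuracy is the product of coverage (the share of annotated terms found in the system output) and accuracy (the share of found terms realized in the expected form). The function names and the `correct` / `wrong` / `missing` labels are hypothetical and are not taken from the challenge's evaluation scripts.

```python
from statistics import mean


def mean_f1(per_entry_f1):
    """Average per-entry F1 values (e.g. BERTScore F1 computed elsewhere)."""
    return mean(per_entry_f1)


def coverage_weighted_accuracy(outcomes):
    """Coverage-weighted accuracy under the assumption stated above.

    `outcomes` holds one hypothetical label per annotated term:
    "correct" (found, expected form), "wrong" (found, other form),
    or "missing" (not found in the output).
    """
    total = len(outcomes)
    found = [o for o in outcomes if o != "missing"]
    if total == 0 or not found:
        return 0.0
    coverage = len(found) / total                                # found / annotated
    accuracy = sum(o == "correct" for o in found) / len(found)   # correct / found
    return coverage * accuracy


if __name__ == "__main__":
    # Illustrative values only, not results from the challenge.
    print(mean_f1([0.8, 0.9, 1.0]))
    print(coverage_weighted_accuracy(["correct", "correct", "wrong", "missing"]))
```

With four annotated terms of which three are found and two are correct, coverage is 3/4 and accuracy is 2/3, so the score is about 0.5. Under this assumed definition the product reduces to the number of correct terms divided by the number of annotated terms, which means a system is penalized both for omitting terms and for generating them in the wrong form.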