ConGA: Guidelines for Contextual Gender Annotation. A Framework for Annotating Gender in Machine Translation

📅 2026-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the pervasive gender bias in machine translation (MT) when translating from languages without grammatical gender (e.g., English) into morphologically gendered languages (e.g., Italian), where systems often default to masculine forms due to missing gender cues. To tackle this, we propose ConGA, a framework that introduces a fine-grained annotation scheme integrating semantic gender (masculine/feminine/ambiguous) with grammatical gender, alongside cross-sentence entity tracking through entity-level identifiers. Using this approach, we construct a gold-standard annotated resource on the gENder-IT dataset, revealing systematic issues in current MT systems, including overuse of masculine forms and inconsistent rendering of feminine expressions. Our benchmark offers a linguistically grounded foundation for evaluating and advancing gender-fair machine translation.

📝 Abstract
Handling gender across languages remains a persistent challenge for Machine Translation (MT) and Large Language Models (LLMs), especially when translating from gender-neutral languages into morphologically gendered ones, such as English to Italian. English largely omits grammatical gender, while Italian requires explicit agreement across multiple grammatical categories. This asymmetry often leads MT systems to default to masculine forms, reinforcing bias and reducing translation accuracy. To address this issue, we present the Contextual Gender Annotation (ConGA) framework, a linguistically grounded set of guidelines for word-level gender annotation. The scheme distinguishes between semantic gender in English through three tags, Masculine (M), Feminine (F), and Ambiguous (A), and grammatical gender realisation in Italian (Masculine (M), Feminine (F)), combined with entity-level identifiers for cross-sentence tracking. We apply ConGA to the gENder-IT dataset, creating a gold-standard resource for evaluating gender bias in translation. Our results reveal systematic masculine overuse and inconsistent feminine realisation, highlighting persistent limitations of current MT systems. By combining fine-grained linguistic annotation with quantitative evaluation, this work offers both a methodology and a benchmark for building more gender-aware and multilingual NLP systems.
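As an illustration of the scheme described in the abstract, the following is a minimal sketch of how a word-level annotation record might be represented and how masculine realisation of ambiguous source words could be counted. The class name, field names, entity identifier format, and the toy word pairs are all assumptions made for this example; they are not taken from the ConGA guidelines or the gENder-IT dataset.

```python
from collections import Counter
from dataclasses import dataclass

# Tag sets as described in the abstract.
SOURCE_TAGS = {"M", "F", "A"}  # semantic gender in English
TARGET_TAGS = {"M", "F"}       # grammatical gender in Italian


@dataclass(frozen=True)
class GenderAnnotation:
    """One word-level annotation (hypothetical record layout)."""
    entity_id: str    # entity-level identifier; the "E1" format is assumed
    source_word: str  # English word
    source_tag: str   # one of SOURCE_TAGS
    target_word: str  # Italian word
    target_tag: str   # one of TARGET_TAGS

    def __post_init__(self):
        # Reject tags outside the two tag sets.
        if self.source_tag not in SOURCE_TAGS:
            raise ValueError(f"invalid source tag: {self.source_tag!r}")
        if self.target_tag not in TARGET_TAGS:
            raise ValueError(f"invalid target tag: {self.target_tag!r}")


def realisation_counts(annotations):
    """Count (source_tag, target_tag) pairs across annotations."""
    return Counter((a.source_tag, a.target_tag) for a in annotations)


def masculine_rate_for_ambiguous(annotations):
    """Share of ambiguous source words realised as masculine, or None."""
    ambiguous = [a for a in annotations if a.source_tag == "A"]
    if not ambiguous:
        return None
    return sum(a.target_tag == "M" for a in ambiguous) / len(ambiguous)


# Toy data invented for illustration only; "E1" appears twice to show
# how one entity identifier can link several annotated words.
toy = [
    GenderAnnotation("E1", "doctor", "A", "medico", "M"),
    GenderAnnotation("E1", "tired", "A", "stanco", "M"),
    GenderAnnotation("E2", "nurse", "A", "infermiera", "F"),
    GenderAnnotation("E3", "sister", "F", "sorella", "F"),
]
print(realisation_counts(toy))
print(masculine_rate_for_ambiguous(toy))
```

Keeping the entity identifier on every record is one way to make consistency checks possible across sentences, for instance verifying that all words linked to the same entity share a target tag.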
Problem

Research questions and friction points this paper is trying to address.

gender bias
machine translation
grammatical gender
cross-lingual
gender annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contextual Gender Annotation
Machine Translation
Gender Bias
Linguistic Annotation
Multilingual NLP
Argentina Anna Rescigno
University of Pisa, University of Naples “L’Orientale”
Eva Vanmassenhove
Tilburg University
Johanna Monti
Professor of Modern Languages Teaching, Università di Napoli L'Orientale
Computational Linguistics, Machine Translation, Computer Aided Translation, Localisation, Artificial