🤖 AI Summary
Current multilingual large language models lack systematic evaluation on fine-grained linguistic phenomena such as grammatical gender and morphological agreement, particularly in morphologically rich languages. This work proposes MORPHOGEN, a benchmark designed to evaluate gender-sensitive morphological generation across languages. It introduces the GENFORM task, which requires models to rewrite first-person sentences in the opposite gender while preserving meaning and structure. The benchmark covers three typologically diverse languages (French, Arabic, and Hindi) and uses high-quality, synthetically generated evaluation data. An assessment of 15 popular multilingual models (2B to 70B parameters) reveals significant gaps in gender-aware morphological generation, offering both a diagnostic tool and groundwork for more inclusive and morphology-sensitive NLP systems.
📝 Abstract
While multilingual large language models (LLMs) perform well on high-level tasks like translation and question answering, their ability to handle grammatical gender and morphological agreement remains underexplored. In morphologically rich languages, gender influences verb conjugation, pronouns, and even first-person constructions with explicit and implicit mentions of gender. We introduce MORPHOGEN, a morphologically grounded large-scale benchmark dataset for evaluating gender-aware generation in three typologically diverse grammatically gendered languages: French, Arabic, and Hindi. The core task, GENFORM, requires models to rewrite a first-person sentence in the opposite gender while preserving its meaning and structure. We construct a high-quality synthetic dataset spanning these three languages and benchmark 15 popular multilingual LLMs (2B-70B) on their ability to perform this transformation. Our results reveal significant gaps and interesting insights into how current models handle morphological gender. MORPHOGEN provides a focused diagnostic lens for gender-aware language modeling and lays the groundwork for future research on inclusive and morphology-sensitive NLP.
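To make the GENFORM setup concrete, the following is a minimal sketch of how a single item and a simple scoring rule could be represented. It is an illustration only: the `GenformItem` structure, the `normalize` helper, the exact-match metric, and the three example sentences are assumptions made for this sketch and are not taken from the benchmark or its evaluation protocol, which the abstract does not specify.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class GenformItem:
    """Hypothetical GENFORM example: a first-person sentence and its opposite-gender rewrite."""
    language: str
    source: str      # sentence given to the model
    reference: str   # expected opposite-gender form


def normalize(text: str) -> str:
    # NFC-normalize and collapse whitespace so visually identical strings compare equal.
    return " ".join(unicodedata.normalize("NFC", text).split())


def exact_match_accuracy(items, predictions):
    """Fraction of predictions equal to the reference after normalization (assumed metric)."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    hits = sum(
        normalize(pred) == normalize(item.reference)
        for item, pred in zip(items, predictions)
    )
    return hits / len(items)


# Illustrative items written for this sketch, not drawn from MORPHOGEN.
ITEMS = [
    GenformItem("fr", "Je suis content.", "Je suis contente."),
    GenformItem("hi", "मैं जाता हूँ।", "मैं जाती हूँ।"),
    GenformItem("ar", "أنا سعيد.", "أنا سعيدة."),
]

# A baseline that copies its input unchanged gets every item wrong,
# because each reference differs from its source in a gender-marked form.
copy_baseline = [item.source for item in ITEMS]
print(exact_match_accuracy(ITEMS, copy_baseline))  # prints 0.0
```

Exact match is a deliberately strict choice here: it penalizes any deviation from the reference, including valid paraphrases, so a real evaluation would likely need a more tolerant comparison.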