MORPHOGEN: A Multilingual Benchmark for Evaluating Gender-Aware Morphological Generation

📅 2026-04-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
Current multilingual large language models lack systematic evaluation on fine-grained linguistic phenomena such as grammatical gender and morphological agreement, particularly in morphologically rich languages. This work proposes MORPHOGEN, a benchmark designed specifically to evaluate gender-sensitive morphological generation across languages. It introduces the GENFORM task, which requires models to rewrite first-person sentences in the opposite gender while preserving meaning and structure. The benchmark covers three typologically diverse languages (French, Arabic, and Hindi) and uses synthetically generated, high-quality evaluation data. An assessment of 15 popular multilingual models (2B to 70B parameters) reveals significant gaps in gender-aware morphological generation, offering both a diagnostic tool and groundwork for developing more inclusive and morphology-sensitive NLP systems.

📝 Abstract
While multilingual large language models (LLMs) perform well on high-level tasks like translation and question answering, their ability to handle grammatical gender and morphological agreement remains underexplored. In morphologically rich languages, gender influences verb conjugation, pronouns, and even first-person constructions with explicit and implicit mentions of gender. We introduce MORPHOGEN, a morphologically grounded large-scale benchmark dataset for evaluating gender-aware generation in three typologically diverse grammatically gendered languages: French, Arabic, and Hindi. The core task, GENFORM, requires models to rewrite a first-person sentence in the opposite gender while preserving its meaning and structure. We construct a high-quality synthetic dataset spanning these three languages and benchmark 15 popular multilingual LLMs (2B-70B) on their ability to perform this transformation. Our results reveal significant gaps and interesting insights into how current models handle morphological gender. MORPHOGEN provides a focused diagnostic lens for gender-aware language modeling and lays the groundwork for future research on inclusive and morphology-sensitive NLP.
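
To make the GENFORM setup concrete, the sketch below shows one way a source sentence, its opposite-gender reference, and a model output could be represented and scored. It is an illustration only: the abstract does not state which metric the benchmark uses, so the normalized exact-match score here is an assumption, and `GenformItem`, `normalize`, `exact_match_accuracy`, and the two sample sentences are hypothetical names and data rather than material from the dataset.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class GenformItem:
    """One hypothetical GENFORM example: a first-person sentence and its
    opposite-gender rewrite (field names are assumptions, not the paper's)."""
    language: str   # e.g. "fr", "ar", "hi"
    source: str     # first-person sentence in one grammatical gender
    reference: str  # same sentence rewritten in the opposite gender


def normalize(text: str) -> str:
    """Apply Unicode NFC, collapse whitespace, and casefold, so that purely
    typographic differences do not count as errors."""
    return " ".join(unicodedata.normalize("NFC", text).split()).casefold()


def exact_match_accuracy(items, predictions):
    """Fraction of predictions equal to the reference after normalization.
    Exact match is an assumed metric chosen for simplicity."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    if not items:
        return 0.0
    hits = sum(
        normalize(pred) == normalize(item.reference)
        for item, pred in zip(items, predictions)
    )
    return hits / len(items)


# Illustrative items (masculine -> feminine); not taken from the benchmark.
SAMPLE_ITEMS = [
    GenformItem("fr", "Je suis content.", "Je suis contente."),
    GenformItem("hi", "मैं जाता हूँ।", "मैं जाती हूँ।"),
]

# First output is correct; the second copies the source unchanged,
# which is a typical failure for this kind of rewriting task.
SAMPLE_PREDICTIONS = ["Je suis contente.", "मैं जाता हूँ।"]

if __name__ == "__main__":
    print(exact_match_accuracy(SAMPLE_ITEMS, SAMPLE_PREDICTIONS))
```

Exact match is strict: it penalizes any valid alternative phrasing, so a real evaluation might pair it with a softer measure. The point here is only that a single gender-bearing morpheme (the final "e" in French, the verb ending in Hindi) decides whether an output counts as correct.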

Problem

Research questions and friction points this paper is trying to address.

grammatical gender
morphological agreement
multilingual LLMs
gender-aware generation
morphologically rich languages

Innovation

Methods, ideas, or system contributions that make the work stand out.

gender-aware generation
morphological agreement
multilingual benchmark
GENFORM task
grammatical gender