Findings of the Fourth Shared Task on Multilingual Coreference Resolution: Can LLMs Dethrone Traditional Approaches?

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing evaluation frameworks for coreference resolution are ill-suited to large language models (LLMs), and multilingual LLM-based coreference resolution lacks standardized, fair benchmarks. Method: The authors introduce the first LLM-specific evaluation track for multilingual coreference resolution, using plain-text input only. They build on CorefUD v1.3, a harmonized multilingual benchmark covering 22 datasets in 17 languages, including three newly added datasets in two new languages, and systematically compare fine-tuned and few-shot LLM approaches against traditional systems. Contribution/Results: Of nine participating systems, four are LLM-based. Although traditional systems still lead overall, LLMs show clearly competitive performance, validating their feasibility for cross-lingual coreference modeling. The work establishes a reproducible multilingual benchmark and a new evaluation paradigm for LLM-driven coreference resolution research.

📝 Abstract
The paper presents an overview of the fourth edition of the Shared Task on Multilingual Coreference Resolution, organized as part of the CODI-CRAC 2025 workshop. As in the previous editions, participants were challenged to develop systems that identify mentions and cluster them according to identity coreference. A key innovation of this year's task was the introduction of a dedicated Large Language Model (LLM) track, featuring a simplified plaintext format designed to be more suitable for LLMs than the original CoNLL-U representation. The task also expanded its coverage with three new datasets in two additional languages, using version 1.3 of CorefUD, a harmonized multilingual collection of 22 datasets in 17 languages. In total, nine systems participated, including four LLM-based approaches (two fine-tuned and two using few-shot adaptation). While traditional systems still held the lead, LLMs showed clear potential, suggesting they may soon challenge established approaches in future editions.
Problem

Research questions and friction points this paper is trying to address.

Evaluating whether LLMs can outperform traditional coreference resolution methods
Developing systems to identify and cluster identity coreference mentions
Expanding multilingual coreference resolution with new datasets and languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduced dedicated LLM track with simplified plaintext format
Expanded coverage using CorefUD multilingual dataset collection
Evaluated both fine-tuned and few-shot adapted LLM approaches