Findings of the Fifth Shared Task on Multilingual Coreference Resolution: Expanding Datasets for Long-Range Entities

📅 2026-05-20

🤖 AI Summary

This paper reports on a shared task targeting long-range coreference resolution in multilingual settings: identifying mentions and clustering those that refer to the same entity, including mentions scattered across many words and sentences. The task adds five new datasets and two additional languages, drawing on version 1.4 of CorefUD, a harmonized collection of 27 datasets in 19 languages. Ten systems participated, four of them LLM-based (three fine-tuned models and one few-shot approach). Traditional systems still led the results, but the LLM-based systems performed well enough to suggest they may challenge established approaches in future editions.

📝 Abstract

This paper describes the fifth edition of the Shared Task on Multilingual Coreference Resolution, held in conjunction with the CODI-CRAC 2026 workshop. Building on previous iterations, the task required participants to develop systems capable of mention identification and identity-based coreference clustering. The 2026 edition specifically emphasizes long-range entities, defined as coreferential chains spanning significant distances, across many words and sentences. The task expanded its linguistic scope by incorporating five new datasets and two additional languages. These additions leverage version 1.4 of CorefUD, a harmonized multilingual collection comprising 27 datasets in 19 languages. In total, ten systems participated, including four LLM-based approaches (three fine-tuned models and one few-shot approach). While traditional systems still maintained their lead, LLMs demonstrated significant potential, suggesting they may soon challenge established approaches in future editions.
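
The abstract defines long-range entities only informally, as coreferential chains spanning significant distances. A minimal sketch of how such chains might be flagged is given below. The input layout (entity id, sentence index, token index), the function names, and the `min_sentence_span` threshold are all hypothetical choices made for illustration; they are not the shared task's definition or the CorefUD file format.

```python
from collections import defaultdict


def chain_spans(mentions):
    """Return, per entity, the sentence distance between its first and last mention.

    `mentions` is an iterable of (entity_id, sentence_index, token_index)
    tuples. This layout is an assumption for illustration only.
    """
    sentences = defaultdict(list)
    for entity_id, sent_idx, _tok_idx in mentions:
        sentences[entity_id].append(sent_idx)
    return {entity: max(idxs) - min(idxs) for entity, idxs in sentences.items()}


def long_range_entities(mentions, min_sentence_span=10):
    """List entities whose chain spans at least `min_sentence_span` sentences.

    The default threshold is a hypothetical value, not one taken from the paper.
    """
    spans = chain_spans(mentions)
    return sorted(e for e, span in spans.items() if span >= min_sentence_span)


# Toy document: entity "e1" reappears after a long gap, "e2" stays local.
mentions = [
    ("e1", 0, 3), ("e1", 1, 0), ("e1", 42, 7),
    ("e2", 5, 2), ("e2", 6, 1),
]

print(chain_spans(mentions))
print(long_range_entities(mentions))
```

Measuring distance in sentences rather than tokens is one possible choice; a token-based span could be computed the same way from the third tuple field if document-level token offsets were available.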

Problem

Research questions and friction points this paper is trying to address.

Multilingual Coreference Resolution

Long-Range Entities

Coreference Clustering

Mention Identification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Long-Range Coreference

Multilingual Coreference Resolution

CorefUD

Large Language Models