🤖 AI Summary
This study addresses multilingual coreference resolution with an efficient dual-mode system built on multilingual pretrained encoders, supporting both LLM-guided and unconstrained inference paradigms. Methodologically, the authors perform a full reimplementation from TensorFlow to PyTorch, integrate multilingual encoders, and adapt the system to the reduced CRAC 2025 datasets. Crucially, they introduce a unified modeling framework that preserves a lightweight design while improving cross-lingual generalization. In the CRAC 2025 shared task, the system achieves first place in both the LLM-guided and unconstrained tracks, with F1 scores exceeding the second-best systems by 8 percentage points. To foster reproducibility and extensibility, the code, models, and configurations are fully open-sourced, providing an accessible baseline for multilingual coreference resolution.
📝 Abstract
We present CorPipe 25, the winning entry to the CRAC 2025 Shared Task on Multilingual Coreference Resolution. This fourth iteration of the shared task introduces a new LLM track alongside the original unconstrained track, features reduced development and test sets to lower computational requirements, and includes additional datasets. CorPipe 25 is a complete reimplementation of our previous systems, migrating from TensorFlow to PyTorch. Our system outperforms all other submissions in both the LLM and unconstrained tracks by a substantial margin of 8 percentage points. The source code and trained models are publicly available at https://github.com/ufal/crac2025-corpipe.