🤖 AI Summary
Coreference resolution (CR) has long been constrained by task-specific architectures and encoder-centric paradigms, limiting generalization and multilingual adaptability. This work proposes the first decoder-only large language model (LLM)-based approach to multilingual coreference resolution, eliminating the reliance on dedicated encoders. By leveraging instruction tuning, it unifies the modeling of both explicit mentions and zero pronouns within a single framework and enables controllable inference. The authors design five instruction templates and fine-tune Llama 3.1, Gemma 2, and Mistral 0.3 on the multilingual CorefUD v1.2 benchmark. The method achieves state-of-the-art performance, with the best model outperforming CorPipe 24 (single-stage) by an average of +2.0 F1 points. This work establishes a novel decoder-only LLM paradigm for coreference resolution, substantially improving model generality, inference controllability, and cross-lingual transfer.
📝 Abstract
Coreference Resolution (CR) is a crucial yet challenging task in natural language understanding, often constrained by task-specific architectures and encoder-based language models that demand extensive training and lack adaptability. This study introduces the first multilingual CR methodology that leverages decoder-only LLMs to handle both overt and zero mentions. The article explores how to model the CR task for LLMs via five different instruction sets combined with a controlled inference method. The approach is evaluated across three LLMs: Llama 3.1, Gemma 2, and Mistral 0.3. The results indicate that LLMs, when instruction-tuned with a suitable instruction set, can surpass state-of-the-art task-specific architectures. Specifically, our best model, a fully fine-tuned Llama 3.1 for multilingual CR, outperforms the leading multilingual CR model (i.e., the CorPipe 24 single-stage variant) by 2 pp on average across all languages in the CorefUD v1.2 dataset collection.
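To make the instruction-tuning setup concrete, the sketch below shows one plausible shape such a template could take: the model is asked to re-emit the input text with coreferent mentions wrapped in numbered cluster tags, and the tagged output is parsed back into clusters. This is a minimal illustration only; the tag format, prompt wording, and function names (`build_prompt`, `parse_clusters`) are assumptions for exposition and are not the paper's actual five templates.

```python
import re

def build_prompt(text):
    """Hypothetical CR instruction template (illustrative, not the
    paper's exact wording): ask the model to wrap each coreferent
    mention in a numbered tag <cN>...</cN>."""
    return (
        "Annotate all coreferent mentions in the text below. Wrap each "
        "mention in a tag <cN>...</cN>, where N identifies its cluster; "
        "mentions of the same entity share the same N.\n\n"
        f"Text: {text}\nAnnotated:"
    )

def parse_clusters(tagged):
    """Recover coreference clusters from a tagged model response.

    The backreference \\1 ensures opening and closing tags carry the
    same cluster id. Returns {cluster_id: [mention, ...]}.
    """
    clusters = {}
    for cid, mention in re.findall(r"<c(\d+)>(.*?)</c\1>", tagged):
        clusters.setdefault(int(cid), []).append(mention)
    return clusters

# Example: parsing a response in the assumed tag format.
response = ("<c1>Marie Curie</c1> won the prize because "
            "<c1>she</c1> discovered <c2>radium</c2>.")
print(parse_clusters(response))
# → {1: ['Marie Curie', 'she'], 2: ['radium']}
```

A tagging scheme like this keeps the task in a pure text-to-text form, which is what lets a decoder-only model handle CR without any task-specific span-scoring head; zero pronouns would additionally require the template to license inserting tags at empty positions.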