Towards Generating Automatic Anaphora Annotations

📅 2025-03-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the scarcity of high-quality coreference-annotated data and the prohibitive cost of manual annotation for coreference resolution. To this end, we propose a fully automated, dual-path data generation framework that requires no human labeling: (1) rule-based direct conversion leveraging existing corpora, and (2) cross-lingual joint parsing of dependency and coreference structures using multilingual pretrained models. Our framework is the first to systematically integrate structured data mapping with multilingual transfer capabilities, enabling coreference annotation for low-resource and unseen languages. Experiments across diverse languages demonstrate that the generated data achieves high quality and strong generalization, significantly reducing annotation costs. The approach establishes a scalable, reusable data infrastructure for coreference resolution, advancing both data efficiency and cross-lingual applicability in the field.

📝 Abstract

Training models that perform well on various NLP tasks requires large amounts of data, and this becomes more apparent with nuanced tasks such as anaphora and coreference resolution. To combat the prohibitive cost of creating manually annotated gold data, this paper explores two methods for automatically creating datasets with coreferential annotations: direct conversion from existing datasets, and parsing with multilingual models capable of handling new and unseen languages. The paper details the current progress on those two fronts, the challenges these efforts currently face, and our approach to overcoming them.

Problem

Research questions and friction points this paper is trying to address.

Generating automatic anaphora annotations for NLP tasks

Reducing costs of manual gold annotated data creation

Handling new and unseen languages in coreference resolution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Direct conversion from existing datasets

Parsing using multilingual models

Handling new and unseen languages
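
The direct-conversion idea can be illustrated with a small sketch. The summary does not say which source or target formats the authors use, so everything here is an assumption: the input is taken to be coreference clusters given as token spans (inclusive start and end indices), and the output is a CoNLL-2012-style bracket column, where `(0` opens a mention of cluster 0, `0)` closes it, `(0)` marks a single-token mention, and `-` marks no annotation. The function name `clusters_to_conll_column` is hypothetical.

```python
from collections import defaultdict


def clusters_to_conll_column(num_tokens, clusters):
    """Render coreference clusters as a CoNLL-style bracket column.

    clusters: list of clusters; each cluster is a list of (start, end)
    token spans with inclusive indices. This input layout is an assumption
    made for illustration, not the format used in the paper.
    """
    opens = defaultdict(list)    # token index -> cluster ids opening here
    closes = defaultdict(list)   # token index -> cluster ids closing here
    singles = defaultdict(list)  # token index -> single-token mentions

    for cluster_id, spans in enumerate(clusters):
        for start, end in spans:
            if not (0 <= start <= end < num_tokens):
                raise ValueError(f"span {(start, end)} is out of range")
            if start == end:
                singles[start].append(cluster_id)
            else:
                opens[start].append(cluster_id)
                closes[end].append(cluster_id)

    column = []
    for i in range(num_tokens):
        # Simplification: opening brackets first, then single-token
        # mentions, then closing brackets.
        parts = [f"({c}" for c in opens[i]]
        parts += [f"({c})" for c in singles[i]]
        parts += [f"{c})" for c in closes[i]]
        column.append("|".join(parts) if parts else "-")
    return column


if __name__ == "__main__":
    tokens = ["Mary", "said", "she", "saw", "the", "old", "dog",
              "and", "fed", "it"]
    # Cluster 0: "Mary", "she"; cluster 1: "the old dog", "it".
    clusters = [[(0, 0), (2, 2)], [(4, 6), (9, 9)]]
    for token, tag in zip(tokens, clusters_to_conll_column(len(tokens), clusters)):
        print(f"{token}\t{tag}")
```

A rule-based converter of this kind only re-encodes annotations that already exist in the source corpus; it cannot supply mentions the source lacks, which is where the parsing-based path would come in.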

🔎 Similar Papers

A Generative Marker Enhanced End-to-End Framework for Argument Mining