🤖 AI Summary
This work addresses inconsistent speaker-attribute annotation in multilingual text, where demographic and social cues are often implicit and vary across languages and cultures. The authors propose a human–LLM collaborative re-annotation framework: starting from a noisy corpus, LLMs surface recurring annotation rationales through iterative interaction with experts, and a disagreement-focused sampling strategy selects items for targeted re-annotation under practical resource constraints. The resulting WhoSaidIt dataset covers nine speaker-attribute labels. The authors quantify the divergence between original and revised annotations, benchmark recent LLMs, and analyze how explicit rationales affect model behavior. The results show substantial cross-lingual differences in annotation decisions and highlight both the strengths and the limitations of LLMs in speaker-attribute classification.
📝 Abstract
Annotating speaker attributes from text is inherently ambiguous, particularly in multilingual settings where demographic and social cues are implicit and culturally variable. We propose a collaborative re-annotation framework that pairs human experts with large language models (LLMs) to stabilize multilingual speaker-attribute labels under practical resource constraints. Starting from a noisy corpus, we use LLMs to surface recurring annotation rationales through iterative interaction with experts, and apply disagreement-focused sampling for targeted re-annotation. Using this framework, we construct WhoSaidIt, a multilingual dataset covering nine speaker-attribute labels. We quantify divergence between original and revised annotations, benchmark recent LLMs, and analyze the effect of explicit rationales on model behavior. Our results reveal substantial cross-lingual differences in annotation decisions and demonstrate both the strengths and limitations of LLMs in speaker-attribute classification.
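To make the idea of disagreement-focused sampling concrete, here is a minimal Python sketch. The abstract does not say how disagreement is measured or how items are selected, so everything below is an assumption for illustration: each item carries several candidate labels (for example the original label plus labels from two LLMs), disagreement is the share of votes that differ from the majority label, and the re-annotation budget is spent on the highest-scoring items. The names `disagreement`, `select_for_reannotation`, and the toy labels `A`, `B`, `C` are hypothetical and not taken from the paper.

```python
from collections import Counter


def disagreement(labels):
    """Share of votes that differ from the most common label (0.0 = unanimous)."""
    counts = Counter(labels)
    majority_votes = counts.most_common(1)[0][1]
    return 1.0 - majority_votes / len(labels)


def select_for_reannotation(votes, budget):
    """Return the ids of the `budget` items with the highest disagreement.

    Ties are broken by item id so the selection is deterministic.
    """
    ranked = sorted(votes.items(), key=lambda kv: (-disagreement(kv[1]), kv[0]))
    return [item_id for item_id, _ in ranked[:budget]]


# Hypothetical toy data: item id -> [original label, LLM 1 label, LLM 2 label]
votes = {
    "utt-1": ["A", "A", "A"],  # unanimous, lowest priority
    "utt-2": ["A", "B", "B"],  # one dissenting vote
    "utt-3": ["B", "A", "C"],  # no agreement at all, highest priority
    "utt-4": ["B", "B", "A"],  # one dissenting vote
}

selected = select_for_reannotation(votes, budget=2)
print(selected)
```

Under these assumptions, a fixed budget goes to the items where the available labels conflict most, which is one plausible reading of "targeted re-annotation under practical resource constraints". Other disagreement measures, such as label entropy, would slot into the same selection step.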