Understanding the Essence: Delving into Annotator Prototype Learning for Multi-Class Annotation Aggregation

📅 2025-08-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-class label aggregation methods suffer from unreliable and insufficient modeling of annotator confusion matrices due to sparse annotations and severe class imbalance. To address this, the paper proposes a prototype-enhanced annotator modeling framework: it introduces a set of learnable prototype confusion matrices, and each annotator's expertise is modeled as a Dirichlet-distributed mixture over these prototypes, enabling fine-grained and robust characterization of annotator expertise. Within a Bayesian classifier combination framework, the model is formulated as a probabilistic graphical model and optimized efficiently with variational inference. This approach mitigates the effects of annotation sparsity, yielding more accurate and generalizable true-label inference. Extensive experiments across 11 real-world datasets show that the method achieves up to a 15% absolute accuracy improvement over baselines, with an average gain of 3%, while reducing computational overhead by over 90%.

📝 Abstract
Multi-class classification annotations have significantly advanced AI applications, with truth inference serving as a critical technique for aggregating noisy and biased annotations. Existing state-of-the-art methods typically model each annotator's expertise using a confusion matrix. However, these methods suffer from two widely recognized issues: 1) when most annotators label only a few tasks, or when classes are imbalanced, the estimated confusion matrices are unreliable, and 2) a single confusion matrix is often inadequate for capturing an annotator's full expertise patterns across all tasks. To address these issues, we propose a novel confusion-matrix-based method, PTBCC (ProtoType learning-driven Bayesian Classifier Combination), which introduces more reliable and richer annotator estimation via prototype learning. Specifically, we assume that there exists a set $S$ of prototype confusion matrices, which capture the inherent expertise patterns of all annotators. Rather than being a single confusion matrix, each annotator's expertise is modeled as a Dirichlet prior distribution over these prototypes. This prototype-learning-driven mechanism circumvents the data sparsity and class imbalance issues, ensuring a richer and more flexible characterization of annotators. Extensive experiments on 11 real-world datasets demonstrate that PTBCC achieves up to a 15% accuracy improvement in the best case, and a 3% higher average accuracy, while reducing computational cost by over 90%.
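The core idea in the abstract can be sketched generatively: a shared pool of prototype confusion matrices, per-annotator Dirichlet mixture weights over those prototypes, and annotations drawn from the resulting effective confusion matrix. The sketch below is a minimal illustration of that idea, not the paper's implementation; all sizes, symbols, and the `annotate` helper are hypothetical, and the paper's variational inference procedure is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): K classes, S prototypes,
# J annotators, N sampled annotations.
K, S, J, N = 4, 3, 10, 50

# Shared set S of prototype confusion matrices: row k of each prototype is a
# categorical distribution over observed labels given true class k.
prototypes = rng.dirichlet(np.full(K, 0.5), size=(S, K))  # shape (S, K, K)

# Instead of a single per-annotator confusion matrix, each annotator's
# expertise is a Dirichlet-distributed weight vector over the prototypes.
alpha = np.ones(S)
weights = rng.dirichlet(alpha, size=J)  # shape (J, S)

# An annotator's effective confusion matrix is the prototype mixture;
# a convex combination of row-stochastic matrices stays row-stochastic.
confusion = np.einsum('js,skc->jkc', weights, prototypes)  # shape (J, K, K)

def annotate(j, z):
    """Annotator j labels a task with true class z by sampling from
    row z of their effective confusion matrix (hypothetical helper)."""
    return rng.choice(K, p=confusion[j, z])

labels = [annotate(j=0, z=2) for _ in range(N)]
```

Because every annotator shares the same small prototype pool, even an annotator with only a handful of labels contributes to (and borrows strength from) the prototype estimates, which is how the mixture formulation sidesteps sparsity.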
Problem

Research questions and friction points this paper is trying to address.

Address unreliable confusion matrices in sparse annotation data
Capture diverse annotator expertise beyond single confusion matrix
Improve accuracy and reduce cost in multi-class annotation aggregation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prototype learning-driven Bayesian Classifier Combination
Dirichlet prior distribution over prototype matrices
Handles data sparsity and class imbalance
Ju Chen
Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing, China; College of Computer Science and Software Engineering, Hohai University, Nanjing, China
Jun Feng
Key Laboratory of Water Big Data Technology of Ministry of Water Resources, Hohai University, Nanjing, China; College of Computer Science and Software Engineering, Hohai University, Nanjing, China
Shenyu Zhang
Southeast University
Natural Language Processing