Uncertainty Decomposition via Cyclical SG-MCMC and Soft-label Learning for Subjective NLP

2026-05-23
AI Summary
This study addresses the inherent ambiguity in subjective emotion classification caused by annotator disagreement and the consequent need for effective uncertainty quantification. The work integrates soft-label learning with Bayesian deep learning by training a linear head on top of a frozen RoBERTa feature extractor. To approximate the empirical annotator distribution, the method samples the head's posterior with cyclical stochastic gradient Markov chain Monte Carlo (cSG-MCMC) and examines the effect of post-hoc temperature scaling on calibration. A five-axis evaluation framework shows that calibration quality under hard labels and fidelity to the annotator distribution are distinct evaluation axes. On the GoEmotions dataset, the proposed method outperforms Monte Carlo Dropout and Deep Ensemble baselines on Jensen-Shannon divergence to the annotator distribution, Spearman correlation between aleatoric uncertainty and disagreement, and selective-prediction AURC/AUROC.
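As a rough illustration of the "cyclical" part of cSG-MCMC, the sketch below computes a cosine step-size schedule with warm restarts, the form commonly used with cyclical SG-MCMC. Whether the paper uses exactly this schedule is an assumption, and the function name and the values `total_steps=100`, `num_cycles=4`, and `base_step=0.1` are hypothetical, chosen only for the example.

```python
import math


def cyclical_step_size(k, total_steps, num_cycles, base_step):
    """Cosine cyclical step size at 1-indexed iteration k (illustrative only)."""
    # Length of one cycle, rounded up so the cycles cover all iterations.
    cycle_len = math.ceil(total_steps / num_cycles)
    # Relative position inside the current cycle, in [0, 1).
    pos = ((k - 1) % cycle_len) / cycle_len
    # Starts at base_step, decays towards 0, then restarts at the next cycle.
    return 0.5 * base_step * (math.cos(math.pi * pos) + 1.0)


# Hypothetical settings: 100 iterations split into 4 cycles of 25 steps each.
schedule = [cyclical_step_size(k, 100, 4, 0.1) for k in range(1, 101)]
```

In schemes of this kind, the large steps early in each cycle are typically used for exploration, and posterior samples are collected late in the cycle, when the step size is small.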
Abstract
Annotator disagreement in emotion classification reflects ambiguity intrinsic to emotion concepts and is essential for predictor-quality assessment in subjective NLP. Yet no prior work integrates soft-label learning with Bayesian deep learning to evaluate uncertainty along axes including annotator-distribution fidelity. We train a linear head on a frozen RoBERTa via cyclical stochastic gradient Markov chain Monte Carlo (cSG-MCMC), targeting the empirical annotator distribution with a soft-label objective under a five-axis evaluation. On the 28-emotion GoEmotions benchmark, the proposed method outperforms Monte Carlo Dropout and Deep Ensemble simultaneously on three axes -- Jensen-Shannon divergence (JSD) to the annotator distribution, Spearman correlation between per-emotion aleatoric uncertainty and disagreement, and selective-prediction Area Under the Risk-Coverage Curve (AURC) and Area Under the ROC Curve (AUROC) -- showing independent axes are jointly attainable from one posterior. Post-hoc temperature scaling exhibits a bidirectional effect, establishing hard-label calibration and annotator-JSD as independent dimensions and motivating joint reporting as an honest protocol.
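Two of the quantities named in the abstract can be sketched in a few lines: the Jensen-Shannon divergence between a predictive distribution and the annotator distribution, and the standard entropy-based split of predictive uncertainty into aleatoric and epistemic parts. This is a minimal sketch, not the paper's implementation. It assumes a single categorical distribution per example (the paper's per-emotion quantities may be defined differently), and the helper names and toy numbers are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def jsd(p, q):
    """Jensen-Shannon divergence: H(M) - (H(P) + H(Q)) / 2 with M = (P + Q) / 2."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))


def decompose(samples):
    """Split predictive uncertainty over posterior samples.

    samples: one predicted distribution per posterior sample.
    Returns (mean prediction, total, aleatoric, epistemic), where
    total = entropy of the mean, aleatoric = mean of the entropies,
    and epistemic = total - aleatoric (the mutual information).
    """
    n = len(samples)
    mean = [sum(col) / n for col in zip(*samples)]
    total = entropy(mean)
    aleatoric = sum(entropy(s) for s in samples) / n
    return mean, total, aleatoric, total - aleatoric


# Toy data (hypothetical): 3 classes, 2 posterior samples, one annotator distribution.
annotator = [0.6, 0.4, 0.0]
samples = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]
mean, total, aleatoric, epistemic = decompose(samples)
fidelity = jsd(mean, annotator)  # lower means closer to the annotators
```

With this split, disagreement between posterior samples shows up as epistemic uncertainty, while spread that every sample agrees on is counted as aleatoric, which is the part the abstract correlates with annotator disagreement.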
Problem

Research questions and friction points this paper is trying to address.

annotator disagreement
uncertainty decomposition
soft-label learning
subjective NLP
emotion classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

cyclical SG-MCMC
soft-label learning
uncertainty decomposition
annotator disagreement
Bayesian deep learning