🤖 AI Summary
Empathic regression suffers from substantial noise in self-reported affective labels, and existing supervised learning methods lack robustness to such label corruption.
Method: We propose an uncertainty-aware probabilistic language modelling framework that combines variational model ensembling with heteroscedastic uncertainty modelling. Our approach introduces a loss component that penalises degenerate uncertainty quantification and another that enforces similarity between the input pairs on which empathy is predicted, enabling effective separation of noisy from clean samples.
Contribution/Results: Leveraging Bayesian probabilistic modelling, and validating with synthetic noise injection, our framework achieves state-of-the-art performance on two public benchmarks: Pearson correlation coefficients improve to 0.580 and 0.634, respectively, while calibration error drops from 0.571 to 0.376 against a recent variational model ensembling baseline. To the best of our knowledge, this is the first work to systematically introduce uncertainty-aware modelling into empathy regression, substantially enhancing model generalisation and reliability under label noise.
📝 Abstract
Supervised learning for empathy regression is challenged by noisy self-reported empathy scores. While many algorithms have been proposed for learning with noisy labels in textual classification problems, the regression counterpart is relatively under-explored. We propose UPLME, an uncertainty-aware probabilistic language modelling framework to capture label noise in the regression setting of empathy detection. UPLME includes a probabilistic language model that predicts both empathy score and heteroscedastic uncertainty and is trained using Bayesian concepts with variational model ensembling. We further introduce two novel loss components: one penalises degenerate Uncertainty Quantification (UQ), and another enforces similarity between the input pairs on which we predict empathy. UPLME achieves state-of-the-art performance relative to results reported in the literature (Pearson Correlation Coefficient: $0.558\rightarrow0.580$ and $0.629\rightarrow0.634$) on two public benchmarks that contain label noise. Through synthetic label noise injection, we show that UPLME is effective in separating noisy and clean samples based on the predicted uncertainty. UPLME also outperforms (Calibration Error: $0.571\rightarrow0.376$) a recent variational model ensembling-based UQ method designed for regression problems.
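The heteroscedastic uncertainty modelling described above can be illustrated with a per-sample Gaussian negative log-likelihood: the model predicts a mean score and a log-variance per input, and predicting a larger variance down-weights that sample's squared error, which is what allows high-uncertainty (likely noisy) labels to be attenuated. This is a minimal sketch of the general technique only, not the actual UPLME objective, which additionally includes the degenerate-UQ penalty and the input-pair similarity loss; the function name is illustrative.

```python
import math

def heteroscedastic_nll(means, log_vars, targets):
    """Per-sample Gaussian negative log-likelihood (up to a constant):
    mean over samples of 0.5 * (exp(-log_var) * (y - mu)^2 + log_var).

    A sample given a large predicted log-variance contributes a smaller
    residual term, so the model can "explain away" noisy labels, while
    the +log_var term penalises inflating uncertainty everywhere.
    """
    total = 0.0
    for mu, lv, y in zip(means, log_vars, targets):
        total += 0.5 * (math.exp(-lv) * (y - mu) ** 2 + lv)
    return total / len(means)

# For a sample with a large residual (|y - mu| = 3), a higher predicted
# variance yields a lower loss, so uncertainty absorbs the label noise.
loss_confident = heteroscedastic_nll([0.5], [0.0], [3.5])
loss_uncertain = heteroscedastic_nll([0.5], [2.0], [3.5])
```

After training, ranking samples by their predicted variance gives the noisy-versus-clean separation that the abstract evaluates via synthetic noise injection.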