🤖 AI Summary
In subjective tasks, aggregating crowd annotations into hard labels obscures inherent human judgment diversity, causing a misalignment between model confidence and true cognitive uncertainty. To address this, we propose modeling the annotation distribution itself as a direct representation of cognitive uncertainty, not as noise to be suppressed. Our method employs soft-label supervision, using the empirical probability distribution over crowd annotations as the learning target for end-to-end training. Crucially, we are the first to formally reinterpret the annotation distribution as a learnable signal of cognitive uncertainty, thereby aligning model confidence with observed human perceptual variability. Experiments across vision and NLP benchmarks demonstrate that, while preserving hard-label accuracy, our approach reduces KL divergence by 32% and improves the correlation between model output entropy and annotation entropy by 61%.
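The soft-label supervision described above can be sketched in a few lines: the crowd annotations for each example are turned into an empirical probability distribution, which then serves as the training target. The sketch below is a minimal NumPy illustration under our own assumptions (function names and the exact loss form are ours, not necessarily the paper's implementation); minimizing cross-entropy against the soft target is equivalent, up to the target's fixed entropy, to minimizing the KL divergence from the annotation distribution to the model's predictive distribution.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def empirical_distribution(annotations, num_classes):
    # Turn raw crowd labels for one example into a soft-label target,
    # e.g. labels [0, 0, 1, 0, 2] -> [0.6, 0.2, 0.2].
    counts = np.bincount(annotations, minlength=num_classes)
    return counts / counts.sum()

def soft_label_loss(logits, target_dist):
    # Cross-entropy H(target, model). Since the target is fixed, this
    # differs from KL(target || model) only by the target's entropy,
    # so gradient descent on it drives the model toward the
    # annotation distribution rather than a collapsed hard label.
    p = softmax(logits)
    return -np.sum(target_dist * np.log(p + 1e-12))
```

When every annotator agrees, the target degenerates to a one-hot vector and the loss reduces to standard hard-label cross-entropy, which is why this objective can match hard-label accuracy on unambiguous examples.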
📄 Abstract
Many machine learning tasks involve inherent subjectivity, where annotators naturally provide varied labels. Standard practice collapses these label distributions into single labels, aggregating diverse human judgments into point estimates. We argue that this approach is epistemically misaligned for ambiguous data: the annotation distribution itself should be regarded as the ground truth. Training on collapsed single labels forces models to express false confidence on fundamentally ambiguous cases, creating a misalignment between model certainty and the diversity of human perception. We demonstrate empirically that soft-label training, which treats annotation distributions as ground truth, preserves epistemic uncertainty. Across both vision and NLP tasks, soft-label training achieves 32% lower KL divergence from human annotations and 61% stronger correlation between model and annotation entropy, while matching the accuracy of hard-label training. Our work repositions annotation distributions from noisy signals to be aggregated away to faithful representations of epistemic uncertainty that models should learn to reproduce.
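The entropy-correlation result reported above can be made concrete with a simple metric: compute the Shannon entropy of the model's predictive distribution and of the human annotation distribution for each example, then correlate the two across the evaluation set. The following is a minimal NumPy sketch of one plausible form of that metric (Pearson correlation; the paper's exact evaluation protocol may differ):

```python
import numpy as np

def entropy(p, axis=-1):
    # Shannon entropy in nats, clipped for numerical safety.
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

def entropy_correlation(model_probs, annotation_dists):
    # Pearson correlation between per-example model output entropy
    # and per-example annotation entropy. A high value means the
    # model is uncertain exactly where human annotators disagree.
    h_model = entropy(model_probs)
    h_human = entropy(annotation_dists)
    return np.corrcoef(h_model, h_human)[0, 1]
```

A model that perfectly reproduces each annotation distribution would score 1.0; a hard-label-trained model that is near-deterministic everywhere would show little relationship between its confidence and human disagreement.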