🤖 AI Summary
This work addresses the scarcity of high-quality annotations in perception tasks and the aleatoric and epistemic uncertainty inherent in human labeling. It proposes a probabilistic label spreading method whose probability estimates remain consistent even as the number of annotations per data point approaches zero. Built on the assumption that labels are smooth over the feature space, the approach uses a probabilistic graph-based diffusion to propagate single annotations into soft labels, thereby jointly modeling and quantifying both sources of uncertainty. The method substantially reduces the annotation budget required to reach a desired label quality, outperforms existing baselines on common image datasets, and establishes a new state of the art on the Data-Centric Image Classification benchmark.
📝 Abstract
Safe artificial intelligence for perception tasks remains a major challenge, partly due to the lack of data with high-quality labels. Annotations themselves are subject to aleatoric and epistemic uncertainty, which is typically ignored during annotation and evaluation. While crowdsourcing enables collecting multiple annotations per image to estimate these uncertainties, this approach is impractical at scale due to the required annotation effort. We introduce a probabilistic label spreading method that provides reliable estimates of aleatoric and epistemic uncertainty of labels. Assuming label smoothness over the feature space, we propagate single annotations using a graph-based diffusion method. We prove that label spreading yields consistent probability estimators even when the number of annotations per data point converges to zero. We present and analyze a scalable implementation of our method. Experimental results indicate that, compared to baselines, our approach substantially reduces the annotation budget required to achieve a desired label quality on common image datasets and achieves a new state of the art on the Data-Centric Image Classification benchmark.
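To make the graph-diffusion idea concrete, below is a minimal NumPy sketch of classic label spreading in the spirit of Zhou et al. (2004): a kNN affinity graph is built over the feature space, symmetrically normalized, and sparse one-hot seed annotations are diffused into soft labels. The function name `spread_labels`, the RBF/kNN graph construction, the parameters, and the entropy readout at the end are illustrative assumptions; this is not the paper's probabilistic estimator and does not reproduce its aleatoric/epistemic uncertainty decomposition.

```python
import numpy as np

def spread_labels(features, annotations, n_classes, k=10, alpha=0.9, n_iter=50):
    """Diffuse sparse one-hot annotations over a kNN graph into soft labels.

    features:    (n, d) array of feature vectors.
    annotations: length-n integer array; class index for annotated points, -1 otherwise.
    Returns an (n, n_classes) array of soft label (class-probability) estimates.
    """
    n = features.shape[0]

    # RBF affinities over pairwise feature distances (bandwidth = median distance).
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    sigma = np.median(dists) + 1e-12
    W = np.exp(-(dists ** 2) / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Keep only the k strongest edges per node, then symmetrize the graph.
    drop_idx = np.argsort(W, axis=1)[:, :-k]
    mask = np.ones_like(W, dtype=bool)
    np.put_along_axis(mask, drop_idx, False, axis=1)
    W = np.where(mask, W, 0.0)
    W = np.maximum(W, W.T)

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Seed matrix Y: one-hot rows for annotated points, zeros elsewhere.
    Y = np.zeros((n, n_classes))
    labeled = annotations >= 0
    Y[np.where(labeled)[0], annotations[labeled]] = 1.0

    # Diffusion iteration F <- alpha * S F + (1 - alpha) * Y.
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y

    # Row-normalize to obtain soft label estimates.
    F = np.clip(F, 1e-12, None)
    return F / F.sum(axis=1, keepdims=True)


# Toy usage (illustrative only): two Gaussian blobs, one annotation per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.full(100, -1)
y[0], y[50] = 0, 1
probs = spread_labels(X, y, n_classes=2)
entropy = -(probs * np.log(probs)).sum(axis=1)  # crude per-point uncertainty readout
```

The clamping term `(1 - alpha) * Y` keeps the seed annotations influential throughout the diffusion; in the limit the iteration converges to the closed-form solution F = (1 - α)(I - αS)^{-1} Y. The entropy of the resulting soft labels is only a rough stand-in for uncertainty, not the calibrated aleatoric/epistemic estimates described in the abstract.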