Human-Corrected Labels Learning: Enhancing Labels Quality via Human Correction of VLMs Discrepancies

📅 2025-11-12
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the high label noise and lack of error-correction mechanisms in automatic vision-language model (VLM)-generated annotations, this paper proposes a novel "human-corrected labeling" paradigm: human annotation is applied selectively, only to high-risk samples whose labels are inconsistent across multiple VLM outputs, thereby balancing labeling quality against cost. The contributions are threefold: (1) the first uncertainty quantification method for labeling based on inter-VLM output divergence, enabling targeted human intervention; (2) a risk-consistent weakly supervised learning framework that jointly leverages raw VLM outputs, model predictions, and human-corrected labels; (3) conditional probability modeling to estimate the underlying true label distribution, facilitating noise-aware classifier training. Experiments demonstrate substantial improvements in classification accuracy across diverse noise settings, strong robustness to label corruption, and over a 40% reduction in human annotation cost.
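The selective-correction step summarized above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the function names are made up, and the disagreement criterion here is simple unanimity across VLM outputs (the paper's uncertainty quantification may be more refined).

```python
def needs_human_correction(vlm_labels):
    """Flag a sample as high-risk when multiple VLM annotators disagree.

    vlm_labels: list of class labels, one per VLM (or per prompt/run).
    Returns True if the outputs are not unanimous.
    """
    return len(set(vlm_labels)) > 1

def route_samples(dataset_vlm_labels):
    """Split samples into an auto-accepted queue and a human-review queue."""
    auto, review = [], []
    for idx, labels in enumerate(dataset_vlm_labels):
        if needs_human_correction(labels):
            review.append(idx)             # inconsistent -> send to annotator
        else:
            auto.append((idx, labels[0]))  # consistent -> accept VLM label
    return auto, review

# Example: three VLMs label four images
labels = [["cat", "cat", "cat"],
          ["cat", "dog", "cat"],
          ["dog", "dog", "dog"],
          ["bird", "cat", "dog"]]
auto, review = route_samples(labels)
# auto -> [(0, 'cat'), (2, 'dog')], review -> [1, 3]
```

Only the samples in `review` incur human labeling cost, which is where the reported cost reduction comes from.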

๐Ÿ“ Abstract
Vision-Language Models (VLMs), with their powerful content generation capabilities, have been successfully applied to data annotation processes. However, VLM-generated labels exhibit dual limitations: low quality (i.e., label noise) and the absence of error-correction mechanisms. To enhance label quality, we propose Human-Corrected Labels (HCLs), a novel setting that enables efficient human correction of VLM-generated noisy labels. As shown in Figure 1(b), HCL strategically deploys human correction only for instances with VLM discrepancies, achieving both higher-quality annotations and reduced labor costs. Specifically, we theoretically derive a risk-consistent estimator that incorporates both human-corrected labels and VLM predictions to train classifiers. In addition, we propose a conditional probability method to estimate the label distribution from a combination of VLM outputs and model predictions. Extensive experiments demonstrate that our approach achieves superior classification performance and is robust to label noise, validating the effectiveness of HCL in practical weak supervision scenarios. Code: https://github.com/Lilianach24/HCL.git
Problem

Research questions and friction points this paper is trying to address.

Improving low-quality labels generated by Vision-Language Models
Reducing human annotation costs while enhancing label quality
Developing robust classification methods for noisy label scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human correction targets VLM discrepancies
Risk-consistent estimator combines human and VLM labels
Conditional probability estimates label distribution
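One way to read the last point above: the label distribution for a sample is estimated by fusing the classifier's current prediction with the VLM's (soft) label. The product-and-normalize rule below is an illustrative assumption (it follows from conditional independence of the two sources given the true label), not the paper's actual derivation.

```python
import numpy as np

def estimate_label_distribution(p_model, p_vlm, eps=1e-12):
    """Estimate the true-label distribution for one sample by fusing
    the classifier's prediction with the VLM's soft label.

    p_model, p_vlm: arrays of class probabilities summing to 1.
    Assumes the two sources are conditionally independent given the
    label, so the fused posterior is proportional to their
    element-wise product.
    """
    p_model = np.asarray(p_model, dtype=float)
    p_vlm = np.asarray(p_vlm, dtype=float)
    fused = p_model * p_vlm + eps  # eps avoids an all-zero posterior
    return fused / fused.sum()

# A confident model combined with a noisier VLM: fusion sharpens
# the posterior toward the class both sources favor (class 0 here).
post = estimate_label_distribution([0.7, 0.2, 0.1], [0.5, 0.4, 0.1])
```

Such an estimated distribution could then serve as a soft target when training the noise-aware classifier.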
Zhongnian Li
School of Computer Science and Technology / School of Artificial Intelligence, China University of Mining and Technology, Xuzhou, China
Lan Chen
Communication University of China
Yixin Xu
School of Computer Science and Technology / School of Artificial Intelligence, China University of Mining and Technology, Xuzhou, China
Shi Xu
School of Computer Science and Technology / School of Artificial Intelligence, China University of Mining and Technology, Xuzhou, China
Xinzheng Xu
State Key Lab. for Novel Software Technology, Nanjing University, Nanjing, China