Learning from True-False Labels via Multi-modal Prompt Retrieving

📅 2024-05-24
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
To address the low reliability of pseudo-labels generated by vision-language models (VLMs) in weakly supervised learning, this paper proposes a new True-False Label (TFL) setting. For each instance, a label is sampled uniformly at random from the candidate label set, and the VLM only has to judge whether the instance belongs to that label. The authors derive a risk-consistent estimator that exploits the conditional probability distribution of TFLs. They also design a lightweight, convolution-based Multi-modal Prompt Retrieving (MRP) method that aligns VLM knowledge with downstream task features without fine-tuning the VLM. The key contributions are: (1) introducing the TFL weakly supervised setting; (2) a theoretical proof of risk consistency; and (3) an efficient alignment method that requires no VLM fine-tuning. Experiments on multiple benchmarks show improvements over state-of-the-art approaches, with high label accuracy and robustness to label noise. The code is publicly available.
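The labeling step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: `generate_tfls` and `ask_vlm` are hypothetical names, and `ask_vlm` stands in for whatever yes/no query is actually posed to the VLM (here, a toy oracle that knows the ground truth). The point is that a TFL only requires one binary judgement per instance instead of a choice among all classes.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple

def generate_tfls(
    instances: Sequence[Hashable],
    candidate_labels: Sequence[str],
    ask_vlm: Callable[[Hashable, str], bool],
    seed: int = 0,
) -> List[Tuple[Hashable, str, bool]]:
    """Attach one True-False Label to each instance (illustrative sketch).

    `ask_vlm` is a hypothetical yes/no oracle standing in for a VLM query.
    """
    rng = random.Random(seed)
    tfls = []
    for x in instances:
        # Draw one label uniformly at random from the candidate label set.
        sampled = rng.choice(candidate_labels)
        # Ask only "does x belong to `sampled`?" and record the answer.
        tfls.append((x, sampled, bool(ask_vlm(x, sampled))))
    return tfls

# Toy usage: a perfect "VLM" that simply looks up the ground truth.
truth = {"img0": "cat", "img1": "dog", "img2": "car"}
labels = ["cat", "dog", "car"]
data = generate_tfls(list(truth), labels, lambda x, lbl: truth[x] == lbl, seed=1)
for x, lbl, flag in data:
    print(x, lbl, flag)
```

A True answer reveals the instance's class outright; a False answer only rules out the sampled label, which makes it a complementary label. With uniform sampling over K classes and an accurate oracle, roughly 1/K of the TFLs come out True.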

📝 Abstract
Weakly supervised learning has recently achieved considerable success in reducing annotation costs and label noise. Unfortunately, existing weakly supervised learning methods lack the ability to generate reliable labels via pre-trained vision-language models (VLMs). In this paper, we propose a novel weakly supervised labeling setting, namely True-False Labels (TFLs), which VLMs can generate with high accuracy. A TFL indicates whether an instance belongs to a label that is sampled randomly and uniformly from the candidate label set. Specifically, we theoretically derive a risk-consistent estimator to explore and utilize the conditional probability distribution information of TFLs. In addition, we propose a convolution-based Multi-modal Prompt Retrieving (MRP) method to bridge the gap between the knowledge of VLMs and target learning tasks. Experimental results demonstrate the effectiveness of the proposed TFL setting and MRP learning method. The code to reproduce the experiments is at https://github.com/Tranquilxu/TMP.
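To make the idea of a risk-consistent estimator concrete, here is a generic derivation under simplifying assumptions that are ours and not necessarily the paper's: the candidate label set is all $K$ classes, the sampled label $\bar{y}$ is drawn uniformly and independently of $(x, y)$, and TFLs are noise-free, so $s = \mathbb{1}[y = \bar{y}]$. Bayes' rule gives the posterior of the true label given a TFL, and the law of total expectation rewrites the ordinary classification risk $R(f)$ in terms of TFL data alone.

```latex
\begin{aligned}
p(y = k \mid x, \bar{y}, s = 1) &= \mathbb{1}[k = \bar{y}], \\
p(y = k \mid x, \bar{y}, s = 0)
  &= \frac{p(y = k \mid x)\,\tfrac{1}{K}\,\mathbb{1}[k \neq \bar{y}]}
          {\tfrac{1}{K}\sum_{j \neq \bar{y}} p(y = j \mid x)}
   = \frac{p(y = k \mid x)\,\mathbb{1}[k \neq \bar{y}]}{1 - p(y = \bar{y} \mid x)}, \\
R(f) = \mathbb{E}_{(x, y)}\big[\mathcal{L}(f(x), y)\big]
  &= \mathbb{E}_{(x, \bar{y}, s)}\Big[\sum_{k=1}^{K} p(y = k \mid x, \bar{y}, s)\,\mathcal{L}(f(x), k)\Big].
\end{aligned}
```

Because $p(y \mid x)$ is unknown, a practical estimator of this kind typically plugs in the model's own softmax output as the weights. A True TFL then reduces to ordinary cross-entropy on $\bar{y}$, while a False TFL spreads the loss over the remaining classes in proportion to the model's current beliefs.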
Problem

Research questions and friction points this paper is trying to address.

Improving weakly supervised label accuracy via Vision-Language Models
Developing True-False Labels for high-accuracy weak supervision
Bridging VLM knowledge and target tasks with prompt retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

True-False Labels for accurate VLM weak supervision
Risk-consistent estimator for conditional probability utilization
Multi-modal Prompt Retrieving bridges VLM-task knowledge gap
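A per-sample loss following the generic importance-weighted form behind risk-consistent estimators might look like the sketch below. It is an illustration under assumptions, not the paper's estimator: it assumes noise-free TFLs, a candidate set equal to all classes, and that the unknown posterior $p(y = k \mid x)$ is replaced by the model's own softmax output $p$. For a True TFL on label $\bar{y}$ the loss is cross-entropy on $\bar{y}$; for a False TFL each remaining class $k$ is weighted by $p_k / (1 - p_{\bar{y}})$. The names `tfl_loss` and `empirical_risk` are hypothetical.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tfl_loss(logits: Sequence[float], sampled_label: int, is_true: bool,
             eps: float = 1e-12) -> float:
    """Importance-weighted cross-entropy for one TFL sample (illustrative)."""
    p = softmax(logits)
    if is_true:
        # True TFL: the sampled label is the class, so use plain cross-entropy.
        return -math.log(max(p[sampled_label], eps))
    # False TFL: the class is some k != sampled_label; weight each candidate by
    # the model's estimate of p(y = k | x, y != sampled_label). In a real
    # training loop these weights would typically be held constant (no gradient).
    denom = max(1.0 - p[sampled_label], eps)
    loss = 0.0
    for k, pk in enumerate(p):
        if k != sampled_label:
            loss += (pk / denom) * -math.log(max(pk, eps))
    return loss

def empirical_risk(batch) -> float:
    # batch: iterable of (logits, sampled_label, is_true) triples.
    losses = [tfl_loss(z, lbl, flag) for z, lbl, flag in batch]
    return sum(losses) / len(losses)

logits = [2.0, 0.5, -1.0]
print(tfl_loss(logits, 0, True))    # cross-entropy on class 0
print(tfl_loss(logits, 0, False))   # class 0 ruled out; mass goes to classes 1, 2
print(empirical_risk([(logits, 0, True), (logits, 2, False)]))
```

Using the model's own predictions as weights mirrors risk-consistent methods for complementary-label learning, which is the case a False TFL falls into; the eps guards only protect against log(0) and division by zero.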
Authors
Zhongnian Li, Jinghao Xu, Peng Ying, Meng Wei, Tongfeng Sun, Xinzheng Xu
School of Computer Science and Technology, China University of Mining Technology, Xuzhou, China