Learning from True-False Labels via Multi-modal Prompt Retrieving

📅 2024-05-24

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the low reliability of pseudo-labels generated by vision-language models (VLMs) in weakly supervised learning, this paper proposes a new weak-supervision setting, True-False Labels (TFLs): for each instance, one label is sampled uniformly at random from the candidate label set, and the TFL records whether the instance belongs to that label. The authors theoretically derive a risk-consistent estimator that exploits the conditional probability distribution of TFLs. They also design a convolution-based Multi-modal Prompt Retrieving (MRP) method that aligns VLM knowledge with the target learning task without fine-tuning the VLM. The key contributions are: (1) the TFL weak-supervision setting; (2) a risk-consistent estimator with theoretical justification; and (3) an alignment method that requires no VLM fine-tuning. Experiments demonstrate the effectiveness of both the TFL setting and the MRP method. The code to reproduce the experiments is publicly available.

📝 Abstract

Weakly supervised learning has recently achieved considerable success in reducing annotation costs and label noise. Unfortunately, existing weakly supervised learning methods struggle to generate reliable labels with pre-trained vision-language models (VLMs). In this paper, we propose a novel weakly supervised labeling setting, namely True-False Labels (TFLs), which can achieve high accuracy when generated by VLMs. A TFL indicates whether an instance belongs to a label that is sampled uniformly at random from the candidate label set. Specifically, we theoretically derive a risk-consistent estimator to explore and utilize the conditional probability distribution information of TFLs. In addition, we propose a convolution-based Multi-modal Prompt Retrieving (MRP) method to bridge the gap between the knowledge of VLMs and target learning tasks. Experimental results demonstrate the effectiveness of the proposed TFL setting and MRP learning method. The code to reproduce the experiments is at https://github.com/Tranquilxu/TMP.
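The TFL setting described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' pipeline: the function name `make_true_false_labels` is hypothetical, and the true/false answer is taken from ground-truth labels here, whereas the paper obtains it from a VLM.

```python
import random


def make_true_false_labels(true_labels, num_classes, seed=0):
    """Simulate TFLs: one (candidate_label, is_true) pair per instance.

    Assumption: ground-truth labels stand in for the VLM that would
    answer the true/false question in the paper's setting.
    """
    rng = random.Random(seed)
    pairs = []
    for y in true_labels:
        # Sample one candidate label uniformly from the candidate label set.
        candidate = rng.randrange(num_classes)
        # The TFL is 1 if the instance belongs to the sampled label, else 0.
        pairs.append((candidate, int(candidate == y)))
    return pairs


if __name__ == "__main__":
    for candidate, is_true in make_true_false_labels([0, 1, 2, 1], num_classes=3):
        print(candidate, is_true)
```

Each instance carries a single binary answer about a single sampled label, so an annotator (or a VLM) never has to choose among all classes.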

Problem

Research questions and friction points this paper is trying to address.

Improving weakly supervised label accuracy via Vision-Language Models

Developing True-False Labels for high-accuracy weak supervision

Bridging VLM knowledge and target tasks with prompt retrieval

Innovation

Methods, ideas, or system contributions that make the work stand out.

True-False Labels for accurate VLM weak supervision

Risk-consistent estimator for conditional probability utilization

Multi-modal Prompt Retrieving bridges VLM-task knowledge gap
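To make concrete how a classifier can learn from a TFL at all, the sketch below scores one (candidate label, true/false) observation with a plain negative log-likelihood. This is an illustrative stand-in, not the paper's risk-consistent estimator, whose exact form is not given in this summary; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tfl_negative_log_likelihood(logits, candidate, is_true, eps=1e-12):
    """Negative log-likelihood of a single TFL observation.

    Assumption: the model's class probability p[candidate] is read as the
    probability that the answer for the sampled label is "true".
    """
    p = softmax(logits)[candidate]
    if is_true:
        # "True": the instance belongs to the sampled label.
        return -math.log(p + eps)
    # "False": the instance belongs to one of the other labels.
    return -math.log(1.0 - p + eps)


if __name__ == "__main__":
    print(tfl_negative_log_likelihood([2.0, 0.0, 0.0], candidate=0, is_true=True))
    print(tfl_negative_log_likelihood([2.0, 0.0, 0.0], candidate=0, is_true=False))
```

A "true" answer pushes probability mass toward the sampled label, while a "false" answer only pushes mass away from it, which is why a "false" TFL is weaker supervision than a full label.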
