TADS: Task-Aware Data Selection for Multi-Task Multimodal Pre-Training

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing multi-task, multi-modal pretraining approaches, which are often hindered by noisy, weakly aligned, and redundant web-scale data, while current data curation methods struggle to simultaneously ensure quality, diversity, and task relevance. To overcome this, we propose TADS, a novel framework that introduces a task-aware mechanism for efficient data selection. TADS employs a learnable value function that jointly optimizes intrinsic data quality, task-specific relevance, and distributional diversity. It integrates unimodal and cross-modal quality assessments, task similarity vectors, clustering-based diversity weighting, and a meta-learning strategy driven by downstream task feedback to dynamically refine data selection. Evaluated on CC12M, TADS achieves an average 1.0% improvement over baselines on zero-shot benchmarks including ImageNet and CIFAR-100 using only 36% of the original data, substantially enhancing both data efficiency and performance ceilings.

📝 Abstract
Large-scale multimodal pre-trained models like CLIP rely heavily on high-quality training data, yet raw web-crawled datasets are often noisy, misaligned, and redundant, leading to inefficient training and suboptimal generalization. Existing data selection methods are either heuristic-based, suffering from bias and limited diversity, or data-driven but task-agnostic, failing to optimize for multi-task scenarios. To address these gaps, we introduce TADS (Task-Aware Data Selection), a novel framework for multi-task multimodal pre-training that integrates Intrinsic Quality, Task Relevance, and Distributional Diversity into a learnable value function. TADS employs a comprehensive quality assessment system with unimodal and cross-modal operators, quantifies task relevance via interpretable similarity vectors, and optimizes diversity through cluster-based weighting. A feedback-driven meta-learning mechanism adaptively refines the selection strategy based on proxy model performance across multiple downstream tasks. Experiments on CC12M demonstrate that TADS achieves superior zero-shot performance on benchmarks like ImageNet, CIFAR-100, MS-COCO, and Flickr30K, using only 36% of the data while outperforming baselines by an average of 1.0%. This highlights that TADS significantly enhances data efficiency by curating a high-utility subset that yields a much higher performance ceiling within the same computational constraints.
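The abstract describes a learnable value function that combines intrinsic quality, task relevance, and distributional diversity, then keeps a high-utility subset (36% of CC12M in the experiments). A minimal sketch of that selection idea is below; all function names, weights, and score definitions are illustrative assumptions, not the paper's actual formulation.

```python
# Toy sketch of task-aware data selection: score each sample by a weighted
# combination of three criteria, then keep the top-scoring fraction.
# The linear form and the weights are assumptions for illustration only.
import math

def value_score(quality, task_relevance, diversity, w=(0.4, 0.4, 0.2)):
    """Combine the three criteria into one scalar value.

    quality        -- intrinsic (unimodal + cross-modal) quality in [0, 1]
    task_relevance -- similarity to the downstream task set in [0, 1]
    diversity      -- cluster-based diversity weight in [0, 1]
    """
    wq, wr, wd = w
    return wq * quality + wr * task_relevance + wd * diversity

def select_subset(samples, budget_ratio=0.36):
    """Keep the top-valued fraction of samples (0.36 mirrors the paper's budget)."""
    ranked = sorted(samples, key=lambda s: value_score(*s[1:]), reverse=True)
    k = max(1, math.floor(len(ranked) * budget_ratio))
    return [sid for sid, *_ in ranked[:k]]

# samples: (id, quality, task_relevance, diversity)
pool = [("a", 0.9, 0.8, 0.5), ("b", 0.2, 0.1, 0.9),
        ("c", 0.7, 0.9, 0.4), ("d", 0.3, 0.2, 0.3)]
print(select_subset(pool))  # prints ['a']
```

In the paper the weights are not fixed: a meta-learning loop adjusts the selection strategy based on proxy-model performance on downstream tasks, which this static sketch omits.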
Problem

Research questions and friction points this paper is trying to address.

multimodal pre-training
data selection
task-awareness
noisy data
multi-task learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-Aware Data Selection
Multimodal Pre-Training
Data Efficiency
Meta-Learning
Zero-Shot Learning
Guanjie Cheng
Assistant Professor, School of Software Technology, Zhejiang University
AIoT, Multi-Agent Collaboration, Edge Computing, Data Security and Blockchain, Privacy Protection
Boyi Li
School of Computer Science and Engineering, Northeastern University
Lingyu Sun
School of Computer and Electronic Information, Nanjing Normal University
Mengying Zhu
Zhejiang University
Online Learning, Fintech, Portfolio
Yangyang Wu
Zhejiang University
Large Language Model, Data Cleaning, Multi-modal Analysis
Xinkui Zhao
School of Software Technology, Zhejiang University
Shuiguang Deng
School of Computer Science and Technology, Zhejiang University