🤖 AI Summary
Standard self-supervised pretraining relies on fixed proxy tasks (such as next-token prediction) that often fail to prioritize the capabilities most relevant to downstream applications. This work proposes V-Pretraining, a method in which a lightweight task designer dynamically selects pretraining tasks (such as data augmentation strategies) whose loss gradients align with downstream task gradients, guided solely by limited downstream validation feedback; the model itself is never updated on downstream labels. To the authors' knowledge, this is the first approach to effectively steer large-scale self-supervised pretraining using only minimal validation signals. Under identical computational budgets, V-Pretraining yields substantial performance gains: up to 18% relative improvement on GSM8K for language models ranging from 0.5B to 7B parameters; in vision tasks, it increases ADE20K mIoU by up to 1.07, reduces NYUv2 RMSE, and boosts ImageNet linear-probing accuracy, while also demonstrating higher token efficiency and applicability across multimodal settings.
📝 Abstract
Can a small amount of verified goal information steer the expensive self-supervised pretraining of foundation models? Standard pretraining optimizes a fixed proxy objective (e.g., next-token prediction), which can misallocate compute away from downstream capabilities of interest. We introduce V-Pretraining: a value-based, modality-agnostic method for controlled continued pretraining in which a lightweight task designer reshapes the pretraining task to maximize the value of each gradient step. For example, consider self-supervised learning (SSL) with sample augmentation. The V-Pretraining task designer selects pretraining tasks (e.g., augmentations) for which the pretraining loss gradient is aligned with a gradient computed over a downstream task (e.g., image segmentation). This helps steer pretraining towards relevant downstream capabilities. Notably, the pretrained model is never updated on downstream task labels; they are used only to shape the pretraining task. Under matched learner update budgets, V-Pretraining of 0.5B--7B language models improves reasoning (GSM8K test Pass@1) by up to 18% relative over standard next-token prediction using only 12% of GSM8K training examples as feedback. In vision SSL, we improve the state-of-the-art results on ADE20K by up to 1.07 mIoU and reduce NYUv2 RMSE while improving ImageNet linear accuracy, and we provide pilot evidence of improved token efficiency in continued pretraining.
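The task-designer mechanism in the abstract, scoring candidate pretraining tasks by how well their loss gradients align with a downstream gradient, can be sketched with a toy linear model. Everything here is an illustrative assumption, not the paper's implementation: the squared-loss model, the noise-scale "augmentations" standing in for candidate tasks, and the helper names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a linear map w; gradients of squared loss are analytic.
dim = 8
w = rng.normal(size=dim)

def loss_grad(w, X, y):
    """Gradient of mean squared error 0.5*||Xw - y||^2 / n w.r.t. w."""
    return X.T @ (X @ w - y) / len(y)

def cosine(g1, g2):
    """Cosine similarity between two gradient vectors."""
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2) + 1e-12))

# Hypothetical candidate pretraining tasks: each applies a different
# augmentation (here: additive input noise of varying scale).
X_pre = rng.normal(size=(64, dim))
y_pre = X_pre @ rng.normal(size=dim)
candidate_noise_scales = [0.0, 0.1, 0.5, 2.0]

# Small downstream validation set. Its labels only shape the task choice;
# they never update the model directly, matching the abstract's description.
X_val = rng.normal(size=(16, dim))
y_val = X_val @ rng.normal(size=dim)
g_down = loss_grad(w, X_val, y_val)

# Task designer: score each candidate by gradient alignment, pick the best.
scores = []
for s in candidate_noise_scales:
    X_aug = X_pre + s * rng.normal(size=X_pre.shape)
    g_pre = loss_grad(w, X_aug, y_pre)
    scores.append(cosine(g_pre, g_down))

best = int(np.argmax(scores))
print(f"selected noise scale: {candidate_noise_scales[best]}")
```

In a real run, the pretraining update would then be taken on the selected task's gradient, so downstream labels influence only which task is trained on, never the model weights directly.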