Reinforced Curriculum Pre-Alignment for Domain-Adaptive VLMs

📅 2026-02-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the catastrophic forgetting problem commonly encountered when fine-tuning vision-language models (VLMs) on specialized domains. To mitigate this issue, the authors propose a staged post-training paradigm that integrates curriculum learning with reinforcement learning techniques such as Group Relative Policy Optimization (GRPO). A pre-alignment mechanism is introduced to counteract the optimization collapse caused by scarce domain-specific data. The approach first injects domain knowledge under partial output constraints and then gradually transitions to full generative optimization, balancing domain adaptation against the preservation of general-purpose multimodal capabilities. Experimental results show that the proposed method significantly improves performance on multiple specialized tasks while effectively maintaining the model’s general multimodal competence.
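The summary mentions GRPO as the underlying RL technique. The paper's exact formulation is not reproduced here, but the core idea of GRPO, scoring each sampled response relative to its group rather than against a learned value function, can be sketched as follows (function name and the zero-variance fallback are illustrative choices, not from the paper):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantage estimation: for a group of responses sampled
    from the same prompt, each response's advantage is its reward minus
    the group mean, normalized by the group standard deviation."""
    mean_r = statistics.mean(rewards)
    std_r = statistics.pstdev(rewards)
    if std_r == 0.0:
        # Uniform rewards carry no relative signal; assumed fallback.
        return [0.0 for _ in rewards]
    return [(r - mean_r) / std_r for r in rewards]
```

Because advantages are computed within each sampled group, no separate critic network is needed, which is part of why GRPO is attractive for post-training large models.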

πŸ“ Abstract
Vision-Language Models (VLMs) demonstrate remarkable general-purpose capabilities but often fall short in specialized domains such as medical imaging or geometric problem-solving. Supervised Fine-Tuning (SFT) can enhance performance within a target domain, but it typically causes catastrophic forgetting, limiting its generalization. The central challenge, therefore, is to adapt VLMs to new domains while preserving their general-purpose capabilities. Continual pretraining is effective for expanding knowledge in Large Language Models (LLMs), but it is less feasible for VLMs due to prohibitive computational costs and the unavailability of pretraining data for most open-source models. This necessitates efficient post-training adaptation methods. Reinforcement learning (RL)-based approaches such as Group Relative Policy Optimization (GRPO) have shown promise in preserving general abilities, yet they often fail in domain adaptation scenarios where the model initially lacks sufficient domain knowledge, leading to optimization collapse. To bridge this gap, we propose Reinforced Curriculum Pre-Alignment (RCPA), a novel post-training paradigm that introduces a curriculum-aware progressive modulation mechanism. In the early phase, RCPA applies partial output constraints to safely expose the model to new domain concepts. As the model's domain familiarity increases, training gradually transitions to full generation optimization, refining responses and aligning them with domain-specific preferences. This staged adaptation balances domain knowledge acquisition with the preservation of general multimodal capabilities. Extensive experiments across specialized domains and general benchmarks validate the effectiveness of RCPA, establishing a practical pathway toward building high-performing and domain-adaptive VLMs.
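The abstract describes a staged schedule that starts with partial output constraints and gradually hands over to full generation optimization. The paper's actual modulation rule is not given here; a minimal sketch, assuming a linear decay of the constrained fraction after a fully-constrained warm-up phase (the function name, `warmup_frac`, and the linear form are all illustrative assumptions):

```python
def constraint_ratio(step, total_steps, warmup_frac=0.3):
    """Hypothetical curriculum schedule: fraction of the output that is
    clamped to reference tokens at a given training step. Returns 1.0
    (fully constrained) during the warm-up phase, then decays linearly
    to 0.0 (free generation) by the end of training."""
    if total_steps <= 0:
        return 0.0
    progress = step / total_steps
    if progress < warmup_frac:
        return 1.0
    return max(0.0, 1.0 - (progress - warmup_frac) / (1.0 - warmup_frac))
```

Early in training the model only completes partially fixed outputs, so it is exposed to domain concepts without the reward collapse that free generation would risk; by the end it is optimized on fully self-generated responses.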
Problem

Research questions and friction points this paper is trying to address.

domain adaptation
vision-language models
catastrophic forgetting
post-training adaptation
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforced Curriculum Pre-Alignment
Domain-Adaptive VLMs
Progressive Modulation
Catastrophic Forgetting Mitigation
Post-Training Adaptation