🤖 AI Summary
To reduce the reliance on large-scale data when adapting CLIP models to specialized domains, this paper proposes CHIPS (Curvature-aware Hybrid Influence in Projection Subspace), a data-selection framework for continual pre-training (CPT) that assigns each image-text pair a curvature-aware, influence-informed utility score. CHIPS combines three components: (i) a Newton-style, curvature-aware alignment computed in CLIP's end-point subspace, for faithfulness; (ii) an InfoNCE-aware curvature estimator with Johnson–Lindenstrauss (JL) sketching, for scalability; and (iii) a selection-aware relevance weight combined with learnability, to balance target-domain adaptation against general-domain retention. The design is supported by a lower-bound guarantee on the proxy's correlation with full-parameter alignment. Empirically, CHIPS matches full-dataset CPT on 17 medical benchmarks using only 30% of the data and outperforms half-dataset CPT using only 10%; on 31 general-domain benchmarks, it yields the smallest performance drop under 10–30% data-retention budgets. The framework thus balances domain specialization against retention of general-domain capability.
📝 Abstract
Adapting CLIP to vertical domains is typically approached by novel fine-tuning strategies or by continual pre-training (CPT) on large domain-specific datasets. Yet, data itself remains an underexplored factor in this process. We revisit this task from a data-centric perspective: Can effective data selection substitute for large-scale datasets in CPT? We introduce CHIPS (Curvature-aware Hybrid Influence in Projection Subspace), which assigns each image-text pair a utility score that integrates three complementary factors aligned with three goals: faithfulness via a curvature-aware, Newton-style alignment computed in CLIP's end-point subspace; scalability via an InfoNCE-aware curvature estimator with Johnson-Lindenstrauss (JL) sketching; and retention via a selection-aware relevance weight combined with learnability to balance target adaptation against general-domain preservation. We justify this design theoretically by proving a lower-bound guarantee on the proxy's correlation with full-parameter alignment and by characterizing the bias-variance trade-offs introduced by curvature mixing and JL sketching. We evaluate CHIPS empirically across various settings: 1) CHIPS attains state-of-the-art performance among selection baselines on 17 medical benchmarks, matches full-dataset CPT with 30% of the data, and outperforms half-dataset CPT using only 10%; 2) on 31 general-domain benchmarks, CHIPS yields the smallest performance drop under 10-30% data-retention budgets. Code, data, and checkpoints will be released.
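To make the scoring pipeline described above more concrete, the following is a minimal NumPy sketch of the general idea: per-sample gradients are compressed with a JL random projection, scored by a Newton-style alignment against a target-domain gradient, reweighted, and the top fraction is kept. It is an illustration under stated assumptions, not the authors' implementation. The function names (`make_jl_matrix`, `newton_alignment`, `select_top`), the damped second-moment matrix used as a curvature proxy, the multiplicative combination of weights, and the random placeholder values for relevance and learnability are all hypothetical; the abstract does not specify the InfoNCE-aware estimator or the exact weighting.

```python
import numpy as np


def make_jl_matrix(d, k, rng):
    """Gaussian JL matrix of shape (d, k); entries ~ N(0, 1/k) so that
    squared norms are preserved in expectation."""
    return rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))


def newton_alignment(sk_train, sk_target, damping=1e-3):
    """Score each sketched training gradient s_i by s_i^T H^{-1} s_target.

    H is a damped empirical second-moment matrix in sketch space. This is
    an assumed stand-in curvature proxy, not the paper's InfoNCE-aware
    estimator.
    """
    n, k = sk_train.shape
    H = sk_train.T @ sk_train / n + damping * np.eye(k)
    direction = np.linalg.solve(H, sk_target)  # H^{-1} s_target
    return sk_train @ direction


def select_top(utility, budget):
    """Indices of the top `budget` fraction of samples by utility."""
    m = max(1, int(round(budget * len(utility))))
    return np.argsort(-utility)[:m]


# Synthetic usage: random vectors stand in for per-sample gradients.
rng = np.random.default_rng(0)
n, d, k = 200, 512, 64
grads = rng.normal(size=(n, d))
target_grad = grads[:20].mean(axis=0)  # hypothetical target-domain gradient

# The same projection must be applied to training and target gradients.
R = make_jl_matrix(d, k, rng)
sk_train, sk_target = grads @ R, target_grad @ R

alignment = newton_alignment(sk_train, sk_target)
relevance = rng.uniform(0.5, 1.0, size=n)      # placeholder weights
learnability = rng.uniform(0.5, 1.0, size=n)   # placeholder weights
utility = alignment * relevance * learnability  # assumed combination rule

selected = select_top(utility, budget=0.3)  # keep 30% of the pool
```

Working in the `k`-dimensional sketch keeps the linear solve at `k × k` rather than `d × d`, which is the scalability motivation for sketching. The price is the approximation error that the abstract's bias-variance analysis is said to characterize.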