Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets

📅 2025-06-05
🤖 AI Summary
This study addresses the lack of robust, scale-aware benchmarks for comparing open language-vision models. It presents the first systematic scaling laws derived across models and datasets for CLIP and MaMMUT, covering a wide range of model sizes (parameter counts) and numbers of training samples seen. The authors show that scaling laws can also be derived with a constant learning rate schedule, which reduces compute cost, and they evaluate reproducibly across multiple downstream tasks (zero-shot classification, retrieval, and segmentation) and several open datasets (DataComp, DFN, Re-LAION). The fitted scaling laws indicate that MaMMUT improves more strongly with scale and is more sample-efficient than standard CLIP. Notably, openMaMMUT-L/14, trained on 12.8B samples from DataComp-1.4B, achieves 80.3% zero-shot ImageNet-1k top-1 accuracy. All trained models, intermediate checkpoints, source code, and raw experimental data are publicly released to support reproducible research.

📝 Abstract
In studies of transferable learning, scaling laws are obtained for various important foundation models to predict their properties and performance at larger scales. We show here how scaling law derivation can also be used for model and dataset comparison, making it possible to decide which procedure is to be preferred for pre-training. For the first time, full scaling laws based on dense measurements across a wide span of model and samples-seen scales are derived for two important language-vision learning procedures, CLIP and MaMMUT, which use either a contrastive-only loss or a combined contrastive and captioning (text-generative) loss. Ensuring sufficient prediction accuracy for held-out points, we use the derived scaling laws to compare both models, obtaining evidence for MaMMUT's stronger improvement with scale and better sample efficiency than standard CLIP. To strengthen the validity of the comparison, we show scaling laws for various downstream tasks (classification, retrieval, and segmentation) and for different open datasets (DataComp, DFN, and Re-LAION), consistently observing the same trends. We show that the comparison can also be performed when deriving scaling laws with a constant learning rate schedule, reducing compute cost. Accurate derivation of scaling laws thus provides a means to perform model and dataset comparison across scale spans, avoiding misleading conclusions based on measurements from single reference scales only, and paving the way for systematic comparison and improvement of open foundation models and the datasets used to create them. We release all the pre-trained models with their intermediate checkpoints, including openMaMMUT-L/14, which achieves $80.3\%$ zero-shot ImageNet-1k accuracy, trained on 12.8B samples from DataComp-1.4B. Code for reproducing the experiments in the paper and the raw experimental data can be found at https://github.com/LAION-AI/scaling-laws-for-comparison.
Problem

Research questions and friction points this paper is trying to address.

Deriving scaling laws for comparing language-vision models and datasets
Evaluating CLIP and MaMMUT models' performance and sample efficiency
Ensuring robust model comparison across various tasks and datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Derives scaling laws for model and dataset comparison
Compares CLIP and MaMMUT using scaling laws
Uses constant learning rate to reduce compute cost
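
As a rough illustration of the comparison idea listed above, the sketch below fits a power law to error measurements from two training procedures, checks the fit on a held-out larger scale, and locates the scale at which the two fitted curves cross. Everything here is an assumption made for illustration: the saturation-free form error = a * C^(-b), the helper names, and the synthetic measurements are hypothetical and are not taken from the paper, whose functional form and data may differ.

```python
import numpy as np

def fit_power_law(compute, error):
    """Fit error ≈ a * compute**(-b) by linear least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(compute), np.log(error), 1)
    return float(np.exp(intercept)), float(-slope)

def predict(params, compute):
    """Evaluate a fitted power law at the given compute scale(s)."""
    a, b = params
    return a * np.asarray(compute, dtype=float) ** (-b)

def crossover(p1, p2):
    """Compute scale at which two fitted power laws give the same error."""
    (a1, b1), (a2, b2) = p1, p2
    return (a1 / a2) ** (1.0 / (b1 - b2))

# Hypothetical, noise-free measurements (NOT data from the paper).
compute = np.logspace(9, 12, 7)            # arbitrary compute units
err_a = 20.0 * compute ** (-0.15)          # procedure A: better at small scale
err_b = 60.0 * compute ** (-0.20)          # procedure B: improves faster with scale

# Fit on all but the largest scale, then test on the held-out point.
params_a = fit_power_law(compute[:-1], err_a[:-1])
params_b = fit_power_law(compute[:-1], err_b[:-1])
heldout_gap_a = abs(float(predict(params_a, compute[-1])) - err_a[-1])
heldout_gap_b = abs(float(predict(params_b, compute[-1])) - err_b[-1])

# Scale beyond which procedure B is predicted to have lower error.
c_star = crossover(params_a, params_b)

print("fit A (a, b):", params_a)
print("fit B (a, b):", params_b)
print("held-out gaps:", heldout_gap_a, heldout_gap_b)
print("predicted crossover scale:", c_star)
```

In this toy setup, a measurement at the smallest scale alone would favor procedure A, while the fitted curves predict that procedure B wins beyond the crossover. This is the kind of misleading single-scale conclusion that comparing full scaling laws is meant to avoid.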
Marianna Nezhurina
LAION, Juelich Supercomputing Center (JSC), Research Center Juelich (FZJ), Open-Ψ (Open-Sci) Collective
Tomer Porian
LAION, Juelich Supercomputing Center (JSC), Research Center Juelich (FZJ)
Giovanni Pucceti
Institute of Information Science and Technologies “A. Faedo” - CNR Pisa
Tommie Kerssies
PhD Candidate, Eindhoven University of Technology; Applied Scientist Intern, Amazon
Artificial Intelligence, Machine Learning, Deep Learning
Romain Beaumont
LAION
Mehdi Cherti
Postdoc at Forschungszentrum Jülich, LAION co-founder
Deep learning, Scaling laws, multi-modal models
Jenia Jitsev
Scalable Learning & Multi-Purpose AI (SLAMPAI) Lab, JSC, Forschungszentrum Juelich; ELLIS; LAION
Open Foundation Models & Datasets, Scaling laws, Plasticity and Learning in Neural Networks