Model Merging is Secretly Certifiable: Non-Vacuous Generalisation Bounds for Low-Shot Learning

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
In high-stakes domains (e.g., healthcare, safety), obtaining non-vacuous, certifiable generalisation guarantees for deep models in few-shot settings remains challenging. Method: this paper adapts large models to downstream tasks by model fusion rather than fine-tuning. Grounded in PAC-Bayes theory, it establishes that model fusion admits non-vacuous generalisation bounds, with the certified generalisation gap depending only on the downstream sample size (as few as 100 examples) and fully decoupled from the model's parameter count. This enables lightweight adaptation of large models, including ViT-B and Mistral-7B, without fine-tuning the base networks. Results: experiments deliver the first non-vacuous, verifiable generalisation guarantees under extreme data scarcity, substantially advancing the trustworthy deployment of AI systems in critical applications.
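
For reference, work in this area typically instantiates a PAC-Bayes bound of the following form (the PAC-Bayes-kl inequality of Seeger and Maurer; the paper's exact bound may differ in its details):

```latex
% PAC-Bayes-kl bound: with probability at least 1 - \delta over an IID
% sample S of size n, simultaneously for all posteriors Q over hypotheses,
\mathrm{kl}\!\left( \hat{R}_S(Q) \,\middle\|\, R(Q) \right)
  \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \ln\!\left( 2\sqrt{n}/\delta \right)}{n}
```

Here \hat{R}_S(Q) is the empirical risk on the sample, R(Q) the true risk, and P a data-independent prior. The key point is that when the downstream task is learned by fusing pretrained models, Q and P are distributions over a handful of merging coefficients rather than over millions (or billions) of base-network weights, so KL(Q || P) stays small and the right-hand side is governed by the sample size n alone.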

📝 Abstract
Certifying the IID generalisation ability of deep networks is the first of many requirements for trusting AI in high-stakes applications from medicine to security. However, when instantiating generalisation bounds for deep networks it remains challenging to obtain non-vacuous guarantees, especially when applying contemporary large models to the small-scale data prevalent in such high-stakes fields. In this paper, we draw a novel connection between a family of learning methods based on model fusion and generalisation certificates, and surprisingly show that with minor adjustment several existing learning strategies already provide non-trivial generalisation guarantees. Essentially, by focusing on data-driven learning of downstream tasks by fusion rather than fine-tuning, the certified generalisation gap becomes tiny and independent of the base network size, facilitating its certification. Our results show for the first time non-trivial generalisation guarantees for learning with as few as 100 examples, while using vision models such as ViT-B and language models such as Mistral-7B. This observation is significant, as it has immediate implications for facilitating the certification of existing systems as trustworthy, and it opens up new directions for research at the intersection of practice and theory.
Problem

Research questions and friction points this paper is trying to address.

Certifying IID generalization for deep networks in high-stakes applications
Achieving non-vacuous guarantees for small-scale data with large models
Providing generalization bounds for low-shot learning via model fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model fusion enables non-trivial generalization guarantees
Certified gap independent of base network size
Effective with as few as 100 examples (see the numerical sketch after this list)
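
Purely as an illustration (not code from the paper), here is a minimal sketch of how such a certificate could be evaluated numerically, assuming the PAC-Bayes-kl bound given above; the sample size, empirical risk, and KL term below are hypothetical placeholder values:

```python
import math

def binary_kl(p: float, q: float) -> float:
    """KL divergence kl(p || q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_inverse(emp_risk: float, rhs: float) -> float:
    """Largest q >= emp_risk with kl(emp_risk || q) <= rhs, via bisection."""
    lo, hi = emp_risk, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binary_kl(emp_risk, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical placeholder values: n = 100 downstream examples, empirical
# risk 0.10, confidence 95%, and KL(Q || P) = 1.0 nat, which is plausible
# when the posterior is over a few merging coefficients rather than the
# full network weights.
n, emp_risk, delta, kl_qp = 100, 0.10, 0.05, 1.0
rhs = (kl_qp + math.log(2 * math.sqrt(n) / delta)) / n
print(f"Certified risk bound: {kl_inverse(emp_risk, rhs):.3f}")
# -> about 0.25, i.e., non-vacuous (far below the trivial bound of 1.0)
```

Inverting the binary kl via bisection turns the inequality into an explicit upper bound on the true risk; because the KL term counts only the merging coefficients, the bound stays non-vacuous even at n = 100.
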
👥 Authors
Taehoon Kim
School of Informatics, University of Edinburgh
Henry Gouk
Assistant Professor, University of Edinburgh
Artificial Intelligence · Machine Learning · AI Engineering · Trustworthy AI
Minyoung Kim
Staff Research Scientist, Samsung AI Center Cambridge, UK
Machine Learning · LLM · Diffusion Generative Models
Timothy Hospedales
School of Informatics, University of Edinburgh; Samsung AI Center, Cambridge