Combining pre-trained models via localized model averaging

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Pre-trained models can perform very differently on a new task, and which model is best often changes with the input, so no single model can be selected as best everywhere. To address this, the authors propose a localized model averaging approach that treats the combination weights as functions of the covariates and learns them under a general loss framework, so that the combination adapts to each input and draws on the strengths of several models. In contrast to traditional averaging with fixed weights, the method is shown theoretically to be asymptotically optimal in both in-sample and out-of-sample risk, and the estimated weights are shown to be consistent. Numerical experiments support the effectiveness of the approach across prediction tasks.
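To make the idea of covariate-dependent weights concrete, the display below sketches one way to write it down. The notation (K candidate models f_k, weight functions w_k, loss L, weight class W) is assumed here for illustration and is not taken from the paper; in particular, the constraint that the weights are non-negative and sum to one is a common model-averaging convention that the paper may or may not impose.

```latex
% Illustrative notation only (assumed, not the paper's):
% localized average of K pre-trained models with input-dependent weights
\hat{f}_{w}(x) = \sum_{k=1}^{K} w_k(x)\, f_k(x),
\qquad w_k(x) \ge 0, \quad \sum_{k=1}^{K} w_k(x) = 1 .

% weights chosen by minimizing an empirical loss over a class of weight functions
\hat{w} \in \arg\min_{w \in \mathcal{W}}
\frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i,\ \hat{f}_{w}(x_i)\bigr) .
```

Under this notation, traditional static averaging is the special case in which every w_k(x) is constant in x.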

📝 Abstract

Many pre-trained models (PTMs) are available in modern applications. Because different PTMs are often trained on different datasets, their performance can vary substantially across new tasks, and the ranking of the candidates may depend heavily on the input. Motivated by this, we propose a localized model averaging method with weights modeled as functions of the covariates, making it substantially more versatile than existing model averaging methods. This formulation allows the model averaging procedure to adaptively capture the varying relative advantages of different PTMs across heterogeneous contexts. Specifically, we learn flexible local weights under a general loss framework that accommodates a broad class of prediction tasks. We establish the asymptotic optimality of the proposed method for both in-sample and out-of-sample risks, as well as the consistency of the estimated weights. Extensive numerical experiments further demonstrate the effectiveness of the proposed method.
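The abstract does not specify how the local weights are parameterized or fitted, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a softmax gate that is linear in the covariates, a squared-error loss, and plain gradient descent; the function names (`fit_local_weights`, `predict_local`) and the synthetic data are hypothetical. Two stand-in "pre-trained" predictors are each accurate on one half of the input range, so the best model depends on the input.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def add_intercept(X):
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit_local_weights(X, P, y, lr=0.5, steps=2000):
    """Fit gate parameters Theta so that w(x) = softmax([1, x] @ Theta).

    X: (n, d) covariates, P: (n, K) candidate predictions, y: (n,) targets.
    Assumed setup: squared-error loss, full-batch gradient descent.
    """
    Xb = add_intercept(X)
    Theta = np.zeros((Xb.shape[1], P.shape[1]))
    n = X.shape[0]
    for _ in range(steps):
        W = softmax(Xb @ Theta)                 # (n, K) local weights
        pred = (W * P).sum(axis=1)              # weighted combination
        resid = pred - y
        # Gradient of 0.5 * resid^2 with respect to the gate logits.
        G = resid[:, None] * W * (P - pred[:, None])
        Theta -= lr * (Xb.T @ G) / n
    return Theta

def predict_local(X, P, Theta):
    W = softmax(add_intercept(X) @ Theta)
    return (W * P).sum(axis=1), W

# Synthetic illustration (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(-1.0, 1.0, size=n)
truth = np.sin(3.0 * x)
y = truth + 0.1 * rng.normal(size=n)
# Candidate 0 is biased for x > 0; candidate 1 is biased for x < 0.
P = np.column_stack([truth + 1.0 * (x > 0), truth - 1.0 * (x < 0)])
X = x[:, None]

train, test = slice(0, 300), slice(300, n)
Theta = fit_local_weights(X[train], P[train], y[train])
pred_local, W_test = predict_local(X[test], P[test], Theta)

mse_local = np.mean((pred_local - y[test]) ** 2)
mse_equal = np.mean((P[test].mean(axis=1) - y[test]) ** 2)   # static 50/50 average
mse_single = min(np.mean((P[test][:, k] - y[test]) ** 2) for k in range(2))
print(f"local: {mse_local:.3f}  equal-weight: {mse_equal:.3f}  best single: {mse_single:.3f}")
```

In this toy setting the gate learns to favor whichever candidate is unbiased on each side of zero, which a single fixed weight vector cannot do. A richer weight class (for example kernels or a small neural network) or a different loss could be substituted without changing the overall structure.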

Problem

Research questions and friction points this paper is trying to address.

pre-trained models

model averaging

localized weighting

heterogeneous contexts

adaptive combination

Innovation

Methods, ideas, or system contributions that make the work stand out.

localized model averaging

pre-trained models

adaptive weighting