Mashup Learning: Faster Finetuning by Remixing Past Checkpoints

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency and resource waste caused by the proliferation of isolated, domain-specific fine-tuned model checkpoints, a practice that hinders rapid adaptation to new tasks. It proposes a systematic transfer learning paradigm that treats historical fine-tuning checkpoints as reusable assets. The approach first evaluates the relevance of existing checkpoints to the target task, selects the most pertinent ones, and fuses their parameters to construct an improved initialization for subsequent standard fine-tuning. Extensive experiments across eight benchmarks and four backbone architectures demonstrate that this method consistently improves accuracy by 0.5–5.0 percentage points, reduces the required training steps by 41–46%, and lowers total training time by up to 37%.

📝 Abstract
Finetuning on domain-specific data is a well-established method for enhancing LLM performance on downstream tasks. Training on each dataset produces a new set of model weights, resulting in a multitude of checkpoints saved in-house or on open-source platforms. However, these training artifacts are rarely reused for subsequent experiments despite containing improved model abilities for potentially similar tasks. In this paper, we propose Mashup Learning, a simple method to leverage the outputs of prior training runs to enhance model adaptation to new tasks. Our procedure identifies the most relevant historical checkpoints for a target dataset, aggregates them with model merging, and uses the result as an improved initialization for training. Across 8 standard LLM benchmarks, four models, and two collections of source checkpoints, Mashup Learning consistently improves average downstream accuracy by 0.5–5 percentage points over training from scratch. It also accelerates convergence, requiring 41–46% fewer training steps and up to 37% less total wall-clock time to match from-scratch accuracy, including all selection and merging overhead.
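The procedure in the abstract can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the relevance scores, checkpoint names, and the uniform parameter average are all assumptions (the paper says only that relevant checkpoints are selected and aggregated "with model merging", without specifying the scoring or merging variant here).

```python
# Toy sketch of the Mashup Learning recipe: score historical checkpoints
# against the target task, keep the top-k, and average their parameters
# to build an initialization for standard finetuning.
# Checkpoint names, scores, and the uniform average are illustrative only.

def merge_checkpoints(checkpoints, scores, k=2):
    """Select the k most relevant checkpoints and average their weights."""
    # Rank checkpoint names by relevance to the target dataset (higher = better).
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    selected = [checkpoints[name] for name in ranked]
    # Uniform parameter-wise average, one simple form of model merging.
    merged = {key: sum(ckpt[key] for ckpt in selected) / len(selected)
              for key in selected[0]}
    return ranked, merged

# Hypothetical example: three "checkpoints" with a single scalar weight each.
ckpts = {"math": {"w": 1.0}, "code": {"w": 3.0}, "chat": {"w": 7.0}}
relevance = {"math": 0.9, "code": 0.8, "chat": 0.1}  # assumed scores
picked, init = merge_checkpoints(ckpts, relevance, k=2)
print(picked, init)  # → ['math', 'code'] {'w': 2.0}
```

The merged weights (`init`) would then seed an ordinary finetuning run on the target dataset, which is where the reported accuracy and convergence gains come from.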
Problem

Research questions and friction points this paper is trying to address.

finetuning
model checkpoints
LLM adaptation
training efficiency
knowledge reuse
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mashup Learning
model merging
checkpoint reuse
faster finetuning
LLM adaptation