🤖 AI Summary
General-purpose robotic policies often overfit during fine-tuning on new tasks, leading to catastrophic forgetting of prior generalization capabilities and poor robustness to in-task distribution shifts. To address this, we propose a weight interpolation-based parameter fusion method: instead of end-to-end fine-tuning, we linearly combine the weights of a task-specific fine-tuned model with those of a pre-trained generalist policy. This approach requires no additional architectural components or explicit regularization. Crucially, it preserves the joint vision-language-action representation capacity of the base model while enabling robust acquisition of novel skills and continual retention of previously learned ones. Experiments demonstrate that the fused model significantly improves out-of-distribution generalization to unseen task variants, both in simulation and on real robotic platforms, and supports progressive integration of multiple skills. Our method provides a lightweight, efficient, and scalable solution for continual learning in general-purpose robotics.
📝 Abstract
Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize across a wide spectrum of behaviors, enabling a single policy to act in varied real-world environments. However, they still fall short on new tasks not covered in the training data. When finetuned on limited demonstrations of a new task, these policies often overfit to the specific demonstrations, not only losing their prior ability to solve a wide variety of generalist tasks but also failing to generalize within the new task itself. In this work, we aim to develop a method that preserves the generalization capabilities of the generalist policy during finetuning, allowing a single policy to robustly incorporate a new skill into its repertoire. Our goal is a single policy that both learns to generalize to variations of the new task and retains the broad competencies gained from pretraining. We show that this can be achieved through a simple yet effective strategy: interpolating the weights of a finetuned model with those of the pretrained model. We show, across extensive simulated and real-world experiments, that such model merging produces a single model that inherits the generalist abilities of the base model and learns to solve the new task robustly, outperforming both the pretrained and finetuned models on out-of-distribution variations of the new task. Moreover, we show that model merging enables continual acquisition of new skills in a lifelong learning setting, without sacrificing previously learned generalist abilities.
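The interpolation itself is a per-parameter convex combination of the two checkpoints. The sketch below illustrates the idea under stated assumptions: both models share the same architecture, their parameters are stored as matching NumPy-array dicts, and `alpha` (the mixing coefficient, a hypothetical name) trades off the finetuned skill against the pretrained generalist weights; the paper's actual checkpoint format and chosen coefficient may differ.

```python
import numpy as np

def merge_weights(pretrained: dict, finetuned: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate two parameter dicts:
    merged[k] = (1 - alpha) * pretrained[k] + alpha * finetuned[k].

    alpha = 0.0 recovers the pretrained generalist policy;
    alpha = 1.0 recovers the task-specific finetuned policy.
    Assumes both dicts come from the same architecture (identical keys/shapes).
    """
    assert pretrained.keys() == finetuned.keys(), "checkpoints must share parameter names"
    return {
        name: (1.0 - alpha) * pretrained[name] + alpha * finetuned[name]
        for name in pretrained
    }

# Toy example with two tiny "checkpoints".
pre = {"layer.weight": np.zeros(3), "layer.bias": np.array([1.0])}
ft = {"layer.weight": np.ones(3), "layer.bias": np.array([3.0])}
merged = merge_weights(pre, ft, alpha=0.5)
# layer.weight -> [0.5, 0.5, 0.5]; layer.bias -> [2.0]
```

The same one-liner applies unchanged to a framework state dict (e.g. a PyTorch `state_dict()`), which is what makes the approach lightweight: no extra training, architecture changes, or regularization terms, just a weighted average of existing weights.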