🤖 AI Summary
General-purpose robotic policies often overfit during fine-tuning on new tasks, leading to catastrophic forgetting of prior generalization capabilities and poor robustness to in-task distribution shifts. To address this, we propose a weight interpolation-based parameter fusion method: instead of end-to-end fine-tuning, we linearly combine the weights of a task-specific fine-tuned model with those of a pre-trained generalist policy. This approach requires no additional architectural components or explicit regularization. Crucially, it preserves the joint vision-language-action representation capacity of the base model while enabling robust acquisition of novel skills and continual retention of previously learned ones. Experiments demonstrate that the fused model significantly improves out-of-distribution generalization to unseen task variants, both in simulation and on real robotic platforms, and supports progressive integration of multiple skills. Our method provides a lightweight, efficient, and scalable solution for continual learning in general-purpose robotics.
📝 Abstract
Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize across a wide spectrum of behaviors, enabling a single policy to act in varied real-world environments. However, they still fall short on new tasks not covered in the training data. When finetuned on limited demonstrations of a new task, these policies often overfit to the specific demonstrations, not only losing their prior ability to solve a wide variety of generalist tasks but also failing to generalize within the new task itself. In this work, we aim to develop a method that preserves the generalization capabilities of the generalist policy during finetuning, allowing a single policy to robustly incorporate a new skill into its repertoire. Our goal is a single policy that both learns to generalize to variations of the new task and retains the broad competencies gained from pretraining. We show that this can be achieved through a simple yet effective strategy: interpolating the weights of a finetuned model with those of the pretrained model. We show, across extensive simulated and real-world experiments, that such model merging produces a single model that inherits the generalist abilities of the base model and learns to solve the new task robustly, outperforming both the pretrained and finetuned models on out-of-distribution variations of the new task. Moreover, we show that model merging enables continual acquisition of new skills in a lifelong learning setting, without sacrificing previously learned generalist abilities.
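The interpolation itself is a per-parameter convex combination of the two checkpoints. The sketch below illustrates the idea under stated assumptions: both models share the same architecture, their parameters are stored as matching NumPy-array dicts, and `alpha` (the mixing coefficient, a hypothetical name) trades off the finetuned skill against the pretrained generalist weights; the paper's actual checkpoint format and chosen coefficient may differ.

```python
import numpy as np

def merge_weights(pretrained: dict, finetuned: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate two parameter dicts:
    merged[k] = (1 - alpha) * pretrained[k] + alpha * finetuned[k].

    alpha = 0.0 recovers the pretrained generalist policy;
    alpha = 1.0 recovers the task-specific finetuned policy.
    Assumes both dicts come from the same architecture (identical keys/shapes).
    """
    assert pretrained.keys() == finetuned.keys(), "checkpoints must share parameter names"
    return {
        name: (1.0 - alpha) * pretrained[name] + alpha * finetuned[name]
        for name in pretrained
    }

# Toy example with two tiny "checkpoints".
pre = {"layer.weight": np.zeros(3), "layer.bias": np.array([1.0])}
ft = {"layer.weight": np.ones(3), "layer.bias": np.array([3.0])}
merged = merge_weights(pre, ft, alpha=0.5)
# layer.weight -> [0.5, 0.5, 0.5]; layer.bias -> [2.0]
```

The same one-liner applies unchanged to a framework state dict (e.g. a PyTorch `state_dict()`), which is what makes the approach lightweight: no extra training, architecture changes, or regularization terms, just a weighted average of existing weights.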