Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics

📅 2025-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Behavioral foundation models (BFMs) generalize poorly in the zero-shot setting when the dynamics shift, which hinders their deployment in real-world robotic applications. To address this, we propose an adaptive Forward-Backward (FB) representation framework. The method adds a transformer-based belief estimator that implicitly models the unknown dynamics, and partitions the policy encoding space into dynamics-specific clusters aligned with the context-embedding directions, which reduces interference between policies learned under different dynamics. The approach requires no test-time fine-tuning. On discrete and continuous control tasks with changing dynamics, it achieves up to 2x higher zero-shot returns than the baselines, improving the robustness of BFMs to dynamics unseen during training.

📝 Abstract

Behavioral Foundation Models (BFMs) have proved successful in producing policies for arbitrary tasks in a zero-shot manner, requiring no test-time training or task-specific fine-tuning. Among the most promising BFMs are those that estimate the successor measure, learned in an unsupervised way from task-agnostic offline data. However, these methods fail to react to changes in the dynamics, making them ineffective under partial observability or when the transition function changes. This hinders the applicability of BFMs in real-world settings, e.g., in robotics, where the dynamics can change unexpectedly at test time. In this work, we demonstrate that the Forward-Backward (FB) representation, one of the methods from the BFM family, cannot distinguish between distinct dynamics, leading to interference among the latent directions that parametrize different policies. To address this, we propose an FB model with a transformer-based belief estimator, which greatly facilitates zero-shot adaptation. We also show that partitioning the policy encoding space into dynamics-specific clusters, aligned with the context-embedding directions, yields an additional gain in performance. These traits allow our method to respond to the dynamics observed during training and to generalize to unseen ones. Empirically, in the changing-dynamics setting, our approach achieves up to 2x higher zero-shot returns than the baselines on both discrete and continuous tasks.
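
For readers unfamiliar with FB representations, the sketch below illustrates how zero-shot task inference is commonly formulated in prior FB work: a task embedding is estimated as z = E[r(s) B(s)], and the policy acts greedily with respect to F(s, a, z)ᵀz. This is a minimal illustration under stated assumptions, not the model proposed in this paper: the tables `B` and `W` are hypothetical random stand-ins for trained networks, the forward map is assumed linear in z purely for brevity, and the belief estimator described in the abstract is not included.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, d = 6, 3, 4

# Hypothetical stand-ins for trained networks (random values, illustration only).
B = rng.normal(size=(n_states, d))                 # B(s): backward embedding of each state
W = rng.normal(size=(n_states, n_actions, d, d))   # assumed form: F(s, a, z) = W[s, a] @ z


def infer_task_embedding(B, rewards):
    """Estimate z = E_s[r(s) B(s)] as a sample mean over the given states."""
    return (rewards[:, None] * B).mean(axis=0)


def greedy_policy(W, z):
    """Return pi_z(s) = argmax_a F(s, a, z)^T z for every state."""
    F = W @ z            # shape (n_states, n_actions, d)
    q = F @ z            # shape (n_states, n_actions)
    return q.argmax(axis=1)


# A reward function specified only at test time: reach state 2.
rewards = np.zeros(n_states)
rewards[2] = 1.0

z = infer_task_embedding(B, rewards)
actions = greedy_policy(W, z)
print("task embedding:", z)
print("greedy action per state:", actions)
```

Nothing in this computation depends on which dynamics generated the data, which is the limitation the abstract attributes to a plain FB model.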

Problem

Research questions and friction points this paper is trying to address.

BFMs fail to adapt to unseen dynamics changes

FB representation causes interference among latent policy directions

Need for zero-shot adaptation in real-world dynamic settings
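
The interference problem can be illustrated with a toy tabular example. The sketch below is not the paper's analysis; it assumes a fixed policy, two hypothetical 3-state dynamics, and a context-blind model fit on equally pooled data, so that it effectively sees the average transition matrix. Under those assumptions the resulting successor representation matches neither of the true ones.

```python
import numpy as np

gamma = 0.9


def successor_representation(P, gamma):
    """Closed form M = (I - gamma * P)^-1 for a fixed-policy transition matrix P."""
    return np.linalg.inv(np.eye(P.shape[0]) - gamma * P)


# Two hypothetical dynamics under the same policy:
# a 3-state ring traversed clockwise (P_a) or counter-clockwise (P_b).
P_a = np.array([[0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [1.0, 0.0, 0.0]])
P_b = P_a.T

# Assumption: a model that cannot tell the dynamics apart, trained on
# equally pooled data, sees the average of the two transition matrices.
P_mix = 0.5 * (P_a + P_b)

M_a = successor_representation(P_a, gamma)
M_b = successor_representation(P_b, gamma)
M_mix = successor_representation(P_mix, gamma)

# Largest entrywise gap between the pooled estimate and each true SR.
err_a = np.abs(M_mix - M_a).max()
err_b = np.abs(M_mix - M_b).max()
print("max error vs dynamics A:", err_a)
print("max error vs dynamics B:", err_b)
```

Conditioning the representation on an inferred context, as the belief estimator in this paper is meant to do, is one way to keep the two cases separate.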

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based belief estimator for adaptation

Dynamics-specific policy encoding space partitioning

Zero-shot adaptation to unseen dynamics
