AI Summary
Problem: Behavior Foundation Models (BFMs) rely heavily on manual prompt engineering for task-specific adaptation and struggle to balance generalization with customization. Method: We propose Task Tokens, a lightweight, trainable task encoder that maps observations end-to-end into plug-and-play task tokens. These tokens serve as additional conditioning inputs that guide a frozen pre-trained BFM on the target control task. Contribution/Results: Task Tokens adapt a BFM to specific tasks without fine-tuning it; only the task encoder is trained, via reinforcement learning. The approach supports the injection of user-defined priors, balances reward design against prompt engineering, remains compatible with the BFM's other prompting modalities, and preserves the original model's diverse, multimodal control behavior. Experiments demonstrate its effectiveness across diverse embodied control tasks, including out-of-distribution scenarios.
Abstract
Recent advancements in imitation learning have led to transformer-based behavior foundation models (BFMs) that enable multi-modal, human-like control for humanoid agents. While excelling at zero-shot generation of robust behaviors, BFMs often require meticulous prompt engineering for specific tasks, which can still yield suboptimal results. We introduce "Task Tokens", a method to effectively tailor BFMs to specific tasks while preserving their flexibility. Our approach leverages the transformer architecture of BFMs to learn a new task-specific encoder through reinforcement learning, keeping the original BFM frozen. This allows the incorporation of user-defined priors, balancing reward design and prompt engineering. By training a task encoder to map observations to tokens that are used as additional BFM inputs, we improve task performance while maintaining the model's diverse control characteristics. We demonstrate Task Tokens' efficacy across various tasks, including out-of-distribution scenarios, and show their compatibility with other prompting modalities. Our results suggest that Task Tokens offer a promising approach for adapting BFMs to specific control tasks while retaining their generalization capabilities.
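To make the described architecture concrete, here is a minimal NumPy sketch built on loudly stated assumptions. A toy one-layer attention model (`FrozenBFM`) stands in for the pre-trained BFM. A linear `TaskEncoder` maps a task observation to a single task token, which is prepended to the BFM's other input tokens. The sizes, the reward, and every name here are hypothetical. The abstract says the encoder is trained with reinforcement learning; this sketch swaps in a simple (1+1) evolution strategy (keep a random parameter perturbation only if the mean reward rises), which is not the authors' training algorithm. What it does illustrate is the structural point: only the encoder's parameters change, while the BFM's weights stay fixed.

```python
import numpy as np

D_TOKEN, D_OBS, D_ACT = 8, 4, 2  # hypothetical token, observation, action sizes


class FrozenBFM:
    """Toy stand-in for a pre-trained transformer BFM (one attention layer
    plus a linear action head). Its weights are never updated below."""

    def __init__(self, rng):
        s = 1.0 / np.sqrt(D_TOKEN)
        self.wq, self.wk, self.wv = (rng.normal(size=(D_TOKEN, D_TOKEN)) * s for _ in range(3))
        self.w_out = rng.normal(size=(D_TOKEN, D_ACT)) * s

    def act(self, tokens):
        # Scaled dot-product self-attention over all input tokens.
        q, k, v = tokens @ self.wq, tokens @ self.wk, tokens @ self.wv
        scores = q @ k.T / np.sqrt(D_TOKEN)
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        attn = np.exp(scores)
        attn /= attn.sum(axis=1, keepdims=True)
        pooled = (attn @ v).mean(axis=0)  # average over token positions
        return np.tanh(pooled @ self.w_out)  # bounded action


class TaskEncoder:
    """Hypothetical trainable linear map from a task observation to one token."""

    def __init__(self, rng):
        self.theta = rng.normal(size=D_OBS * D_TOKEN + D_TOKEN) * 0.1

    def __call__(self, task_obs):
        w = self.theta[: D_OBS * D_TOKEN].reshape(D_OBS, D_TOKEN)
        b = self.theta[D_OBS * D_TOKEN:]
        return task_obs @ w + b


def act_with_task_token(bfm, encoder, task_obs, prompt_tokens):
    # Prepend the learned task token to the BFM's usual prompt/state tokens.
    task_token = encoder(task_obs)[None, :]
    return bfm.act(np.concatenate([task_token, prompt_tokens], axis=0))


def mean_reward(bfm, encoder, tasks, prompt_tokens):
    # Hypothetical task reward: the action should track tanh of the first
    # D_ACT task-observation entries (e.g. a goal direction).
    errs = [act_with_task_token(bfm, encoder, t, prompt_tokens) - np.tanh(t[:D_ACT])
            for t in tasks]
    return -float(np.mean([np.sum(e ** 2) for e in errs]))


def train_encoder(bfm, encoder, tasks, prompt_tokens, rng, iters=300, sigma=0.05):
    # (1+1) evolution strategy on the encoder parameters only: keep a random
    # perturbation if it raises the mean reward. The BFM is only queried.
    best = mean_reward(bfm, encoder, tasks, prompt_tokens)
    for _ in range(iters):
        old = encoder.theta
        encoder.theta = old + sigma * rng.normal(size=old.shape)
        r = mean_reward(bfm, encoder, tasks, prompt_tokens)
        if r > best:
            best = r
        else:
            encoder.theta = old  # reject the perturbation
    return best


rng = np.random.default_rng(0)
bfm, encoder = FrozenBFM(rng), TaskEncoder(rng)
prompt_tokens = rng.normal(size=(3, D_TOKEN))  # stand-in prompt/state embeddings
tasks = rng.normal(size=(8, D_OBS))            # hypothetical task observations
frozen_copy = bfm.wq.copy()

before = mean_reward(bfm, encoder, tasks, prompt_tokens)
after = train_encoder(bfm, encoder, tasks, prompt_tokens, rng)
print(f"mean reward: {before:.3f} -> {after:.3f}")
```

Because the optimizer only ever writes to `encoder.theta`, any policy-optimization method fits the same structure. A gradient-based RL algorithm would additionally backpropagate through the frozen BFM's layers, without updating them, to reach the encoder.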