Task Tokens: A Flexible Approach to Adapting Behavior Foundation Models

📅 2025-03-28


🤖 AI Summary

Behavior Foundation Models (BFMs) often require meticulous prompt engineering to perform well on specific tasks, and the results can still be suboptimal. Method: We propose Task Tokens, in which a lightweight task-specific encoder, trained with reinforcement learning, maps observations to task tokens that are fed to a frozen pre-trained BFM as additional inputs, guiding it toward the target control task. Contribution/Results: Because the BFM itself is never fine-tuned, the approach preserves the model's diverse, human-like control characteristics. It also supports user-defined priors, balancing reward design and prompt engineering. Experiments show that Task Tokens are effective across a range of humanoid control tasks, including out-of-distribution scenarios, and remain compatible with other prompting modalities.


📝 Abstract

Recent advancements in imitation learning have led to transformer-based behavior foundation models (BFMs) that enable multi-modal, human-like control for humanoid agents. While excelling at zero-shot generation of robust behaviors, BFMs often require meticulous prompt engineering for specific tasks, potentially yielding suboptimal results. We introduce "Task Tokens", a method to effectively tailor BFMs to specific tasks while preserving their flexibility. Our approach leverages the transformer architecture of BFMs to learn a new task-specific encoder through reinforcement learning, keeping the original BFM frozen. This allows incorporation of user-defined priors, balancing reward design and prompt engineering. By training a task encoder to map observations to tokens, used as additional BFM inputs, we guide performance improvement while maintaining the model's diverse control characteristics. We demonstrate Task Tokens' efficacy across various tasks, including out-of-distribution scenarios, and show their compatibility with other prompting modalities. Our results suggest that Task Tokens offer a promising approach for adapting BFMs to specific control tasks while retaining their generalization capabilities.
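The mechanism described in the abstract can be illustrated with a toy, dependency-free sketch. Everything here is an assumption for illustration, not the authors' code: `FrozenBFM`, `TaskEncoder`, `rollout_reward`, and `train` are hypothetical names; the "BFM" is a mean-pooling linear head standing in for a transformer; the reward is a made-up squared-error objective; and simple random-search hill climbing replaces the reinforcement-learning algorithm the paper uses. What the sketch does show is the structure: the task token is appended to the user's prompt tokens and the state token, and only the encoder's weights change during training.

```python
import random

D = 4  # token width (hypothetical; real BFMs use far wider embeddings)


class FrozenBFM:
    """Toy stand-in for a pre-trained BFM: token sequence -> scalar action.

    Mean-pools the input tokens and applies a fixed linear head.
    Its weights are never modified after construction (frozen).
    """

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.head = [rng.uniform(-1.0, 1.0) for _ in range(D)]

    def act(self, tokens):
        pooled = [sum(tok[i] for tok in tokens) / len(tokens) for i in range(D)]
        return sum(w * x for w, x in zip(self.head, pooled))


class TaskEncoder:
    """Trainable task encoder: linear map from a task observation to one D-dim token."""

    def __init__(self, obs_dim, seed=1):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)] for _ in range(D)]

    def token(self, obs):
        return [sum(w * o for w, o in zip(row, obs)) for row in self.W]


def rollout_reward(bfm, enc, prior_toks, state_tok, task_obs, target):
    # Input sequence: user prompt tokens (priors) + task token + current state token.
    tokens = prior_toks + [enc.token(task_obs), state_tok]
    action = bfm.act(tokens)
    return -(action - target) ** 2  # hypothetical task reward: hit a target action


def train(bfm, enc, prior_toks, state_tok, task_obs, target, steps=500, sigma=0.05, seed=2):
    """Random-search hill climbing on the encoder only (stand-in for RL)."""
    rng = random.Random(seed)
    best = rollout_reward(bfm, enc, prior_toks, state_tok, task_obs, target)
    for _ in range(steps):
        saved = [row[:] for row in enc.W]
        for row in enc.W:  # perturb encoder weights; the BFM is untouched
            for j in range(len(row)):
                row[j] += rng.gauss(0.0, sigma)
        reward = rollout_reward(bfm, enc, prior_toks, state_tok, task_obs, target)
        if reward > best:
            best = reward  # keep the improvement
        else:
            enc.W = saved  # revert to the previous encoder weights
    return best


bfm = FrozenBFM()
head_before = list(bfm.head)
enc = TaskEncoder(obs_dim=2)
prior_toks = [[0.0, 0.5, 0.5, 0.0]]  # e.g. an existing prompt acting as a user prior
state_tok = [0.2, -0.1, 0.0, 0.3]
task_obs = [1.0, 0.5]  # e.g. a goal position
initial = rollout_reward(bfm, enc, prior_toks, state_tok, task_obs, 1.0)
final = train(bfm, enc, prior_toks, state_tok, task_obs, 1.0)
print(f"reward before: {initial:.4f}, after: {final:.4f}")
```

Replacing `train` with a real policy-gradient method would keep the same division of labor: updates flow only into the encoder, which is what lets the frozen BFM retain its original behaviors.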

Problem

Research questions and friction points this paper is trying to address.

Adapting behavior foundation models to specific tasks effectively

Reducing reliance on meticulous prompt engineering for specific tasks

Maintaining the model's flexibility and generalization capabilities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Task Tokens adapt BFMs via reinforcement learning

Task-specific encoder is trained while the BFM stays frozen

Incorporates user-defined priors, balancing reward design and prompt engineering
