Test-Time Mixture of World Models for Embodied Agents in Dynamic Environments

📅 2026-01-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing language model-driven embodied agents struggle to construct flexible and accurate world models in dynamic environments, which hinders effective reasoning and decision-making. To address this limitation, this work proposes Test-time Mixture of World Models (TMoW), a framework that moves beyond the fixed routing of conventional Mixture-of-Experts by introducing a multi-granular prototype-based routing mechanism. TMoW updates expert routing and aligns unseen-domain features with prototypes during inference, and it uses distilled mixture-based augmentation to build new world models from few-shot data, enabling continual adaptation to unseen and evolving environments. The authors evaluate TMoW on the VirtualHome, ALFWorld, and RLBench benchmarks and report strong performance in both zero-shot adaptation and few-shot expansion scenarios.
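To make the routing idea concrete, here is a minimal sketch of prototype-based routing at two granularities (object and scene). It is an illustration under stated assumptions, not the authors' implementation: the function names `cosine`, `softmax`, and `route`, the use of cosine similarity, the averaging across granularities, and the temperature value are all hypothetical choices that the summary does not specify.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(features, prototypes, temperature=0.1):
    """Return mixture weights over world models.

    features:   {"object": vec, "scene": vec} for the current observation.
    prototypes: {model_name: {"object": vec, "scene": vec}}.
    Assumption: similarities at each granularity are simply averaged.
    """
    names = list(prototypes)
    scores = []
    for name in names:
        sims = [cosine(features[g], prototypes[name][g]) for g in features]
        scores.append(sum(sims) / len(sims))
    return dict(zip(names, softmax(scores, temperature)))

# Hypothetical two-dimensional prototypes for two world models.
prototypes = {
    "kitchen": {"object": [1.0, 0.0], "scene": [1.0, 0.0]},
    "garage": {"object": [0.0, 1.0], "scene": [0.0, 1.0]},
}
observation = {"object": [0.9, 0.1], "scene": [0.8, 0.2]}
weights = route(observation, prototypes)
```

Because the weights are recomputed from the current observation's features rather than produced by a frozen gating network, changing or adding a prototype changes the mixture without retraining, which is the property the summary attributes to test-time routing.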

📝 Abstract

Language model (LM)-based embodied agents are increasingly deployed in real-world settings. Yet, their adaptability remains limited in dynamic environments, where constructing accurate and flexible world models is crucial for effective reasoning and decision-making. To address this challenge, we extend the Mixture-of-Experts (MoE) paradigm to embodied agents. While conventional MoE architectures modularize knowledge into expert components with pre-trained routing, they remain rigid once deployed, making them less effective for adapting to unseen domains in dynamic environments. We therefore propose Test-time Mixture of World Models (TMoW), a framework that enhances adaptability to unseen and evolving domains. TMoW updates its routing function over world models at test time, unlike conventional MoE where the function remains fixed, enabling agents to recombine existing models and integrate new ones for continual adaptation. It achieves this through (i) multi-granular prototype-based routing, which adapts mixtures across object- to scene-level similarities, (ii) test-time refinement that aligns unseen domain features with prototypes during inference, and (iii) distilled mixture-based augmentation, which efficiently constructs new models from few-shot data and existing prototypes. We evaluate TMoW on VirtualHome, ALFWorld, and RLBench benchmarks, demonstrating strong performance in both zero-shot adaptation and few-shot expansion scenarios, and showing that it enables embodied agents to operate effectively in dynamic environments.
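Components (ii) and (iii) can be pictured with a small sketch. This is one plausible reading under loud assumptions, not the method described in the paper: `refine_feature` pulls an unseen-domain feature toward the routing-weighted blend of existing prototypes by linear interpolation, and `few_shot_prototype` builds a prototype for a new world model as the mean of a few example features. The abstract does not state the actual update rules or the distillation procedure, so every name and formula below is hypothetical.

```python
def mixture_prototype(prototypes, weights):
    """Routing-weighted average of existing prototypes.

    prototypes: {model_name: vec}; weights: {model_name: float} summing to 1.
    """
    dim = len(next(iter(prototypes.values())))
    return [
        sum(weights[name] * prototypes[name][i] for name in prototypes)
        for i in range(dim)
    ]

def refine_feature(feature, prototypes, weights, step=0.5):
    """Move a feature toward the mixture prototype (assumed interpolation rule)."""
    target = mixture_prototype(prototypes, weights)
    return [(1 - step) * f + step * t for f, t in zip(feature, target)]

def few_shot_prototype(examples):
    """Prototype for a new world model: element-wise mean of few-shot features."""
    count = len(examples)
    return [sum(column) / count for column in zip(*examples)]

# Hypothetical usage with two existing prototypes and equal routing weights.
existing = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
refined = refine_feature([0.0, 0.0], existing, {"a": 0.5, "b": 0.5}, step=0.5)
new_proto = few_shot_prototype([[1.0, 2.0], [3.0, 4.0]])
```

In this toy setup the refined feature sits halfway between the raw feature and the blended prototype, and the new prototype could be added to the prototype table so that later routing can select the new world model.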

Problem

Research questions and friction points this paper is trying to address.

embodied agents

dynamic environments

world models

adaptability

test-time adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-time Adaptation

Mixture of Experts

World Models