🤖 AI Summary
This work addresses the scarcity of action labels in world model training by proposing the Latent-Action World Model (LAWM), which jointly leverages a small set of action-labeled interaction trajectories and abundant unlabeled passive observations (e.g., videos). Methodologically, LAWM infers latent actions from passive observation sequences, maps explicit control actions into the same latent action space, and aligns the two so that a single latent dynamics model can train on both data sources. In doing so, it bridges two traditionally separate regimes, offline reinforcement learning on action-conditioned data and training on purely passive data, extending the data sources and generalization capability of world models under extreme action-label scarcity. Evaluated on the DeepMind Control Suite, LAWM achieves performance comparable to fully supervised baselines while using only about 10% of the action annotations, a substantial improvement in data efficiency.
📝 Abstract
Inspired by how humans combine direct interaction with action-free experience (e.g., videos), we study world models that learn from heterogeneous data. Standard world models typically rely on action-conditioned trajectories, which limits their effectiveness when action labels are scarce. We introduce a family of latent-action world models that jointly use action-conditioned and action-free data by learning a shared latent action representation. This latent space aligns observed control signals with actions inferred from passive observations, enabling a single dynamics model to train on large-scale unlabeled trajectories while requiring only a small set of action-labeled ones. We use the latent-action world model to learn a latent-action policy through offline reinforcement learning (RL), thereby bridging two traditionally separate domains: offline RL, which typically relies on action-conditioned data, and action-free training, which is rarely followed by RL. On the DeepMind Control Suite, our approach achieves strong performance while using about an order of magnitude fewer action-labeled samples than purely action-conditioned baselines. These results show that latent actions enable training on both passive and interactive data, allowing world models to learn more efficiently.
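The core data flow described above can be sketched in a few lines: an action encoder and an inverse-dynamics-style model both map into a shared latent action space, an alignment loss ties them together on the small labeled set, and one dynamics model consumes latent actions from either source. The sketch below is a minimal, purely illustrative assumption of that setup; the linear maps and names (`encode_action`, `infer_latent_action`, `predict_next_obs`) and all dimensions are hypothetical, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, not taken from the paper.
obs_dim, act_dim, latent_act_dim = 8, 2, 4

# Two routes into one shared latent action space:
#  - an action encoder g(a) for the small action-labeled set,
#  - an inverse-dynamics model h(o_t, o_{t+1}) for abundant action-free data.
W_act = rng.normal(scale=0.1, size=(act_dim, latent_act_dim))
W_inv = rng.normal(scale=0.1, size=(2 * obs_dim, latent_act_dim))
W_dyn = rng.normal(scale=0.1, size=(obs_dim + latent_act_dim, obs_dim))

def encode_action(a):
    # Map an explicit control action into the latent action space.
    return a @ W_act

def infer_latent_action(o_t, o_next):
    # Infer a latent action from a pair of consecutive observations.
    return np.concatenate([o_t, o_next], axis=-1) @ W_inv

def predict_next_obs(o_t, z):
    # Single shared dynamics model, conditioned on a latent action.
    return np.concatenate([o_t, z], axis=-1) @ W_dyn

# Labeled batch (o_t, a_t, o_{t+1}) is small; unlabeled batch has no actions.
o, o_next = rng.normal(size=(16, obs_dim)), rng.normal(size=(16, obs_dim))
a = rng.normal(size=(16, act_dim))
u, u_next = rng.normal(size=(64, obs_dim)), rng.normal(size=(64, obs_dim))

# Alignment loss: on labeled data, the encoded action and the latent action
# inferred from observations should agree.
z_labeled = encode_action(a)
z_inferred = infer_latent_action(o, o_next)
align_loss = np.mean((z_labeled - z_inferred) ** 2)

# The shared dynamics model trains on both streams via latent actions.
dyn_loss_labeled = np.mean((predict_next_obs(o, z_labeled) - o_next) ** 2)
dyn_loss_unlabeled = np.mean(
    (predict_next_obs(u, infer_latent_action(u, u_next)) - u_next) ** 2
)
total_loss = align_loss + dyn_loss_labeled + dyn_loss_unlabeled
```

Note the asymmetry in batch sizes (16 labeled vs. 64 unlabeled pairs): only the alignment term needs action labels, while the dynamics terms train on any observation sequence, which is what lets the unlabeled stream dominate training.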