Factored Latent Action World Models

📅 2026-02-18
🤖 AI Summary
Existing approaches that learn holistic latent actions from action-free videos struggle to model the complex dynamics of multiple entities interacting at the same time. This work proposes a factored dynamics framework built on a factored latent action structure that decomposes the scene into independent factors. Each factor infers its own latent action and predicts its own next-step value through factored inverse and forward dynamics models trained on action-free video. By moving beyond monolithic modeling assumptions, the approach models complex multi-entity dynamics more accurately and improves video generation quality compared to monolithic models. Experiments on both simulated and real-world multi-entity datasets show improvements over prior methods in prediction accuracy and representation quality, as well as benefits for downstream policy learning.
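The factored inverse/forward dynamics structure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the scene is assumed to be already split into K per-entity factor vectors (how FLAM obtains the factors is not covered here), each factor gets its own small linear inverse and forward model (the names `FactorDynamics` and `FactoredLatentActionModel` are hypothetical), and each factor's models see only that factor's state. What it shows is structural: one latent action per factor, and a factor's latent action does not change when another factor's transition changes.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain-Python matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random weights standing in for trained parameters (illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class FactorDynamics:
    """Hypothetical per-factor inverse/forward dynamics pair, linear for brevity."""

    def __init__(self, state_dim: int, action_dim: int, rng: random.Random):
        # Inverse dynamics: latent action from the concatenation [s_t, s_{t+1}].
        self.w_idm = random_matrix(action_dim, 2 * state_dim, rng)
        # Forward dynamics: next factor state from the concatenation [s_t, a_t].
        self.w_fdm = random_matrix(state_dim, state_dim + action_dim, rng)

    def infer_action(self, s_t: Vector, s_next: Vector) -> Vector:
        return matvec(self.w_idm, s_t + s_next)

    def predict_next(self, s_t: Vector, a_t: Vector) -> Vector:
        return matvec(self.w_fdm, s_t + a_t)


class FactoredLatentActionModel:
    """K factors, each with its own latent action and next-state prediction."""

    def __init__(self, num_factors: int, state_dim: int, action_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # Assumption: separate parameters per factor (parameter sharing is also plausible).
        self.factors = [FactorDynamics(state_dim, action_dim, rng) for _ in range(num_factors)]

    def step(
        self, z_t: Sequence[Vector], z_next: Sequence[Vector]
    ) -> Tuple[List[Vector], List[Vector]]:
        # One latent action per factor, inferred only from that factor's own transition.
        actions = [f.infer_action(s, sn) for f, s, sn in zip(self.factors, z_t, z_next)]
        # Each factor predicts its own next value from its own state and latent action.
        preds = [f.predict_next(s, a) for f, s, a in zip(self.factors, z_t, actions)]
        return actions, preds

    def loss(self, z_t: Sequence[Vector], z_next: Sequence[Vector]) -> float:
        """Sum over factors of squared next-state prediction error."""
        _, preds = self.step(z_t, z_next)
        return sum(
            (p - t) ** 2 for pred, tgt in zip(preds, z_next) for p, t in zip(pred, tgt)
        )
```

A monolithic model would instead feed the concatenated scene into a single inverse model and emit one latent action for everything. Here `step` returns K latent actions, and perturbing factor 2's next state leaves the latent actions of factors 0 and 1 unchanged.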


📝 Abstract
Learning latent actions from action-free video has emerged as a powerful paradigm for scaling up controllable world model learning. Latent actions provide a natural interface for users to iteratively generate and manipulate videos. However, most existing approaches rely on monolithic inverse and forward dynamics models that learn a single latent action to control the entire scene, and therefore struggle in complex environments where multiple entities act simultaneously. This paper introduces Factored Latent Action Model (FLAM), a factored dynamics framework that decomposes the scene into independent factors, each inferring its own latent action and predicting its own next-step factor value. This factored structure enables more accurate modeling of complex multi-entity dynamics and improves video generation quality in action-free video settings compared to monolithic models. Experiments on both simulated and real-world multi-entity datasets show that FLAM outperforms prior work in prediction accuracy and representation quality and facilitates downstream policy learning, demonstrating the benefits of factored latent action models.
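The abstract notes that latent actions give users an interface for iteratively generating and manipulating videos. The sketch below illustrates why a factored interface helps with that, assuming a toy forward model (`additive_forward`, hypothetical) in which each factor's latent action simply displaces its state, and assuming each factor's forward model sees only its own state and action. The paper's forward models are learned and may let factors condition on one another; this is only an illustration of per-factor control during rollout.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A per-factor forward model maps (factor state, latent action) -> next factor state.
ForwardModel = Callable[[Vector, Vector], Vector]


def additive_forward(s: Vector, a: Vector) -> Vector:
    """Toy forward model (hypothetical): the latent action is a displacement."""
    return [si + ai for si, ai in zip(s, a)]


def rollout(
    z0: Sequence[Vector],
    actions: Sequence[Sequence[Vector]],
    models: Sequence[ForwardModel],
) -> List[List[Vector]]:
    """Roll out factored dynamics.

    z0[k] is the initial state of factor k; actions[t][k] is the latent action
    for factor k at step t. Returns trajectory[t][k] for t = 0..T.
    """
    trajectory = [list(z0)]
    for step_actions in actions:
        prev = trajectory[-1]
        # Every factor advances with its own model, state, and latent action.
        trajectory.append([m(s, a) for m, s, a in zip(models, prev, step_actions)])
    return trajectory


# Two entities, three steps: factor 0 moves along x, factor 1 along y.
z0 = [[0.0, 0.0], [5.0, 5.0]]
models = [additive_forward, additive_forward]
plan = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(3)]
traj = rollout(z0, plan, models)
# traj[-1] -> [[3.0, 0.0], [5.0, 8.0]]

# Edit only factor 1's latent actions; factor 0's rollout is untouched.
edited = [[step[0], [-1.0, 0.0]] for step in plan]
traj_edited = rollout(z0, edited, models)
# traj_edited[-1] -> [[3.0, 0.0], [2.0, 5.0]]
```

With a single monolithic latent action, the same edit would require finding one new scene-level action that changes entity 1 while preserving entity 0's motion; with one latent action per factor, the edit is local by construction in this sketch.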
Problem

Research questions and friction points this paper is trying to address.

latent actions
action-free video
multi-entity dynamics
world models
factored dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Factored Latent Action
World Models
Action-Free Video
Multi-Entity Dynamics
Latent Action Modeling