Why Latent Actions Fail, and How to Prevent It

📅 2026-05-13

🤖 AI Summary
This work addresses the challenge posed by exogenous states, such as cluttered backgrounds in unlabeled videos, which interfere with an agent’s ability to reliably model its own actions. Extending a linear latent action model framework to explicitly model exogenous state, the study shows analytically that standard reconstruction objectives cause latent actions to encode future exogenous information. It identifies learning in a representation space that emphasizes endogenous dynamics as key to mitigating this issue. The authors further prove that auxiliary supervision signals, such as action labels, encourage latent actions to remain consistent across exogenous states. The analysis is validated on both linear and nonlinear models, offering a unified explanation of both the detrimental effects of exogenous states and why existing mitigation strategies work.
📝 Abstract
Latent action models (LAMs) aim to learn action-like representations from unlabeled videos by compressing frame-to-frame changes. The frames of in-the-wild videos, however, contain not only the agent's own state but also exogenous state, such as background clutter. Since exogenous state introduces changes unrelated to actions, it hinders reliable latent action learning. This paper investigates this problem analytically by extending a linear LAM framework to explicitly model exogenous state. Our analysis reveals two insights: (1) minimizing the standard reconstruction objective produces latent actions that encode exogenous information from the future observation; and (2) learning in a representation space that focuses on endogenous components is key to mitigating the interference of exogenous noise. We further show that previously proposed auxiliary objectives, such as action supervision, provably encourage latent actions to be consistent across exogenous states. These findings are validated through experiments on both linear and nonlinear LAMs, providing a unified theoretical analysis of how exogenous state hinders latent action learning and why common remedies work.
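The two insights and the action-supervision result can be illustrated in a deliberately small linear toy. This is a minimal sketch under its own assumptions, not the paper's model, objective, or proofs: the dynamics matrices `A` and `C`, the noise scales, the helper names (`simulate`, `linear_lam_latents`, `action_r2`), and the residual-form stand-in for an action-prediction loss are all hypothetical. Exogenous innovations are given larger variance than actions (3² vs 1²), which is what makes a reconstruction-optimal rank-2 bottleneck prefer them.

```python
import numpy as np


def simulate(n=20_000, action_std=1.0, exo_std=3.0, seed=0):
    """Hypothetical toy system with observation x_t = [s_t; e_t].

    s_{t+1} = A s_t + a_t           (endogenous state, driven by the action a_t)
    e_{t+1} = C e_t + exo_std * xi  (exogenous state, independent of the action)
    """
    rng = np.random.default_rng(seed)
    A = np.array([[0.9, 0.1], [0.0, 0.9]])
    C = 0.5 * np.eye(2)
    s, e = rng.normal(size=(n, 2)), rng.normal(size=(n, 2))
    a = action_std * rng.normal(size=(n, 2))
    s_next = s @ A.T + a
    e_next = e @ C.T + exo_std * rng.normal(size=(n, 2))
    return np.hstack([s, e]), np.hstack([s_next, e_next]), a


def top_k_directions(y, k):
    """Top-k principal directions (as columns) of the zero-mean data matrix y."""
    _, eigvecs = np.linalg.eigh(y.T @ y / len(y))  # eigenvalues come back ascending
    return eigvecs[:, ::-1][:, :k]


def linear_lam_latents(x, x_next, k, actions=None, lam=0.0):
    """Rank-k linear latent z computed from (x_t, x_{t+1}).

    Unsupervised: the optimum of E||x' - D x - U z||^2 with z linear in (x, x')
    regresses x' on x and keeps the top-k principal directions of the residual.
    Supervised (assumption of this sketch): the residual target is augmented with
    sqrt(lam) * a and solved by reduced-rank regression on the residual.
    """
    P, *_ = np.linalg.lstsq(x, x_next, rcond=None)
    r = x_next - x @ P  # the part of x_{t+1} that x_t cannot explain
    if actions is None or lam == 0.0:
        return r @ top_k_directions(r, k)
    y = np.hstack([r, np.sqrt(lam) * actions])
    B, *_ = np.linalg.lstsq(r, y, rcond=None)
    y_hat = r @ B
    return y_hat @ top_k_directions(y_hat, k)


def action_r2(a, z):
    """Fraction of action variance that is linearly decodable from the latent z."""
    coef, *_ = np.linalg.lstsq(z, a, rcond=None)
    return 1.0 - (a - z @ coef).var(axis=0).sum() / a.var(axis=0).sum()


x, x_next, a = simulate()
# (1) Reconstruction on the full observation [s; e].
r2_full = action_r2(a, linear_lam_latents(x, x_next, k=2))
# (2) Reconstruction restricted to the endogenous coordinates s.
r2_endo = action_r2(a, linear_lam_latents(x[:, :2], x_next[:, :2], k=2))
# Action supervision on the full observation, with a heavy weight on the action term.
r2_sup = action_r2(a, linear_lam_latents(x, x_next, k=2, actions=a, lam=20.0))
print(f"full: {r2_full:.3f}  endogenous: {r2_endo:.3f}  supervised: {r2_sup:.3f}")
```

In this setup the full-observation latent carries almost none of the action signal, because the residual's largest-variance directions are the exogenous innovations. Restricting to endogenous coordinates, or adding the action term with a weight large enough that the weighted action variance (1 + lam)·1² exceeds the exogenous variance 3², makes the latent action-aligned. That threshold is a property of this toy's chosen noise scales, not a result reported in the paper.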
Problem

Research questions and friction points this paper is trying to address.

latent actions
exogenous state
unlabeled videos
action representation
background clutter
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent action models
exogenous state
endogenous representation
reconstruction objective
action-supervision