Object-Centric Latent Action Learning

📅 2025-02-13
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Internet-scale videos remain underutilized in embodied AI because they lack action annotations and contain action-correlated distractors. Method: This paper proposes an object-centric, self-supervised latent action learning framework. It uses self-supervised object decomposition to disentangle causal agent–object interactions from background noise and to annotate videos with proxy-action labels. Building on VideoSaur for object-centric scene decomposition and LAPO for latent action learning, the framework combines object decomposition, latent action pretraining, and fine-tuning on a small set of labeled actions. Results: In preliminary experiments on the Distracting Control Suite, the method improves the quality of inferred latent actions by 2.7× and increases average downstream task return by 2.6×.

📝 Abstract
Leveraging vast amounts of internet video data for Embodied AI is currently bottlenecked by the lack of action annotations and the presence of action-correlated distractors. We propose a novel object-centric latent action learning approach, based on VideoSaur and LAPO, that employs self-supervised decomposition of scenes into object representations and annotates video data with proxy-action labels. This method effectively disentangles causal agent-object interactions from irrelevant background noise and reduces the performance degradation of latent action learning approaches caused by distractors. Our preliminary experiments with the Distracting Control Suite show that latent action pretraining based on object decompositions improves the quality of inferred latent actions by 2.7× and the efficiency of downstream fine-tuning with a small set of labeled actions, increasing return by 2.6× on average.
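The idea in the abstract can be illustrated with a toy sketch. It is not the paper's architecture: the slot layout, the choice of which slots to keep, and every function name below are hypothetical, and the learned inverse and forward dynamics models used in LAPO-style training are replaced by simple differences and sums. It only shows why inferring a latent action from control-relevant object slots, rather than from all scene features, keeps distractor motion out of the proxy-action label.

```python
from typing import List, Sequence

Slot = List[float]  # one object slot as a small feature vector (hypothetical layout)


def select_slots(slots: Sequence[Slot], keep: Sequence[int]) -> List[float]:
    """Concatenate only the slots at the indices in `keep`.

    How control-relevant slots are chosen is an assumption of this sketch.
    """
    return [x for i in keep for x in slots[i]]


def inverse_dynamics(obs: Sequence[float], next_obs: Sequence[float]) -> List[float]:
    """Toy stand-in for an inverse dynamics model: latent action = feature change."""
    return [b - a for a, b in zip(obs, next_obs)]


def forward_dynamics(obs: Sequence[float], latent: Sequence[float]) -> List[float]:
    """Toy stand-in for a forward dynamics model: apply the latent action to obs."""
    return [a + z for a, z in zip(obs, latent)]


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def distractor_energy(latent: Sequence[float], action_dim: int) -> float:
    """Squared magnitude of latent components beyond the agent's own features."""
    return sum(z ** 2 for z in latent[action_dim:])


# Two consecutive frames, already decomposed into slots (made-up values).
# Slot 0 is the agent; slot 1 is a background distractor that changes on its own.
frame_t = [[0.0, 0.0], [5.0, 5.0]]
frame_t1 = [[1.0, 0.0], [2.0, 7.0]]

# Object-centric: infer the latent action from the agent slot only.
obs_obj, next_obj = select_slots(frame_t, [0]), select_slots(frame_t1, [0])
z_obj = inverse_dynamics(obs_obj, next_obj)

# Baseline: infer the latent action from every slot, distractor included.
obs_all, next_all = select_slots(frame_t, [0, 1]), select_slots(frame_t1, [0, 1])
z_all = inverse_dynamics(obs_all, next_all)

# Reconstruction error of the next agent features from the object-centric latent.
recon_error = mse(forward_dynamics(obs_obj, z_obj), next_obj)

print("object-centric latent:", z_obj, "distractor energy:", distractor_energy(z_obj, 2))
print("all-slot latent:", z_all, "distractor energy:", distractor_energy(z_all, 2))
```

In this toy setting the all-slot latent carries the background change alongside the agent's motion, while the object-centric latent contains only the agent's displacement. In the real method both dynamics models are learned and the latent is bottlenecked, so the effect of distractors is a matter of degree rather than the clean split shown here.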
Problem

Research questions and friction points this paper is trying to address.

Lack of action annotations
Presence of action-correlated distractors
Performance degradation in latent action learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Object-centric latent action learning
Self-supervised scene decomposition
Proxy-action label annotation