Hierarchical Latent Action Model

📅 2026-03-06

🤖 AI Summary

Existing latent action models focus on short-horizon frame transitions, so they struggle to capture the long-term temporal structure and high-level skills present in videos without action labels. The paper proposes HiLAM, a Hierarchical Latent Action Model with two levels: a pretrained low-level latent action model extracts short-horizon dynamic patterns as a sequence of latent actions, and a higher level aggregates that sequence into latent skills. This design lets the model capture long-range temporal dependencies. In the reported experiments, HiLAM improves over the baseline and shows robust dynamic skill discovery.

📝 Abstract

Latent Action Models (LAMs) enable learning from actionless data for applications ranging from robotic control to interactive world models. However, existing LAMs typically focus on short-horizon frame transitions and capture low-level motion while overlooking longer-term temporal structure. In contrast, actionless videos often contain temporally extended and high-level skills. We present HiLAM, a hierarchical latent action model that discovers latent skills by modeling long-term temporal information. To capture these dependencies across long horizons, we utilize a pretrained LAM as a low-level extractor. This architecture aggregates latent action sequences, which contain the underlying dynamic patterns of the video, into high-level latent skills. Our experiments demonstrate that HiLAM improves over the baseline and exhibits robust dynamic skill discovery.
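
The two-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `low_level_latent_actions` and `aggregate_skills` are hypothetical, the frame difference stands in for a pretrained LAM encoder, and mean pooling over fixed windows stands in for the aggregation step, whose actual form the abstract does not specify.

```python
from typing import List, Sequence

Vector = List[float]


def low_level_latent_actions(frames: Sequence[Vector]) -> List[Vector]:
    """Stand-in for a frozen, pretrained LAM: one latent action per transition.

    Assumption: the "latent action" here is just the frame difference;
    a real LAM would use a learned encoder.
    """
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(frames, frames[1:])
    ]


def aggregate_skills(actions: Sequence[Vector], window: int) -> List[Vector]:
    """Pool each non-overlapping window of latent actions into one skill vector.

    Assumption: mean pooling is used purely for illustration; the source only
    states that latent action sequences are aggregated into high-level skills.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    skills = []
    for start in range(0, len(actions), window):
        chunk = actions[start:start + window]
        dim = len(chunk[0])
        skills.append([sum(a[i] for a in chunk) / len(chunk) for i in range(dim)])
    return skills


# Toy "video": five 2-D frames that move right twice, then up twice.
frames = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.0], [2.0, 2.0]]
actions = low_level_latent_actions(frames)   # four short-horizon latent actions
skills = aggregate_skills(actions, window=2)  # two longer-horizon skill vectors
```

In this toy case the four per-transition actions collapse into two skill vectors, one for "move right" and one for "move up", which is the kind of temporally extended structure a short-horizon model alone would not summarize.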

Problem

Research questions and friction points this paper is trying to address.

Latent Action Models

long-term temporal structure

hierarchical skills

actionless video

temporal dependencies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Latent Action Model

Long-term Temporal Modeling

Latent Skill Discovery