ALAS: Adaptive Long-Horizon Action Synthesis via Async-pathway Stream Disentanglement

📅 2026-04-22

🤖 AI Summary

Existing approaches to long-horizon human-scene interaction tasks exhibit limited generalization across domains due to their tight coupling of environmental observations with agent states, which hinders flexible composition of novel environments and skills. This work proposes ALAS, a novel framework that, for the first time, introduces an asynchronous decoupling mechanism inspired by the brain’s “where–what” dual pathways into long-horizon task learning. Specifically, an environment-learning module models object affordances, spatial relationships, and scene semantics, while a skill-learning module independently encodes joint degrees of freedom and motion patterns, achieving complete decoupling between environment and state representations. This design substantially enhances cross-environment and cross-skill transferability, yielding an average 23% improvement in subtask success rate and a 29% gain in execution efficiency across diverse long-horizon interaction tasks.

📝 Abstract

Long-Horizon (LH) tasks in Human-Scene Interaction (HSI) are complex multi-step tasks that require continuous planning, sequential decision-making, and extended execution across domains to reach a final goal. Existing methods rely heavily on skill chaining: they concatenate pre-trained subtasks while keeping environment observations and self-state tightly coupled. As a result, they generalize poorly to new combinations of environments and skills and fail to complete diverse LH tasks across domains. To address this problem, this paper presents ALAS, a cross-domain learning framework for LH tasks based on biologically inspired dual-stream disentanglement. Inspired by the brain's "where-what" dual-pathway mechanism, ALAS comprises two core modules: i) an environment learning module for spatial understanding, which captures object functions, spatial relationships, and scene semantics, and achieves cross-domain transfer through complete environment-self disentanglement; ii) a skill learning module for task execution, which processes self-state information, including joint degrees of freedom and motor patterns, and enables cross-skill transfer through independent motor pattern encoding. We conducted extensive experiments on various LH tasks in HSI scenes. Compared with existing methods, ALAS improves the average subtask success rate by 23% and the average execution efficiency by 29%.
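To make the dual-stream idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class `DualStreamPolicy`, the helper functions, the linear encoders, the dimensions, and the random weights are all hypothetical and chosen only for illustration. The point it shows is structural: the environment encoder only ever sees scene observations, the skill encoder only ever sees self-state, and the two latents meet only at the action head.

```python
import random
from dataclasses import dataclass


def make_linear(in_dim, out_dim, seed):
    """Random weight matrix (out_dim x in_dim); a stand-in for a learned layer."""
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x):
    """Matrix-vector product using plain lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


@dataclass
class DualStreamPolicy:
    """Hypothetical two-stream policy: separate encoders, shared action head."""
    env_weights: list    # environment stream: scene observation -> latent
    skill_weights: list  # skill stream: self-state -> latent
    head_weights: list   # action head over the concatenated latents

    def encode_env(self, scene_obs):
        # Sees only scene features (e.g. object layout); never the agent's joints.
        return apply_linear(self.env_weights, scene_obs)

    def encode_skill(self, self_state):
        # Sees only self-state (e.g. joint values); never the scene.
        return apply_linear(self.skill_weights, self_state)

    def act(self, scene_obs, self_state):
        # The two latents are combined only here, by concatenation.
        z = self.encode_env(scene_obs) + self.encode_skill(self_state)
        return apply_linear(self.head_weights, z)


def build_policy(scene_dim=4, state_dim=3, latent_dim=2, action_dim=3):
    return DualStreamPolicy(
        env_weights=make_linear(scene_dim, latent_dim, seed=0),
        skill_weights=make_linear(state_dim, latent_dim, seed=1),
        head_weights=make_linear(2 * latent_dim, action_dim, seed=2),
    )


policy = build_policy()
scene_a = [0.1, 0.5, -0.2, 0.9]   # illustrative scene features
scene_b = [0.7, -0.3, 0.4, 0.0]   # a different scene
joints = [0.2, -0.1, 0.6]         # illustrative self-state

# Swapping the scene changes the environment latent and the action,
# while the skill latent is computed from the self-state alone.
action_a = policy.act(scene_a, joints)
action_b = policy.act(scene_b, joints)
```

Because each encoder takes a single input type, a skill latent computed once can in principle be paired with any scene latent. That separation is the property the abstract credits for cross-environment and cross-skill transfer; how ALAS trains the two modules and combines their outputs is not specified in this summary.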

Problem

Research questions and friction points this paper is trying to address.

Long-Horizon tasks

Human-Scene Interaction

cross-domain generalization

skill chaining

environment-self coupling

Innovation

Methods, ideas, or system contributions that make the work stand out.

dual-stream disentanglement

cross-domain transfer

long-horizon action synthesis