DETACH: Cross-domain Learning for Long-Horizon Tasks via Mixture of Disentangled Experts

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing approaches to long-horizon human-scene interaction (HSI) chain pre-trained subtasks and keep environmental observations tightly coupled with the agent's self-state, which limits generalization. This paper introduces DETACH, a brain-inspired “where-what” dual-pathway framework for HSI. It decouples spatial scene understanding (object functionality, spatial relations, and scene semantics) from skill execution (joint degrees of freedom and motion patterns). DETACH implements this with a dual-stream architecture, so that the two streams can be modeled independently and transferred across environments and across skills. Experiments on diverse long-horizon HSI tasks show that DETACH improves the average subtask success rate by 23% and the average execution efficiency by 29% compared with existing skill-chaining methods.

📝 Abstract
Long-Horizon (LH) tasks in Human-Scene Interaction (HSI) are complex multi-step tasks that require continuous planning, sequential decision-making, and extended execution across domains to achieve the final goal. However, existing methods rely heavily on skill chaining, concatenating pre-trained subtasks while keeping environment observations and self-state tightly coupled. As a result, they cannot generalize to new combinations of environments and skills and fail to complete various LH tasks across domains. To solve this problem, this paper presents DETACH, a cross-domain learning framework for LH tasks via biologically inspired dual-stream disentanglement. Inspired by the brain's "where-what" dual pathway mechanism, DETACH comprises two core modules: i) an environment learning module for spatial understanding, which captures object functions, spatial relationships, and scene semantics, achieving cross-domain transfer through complete environment-self disentanglement; ii) a skill learning module for task execution, which processes self-state information including joint degrees of freedom and motor patterns, enabling cross-skill transfer through independent motor pattern encoding. We conducted extensive experiments on various LH tasks in HSI scenes. Compared with existing methods, DETACH achieves an average subtask success rate improvement of 23% and an average execution efficiency improvement of 29%.
Problem

Research questions and friction points this paper is trying to address.

Generalizing long-horizon tasks across new environments and skills
Disentangling environment observations from self-state for cross-domain transfer
Improving success rate and efficiency in multi-step human-scene interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Biologically inspired dual-stream disentanglement framework
Environment-self disentanglement for cross-domain transfer
Independent motor pattern encoding for cross-skill transfer
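
The disentanglement idea listed above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify layer types, dimensions, or how the two streams are fused, so the single tanh layers, the concatenation-based fusion head, and every name here (DualStreamPolicy, env_encoder, skill_encoder) are hypothetical. The only property it tries to show is that each encoder sees its own input alone, so changing the scene leaves the skill latent untouched.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Random weight matrix (n_out rows, n_in columns) for a toy linear layer."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def encode(weights, x):
    """Single tanh layer: one output value per weight row."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


class DualStreamPolicy:
    """Hypothetical two-stream policy; each encoder receives only its own input."""

    def __init__(self, env_dim, self_dim, latent_dim, action_dim, seed=0):
        rng = random.Random(seed)
        # Environment stream: scene observations only (assumed layout).
        self.env_encoder = make_linear(env_dim, latent_dim, rng)
        # Skill stream: self-state (e.g. joint values) only (assumed layout).
        self.skill_encoder = make_linear(self_dim, latent_dim, rng)
        # Fusion head over the concatenated latents (an assumption, not from the paper).
        self.head = make_linear(2 * latent_dim, action_dim, rng)

    def latents(self, env_obs, self_state):
        """Return (environment latent, skill latent), computed independently."""
        return encode(self.env_encoder, env_obs), encode(self.skill_encoder, self_state)

    def act(self, env_obs, self_state):
        """Concatenate both latents and map them to a bounded action vector."""
        z_env, z_skill = self.latents(env_obs, self_state)
        return encode(self.head, z_env + z_skill)


policy = DualStreamPolicy(env_dim=4, self_dim=3, latent_dim=2, action_dim=3)
self_state = [0.1, -0.2, 0.3]
scene_a = [1.0, 0.0, 0.5, -0.5]
scene_b = [0.0, 1.0, -0.5, 0.5]

# Same body state in two different scenes.
z_env_a, z_skill_a = policy.latents(scene_a, self_state)
z_env_b, z_skill_b = policy.latents(scene_b, self_state)
action = policy.act(scene_a, self_state)
```

In this sketch z_skill_a and z_skill_b are identical because the skill encoder never reads the scene, while the environment latents differ. In a coupled design, where one encoder consumes the concatenated observation and self-state, neither latent could be reused on its own in a new environment or with a new skill.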
Yutong Shen
School of Information Science and Technology, Beijing University of Technology, China
Hangxu Liu
School of Information Science and Engineering, Fudan University, China
Penghui Liu
School of Information Science and Technology, Beijing University of Technology, China
Ruizhe Xia
School of Information Science and Technology, Beijing University of Technology, China
Tianyi Yao
School of Information Science and Technology, Beijing University of Technology, China
Yitong Sun
School of Information Science and Technology, Beijing University of Technology, China
Tongtong Feng
Tsinghua University
Environment Learning, Autonomous Embodied AI, Multimedia Intelligence