OnlineHOI: Towards Online Human-Object Interaction Generation and Perception

📅 2025-09-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing HOI perception and generation methods operate in an offline setting that assumes access to the full sequence, which limits their applicability to real-world online scenarios where only current and historical observations are available. This work introduces two new tasks, online HOI perception and online HOI generation, designed to model temporal dependencies and state evolution in streaming human-object interactions. It proposes a memory-augmented framework built on the Mamba architecture, using its linear-complexity state-space modeling to encode historical interactions efficiently and update latent states in real time. Evaluated on Core4D and OAKINK2 (online generation) and HOI4D (online perception), the method outperforms offline baselines and achieves state-of-the-art performance, demonstrating that online HOI modeling is both feasible and effective.

📝 Abstract
The perception and generation of Human-Object Interaction (HOI) are crucial for fields such as robotics, AR/VR, and human behavior understanding. However, current approaches model these tasks in an offline setting, where information at each time step can be drawn from the entire interaction sequence. In real-world scenarios, by contrast, the information available at each time step comes only from the current moment and historical data, i.e., an online setting. We find that offline methods perform poorly in an online context. Based on this observation, we propose two new tasks: Online HOI Generation and Online HOI Perception. To address these tasks, we introduce the OnlineHOI framework, a Mamba-based network architecture that employs a memory mechanism. By leveraging Mamba's powerful modeling capabilities for streaming data and the memory mechanism's efficient integration of historical information, we achieve state-of-the-art results on the Core4D and OAKINK2 online generation tasks, as well as the online HOI4D perception task.
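The difference between the offline and online settings described in the abstract can be illustrated with a toy example. The sketch below is not the OnlineHOI model: it uses a scalar linear state-space recurrence as a stand-in for a streaming model, and the function names and the coefficients `a` and `b` are assumptions made only for illustration. It shows that an online output at step t depends only on frames up to t, while an offline output can change when a future frame changes.

```python
from typing import Iterable, Iterator, List


def online_ssm(frames: Iterable[float], a: float = 0.9, b: float = 0.1) -> Iterator[float]:
    """Yield one output per incoming frame using only the running state.

    Hypothetical toy recurrence h_t = a * h_{t-1} + b * x_t; the real model
    in the paper is not specified here.
    """
    h = 0.0  # latent state summarizing every frame seen so far
    for x in frames:
        h = a * h + b * x  # constant-time update, no access to future frames
        yield h


def offline_mean(frames: List[float]) -> List[float]:
    """Offline reference: every step sees the whole sequence, future included."""
    m = sum(frames) / len(frames)
    return [m for _ in frames]


if __name__ == "__main__":
    seq_a = [1.0, 2.0, 3.0]
    seq_b = [1.0, 2.0, 100.0]  # identical history, different future frame
    # Online outputs for the first two steps are unaffected by the last frame.
    print(list(online_ssm(seq_a))[:2] == list(online_ssm(seq_b))[:2])
    # Offline outputs at step 0 differ because they use the full sequence.
    print(offline_mean(seq_a)[0] == offline_mean(seq_b)[0])
```

Because the update touches only the previous state and the current frame, the per-frame cost stays constant as the stream grows, which is the property that makes recurrent state-space models a natural fit for streaming input.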
Problem

Research questions and friction points this paper is trying to address.

Addressing online human-object interaction generation and perception
Overcoming limitations of offline methods in real-time scenarios
Proposing a memory-based Mamba framework for streaming data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba framework for streaming data
Memory mechanism for historical integration
OnlineHOI architecture for real-time processing
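
The listing does not say how the memory mechanism is implemented. As a generic illustration only, the sketch below shows one common way to keep a bounded summary of past frames: a fixed-capacity buffer that drops the oldest feature and is read out by averaging. The class name `FrameMemory`, the capacity, and the mean readout are all hypothetical choices, not details taken from the paper.

```python
from collections import deque
from typing import Deque, List, Sequence


class FrameMemory:
    """Hypothetical fixed-size memory of past per-frame feature vectors."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen evicts the oldest entry automatically when full
        self.slots: Deque[List[float]] = deque(maxlen=capacity)

    def write(self, feature: Sequence[float]) -> None:
        """Store the feature of the frame that was just processed."""
        self.slots.append(list(feature))

    def read(self) -> List[float]:
        """Return the element-wise mean of stored features (empty if none)."""
        if not self.slots:
            return []
        dim = len(self.slots[0])
        n = len(self.slots)
        return [sum(slot[i] for slot in self.slots) / n for i in range(dim)]


if __name__ == "__main__":
    memory = FrameMemory(capacity=2)
    for feature in ([1.0, 1.0], [3.0, 3.0], [5.0, 5.0]):
        memory.write(feature)
    # Only the two most recent features remain in the buffer.
    print(len(memory.slots), memory.read())
```

A bounded buffer keeps memory use constant regardless of stream length; a learned or attention-based readout would be an alternative to the plain average used here.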
Yihong Ji
College of Computer Science and Software Engineering, Shenzhen University; Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)
Yunze Liu
IIIS, Tsinghua University; Memories.ai Research
AI Memories, 3D Computer Vision, Embodied AI, Egocentric Video
Yiyao Zhuo
Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)
Weijiang Yu
Associate Professor, CSE, Sun Yat-sen University
Machine Learning, Multimodal AI, AI for Science
Fei Ma
Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)
Joshua Zhexue Huang
Shenzhen University
Fei Yu
Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ)