OnlineHOI: Towards Online Human-Object Interaction Generation and Perception

📅 2025-09-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing HOI perception and generation methods operate in an offline setting that assumes access to the full sequence. This limits their use in real-world online scenarios, where only current and past observations are available. This work introduces two new tasks, online HOI perception and online HOI generation, designed to model temporal dependencies and state evolution in streaming human-object interactions. To address them, the paper proposes a memory-augmented framework built on the Mamba architecture. It uses Mamba's linear-complexity state-space modeling to encode historical interactions efficiently and to update latent states in real time. Evaluated on Core4D and OAKINK2 (online generation) and HOI4D (online perception), the method outperforms offline baselines and achieves state-of-the-art results, demonstrating the feasibility and effectiveness of online HOI modeling.
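The summary credits Mamba's linear-complexity state-space modeling with updating latent states in real time. The sketch below illustrates why a recurrent state-space layer suits streaming input: each new frame updates a fixed-size hidden state at constant cost, and the output at step t depends only on frames up to t. It is a simplified, Mamba-inspired diagonal selective SSM in NumPy, written under our own assumptions (the class name `DiagonalSelectiveSSM`, the random projections, and the simplified discretization are hypothetical). It is not the OnlineHOI implementation, nor Mamba's exact parameterization.

```python
import numpy as np


class DiagonalSelectiveSSM:
    """Hypothetical, simplified Mamba-inspired diagonal state-space layer.

    Per-channel recurrence with an input-dependent step size delta_t:
        h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * x_t
        y_t = h_t @ C_t
    """

    def __init__(self, d_model: int, d_state: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Negative real diagonal A keeps exp(delta * A) in (0, 1): a stable, decaying state.
        self.A = -np.exp(rng.standard_normal((d_model, d_state)))
        # Input-dependent projections for delta, B, C (the "selective" part); random for illustration.
        self.W_delta = rng.standard_normal((d_model, d_model)) * 0.1
        self.W_B = rng.standard_normal((d_model, d_state)) * 0.1
        self.W_C = rng.standard_normal((d_model, d_state)) * 0.1
        self.d_model, self.d_state = d_model, d_state

    def init_state(self) -> np.ndarray:
        return np.zeros((self.d_model, self.d_state))

    def step(self, h: np.ndarray, x: np.ndarray):
        """Consume ONE frame x of shape [d_model]; cost does not grow with t."""
        delta = np.log1p(np.exp(x @ self.W_delta))  # softplus -> positive step size, [d_model]
        B = x @ self.W_B                             # [d_state]
        C = x @ self.W_C                             # [d_state]
        decay = np.exp(delta[:, None] * self.A)      # [d_model, d_state]
        h = decay * h + (delta[:, None] * B[None, :]) * x[:, None]
        y = h @ C                                    # [d_model]
        return h, y

    def scan(self, xs: np.ndarray) -> np.ndarray:
        """Unroll the same recurrence over a recorded sequence of shape [T, d_model]."""
        h = self.init_state()
        ys = []
        for x in xs:
            h, y = self.step(h, x)
            ys.append(y)
        return np.stack(ys)


if __name__ == "__main__":
    ssm = DiagonalSelectiveSSM(d_model=8, d_state=4)
    stream = np.random.default_rng(1).standard_normal((50, 8))
    h = ssm.init_state()
    for frame in stream:  # frames arrive one at a time
        h, y = ssm.step(h, frame)
    print(y.shape)
```

Because `step()` touches only the current frame and the carried state, a deployment loop can call it once per incoming frame. `scan()` is the same recurrence unrolled over a recorded sequence, which is convenient for training or for checking that both paths agree.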

📝 Abstract

The perception and generation of Human-Object Interaction (HOI) are crucial for fields such as robotics, AR/VR, and human behavior understanding. However, current approaches model these tasks in an offline setting, where the information used at each time step can be drawn from the entire interaction sequence. In real-world scenarios, by contrast, the information available at each time step comes only from the current moment and historical data, i.e., an online setting. We find that offline methods perform poorly in this online context. Based on this observation, we propose two new tasks: Online HOI Generation and Online HOI Perception. To address them, we introduce the OnlineHOI framework, a network architecture based on Mamba that incorporates a memory mechanism. By leveraging Mamba's strong modeling capability for streaming data and the memory mechanism's efficient integration of historical information, we achieve state-of-the-art results on the Core4D and OAKINK2 online generation tasks, as well as on the online HOI4D perception task.
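To make the online/offline distinction concrete, the toy example below compares a centered moving average, which (like an offline model) looks at future frames, with a causal one that uses only the current and past frames. The functions `offline_smooth`, `online_smooth`, and `run_as_stream` are hypothetical illustrations, not the baselines evaluated in the paper. They only show that a method designed around future context produces different outputs once it is fed a growing prefix of the stream.

```python
import numpy as np


def offline_smooth(seq: np.ndarray, radius: int = 2) -> np.ndarray:
    """Offline: the output at t may use frames t-radius .. t+radius (future included)."""
    T = len(seq)
    out = np.empty_like(seq, dtype=float)
    for t in range(T):
        lo, hi = max(0, t - radius), min(T, t + radius + 1)
        out[t] = seq[lo:hi].mean(axis=0)
    return out


def online_smooth(seq: np.ndarray, radius: int = 2) -> np.ndarray:
    """Online: the output at t uses only frames t-radius .. t (current + history)."""
    out = np.empty_like(seq, dtype=float)
    for t in range(len(seq)):
        out[t] = seq[max(0, t - radius): t + 1].mean(axis=0)
    return out


def run_as_stream(method, seq: np.ndarray, **kwargs) -> np.ndarray:
    """Emulate deployment: at time t the method sees only seq[:t+1]; keep its latest output."""
    return np.array([method(seq[: t + 1], **kwargs)[-1] for t in range(len(seq))])


if __name__ == "__main__":
    traj = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    print(offline_smooth(traj))                 # full-sequence (offline) access
    print(run_as_stream(offline_smooth, traj))  # same method, fed frame by frame
```

On the ramp `[0, 1, 2, 3, 4, 5]`, the offline smoother returns `[1, 1.5, 2, 3, 3.5, 4]` with full access but `[0, 0.5, 1, 2, 3, 4]` when run as a stream. The causal smoother gives the same result either way.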

Problem

Research questions and friction points this paper is trying to address.

Addressing online human-object interaction generation and perception

Overcoming limitations of offline methods in real-time scenarios

Proposing a memory-based Mamba framework for streaming data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba framework for streaming data

Memory mechanism for historical integration

OnlineHOI architecture for real-time processing
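The page does not describe how OnlineHOI's memory mechanism is built. As one hypothetical way to integrate historical information in a streaming setting, the sketch below keeps a fixed-capacity FIFO bank of past frame features and reads from it with scaled dot-product attention before writing the current frame. `StreamingMemory`, `process_stream`, the capacity, and the residual fusion are all assumptions for illustration, not the paper's design.

```python
from collections import deque

import numpy as np


class StreamingMemory:
    """Hypothetical fixed-size memory for online processing.

    write(): append the current frame's feature; once full, the oldest entry is evicted,
             so memory and per-step cost stay bounded however long the stream runs.
    read():  attend over the stored history, using the current feature as the query.
    """

    def __init__(self, capacity: int, dim: int):
        self.slots = deque(maxlen=capacity)  # deque drops the oldest entry automatically
        self.dim = dim

    def read(self, query: np.ndarray) -> np.ndarray:
        if not self.slots:
            return np.zeros(self.dim)        # no history yet at the first frame
        mem = np.stack(list(self.slots))     # [n, dim], n <= capacity
        scores = mem @ query / np.sqrt(self.dim)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()             # softmax over past slots
        return weights @ mem                 # [dim] summary of the history

    def write(self, feature: np.ndarray) -> None:
        self.slots.append(np.asarray(feature, dtype=float))


def process_stream(frames: np.ndarray, capacity: int = 16) -> np.ndarray:
    """Per frame: read history first (past only), fuse with the current frame, then write."""
    memory = StreamingMemory(capacity, frames.shape[1])
    outputs = []
    for x in frames:
        context = memory.read(x)
        outputs.append(x + context)          # simple residual fusion (an assumption)
        memory.write(x)
    return np.stack(outputs)


if __name__ == "__main__":
    feats = np.random.default_rng(0).standard_normal((40, 6))
    print(process_stream(feats, capacity=8).shape)
```

Reading before writing keeps the readout strictly causal, and the bounded `deque` keeps memory and per-frame cost constant. The trade-off is that anything older than `capacity` frames is forgotten unless it has been summarized elsewhere, for example in a recurrent state such as an SSM's.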

🔎 Similar Papers

A Review of Human-Object Interaction Detection

2024-08-20 · 2024 2nd International Conference on Computer, Vision and Intelligent Technology (ICCVIT) · Citations: 2

FreeA: Human-object Interaction Detection using Free Annotation Labels

2024-03-04 · Citations: 0
