HO-Flow: Generalizable Hand-Object Interaction Generation with Latent Flow Matching

📅 2026-04-12

🤖 AI Summary
This work addresses the challenge of generating temporally coherent and physically plausible 3D hand–object interaction sequences from textual descriptions and canonical 3D object geometries. The authors propose a unified latent space that jointly models hand–object dynamics, built from an interaction-aware variational autoencoder and a masked flow matching model with autoregressive temporal modeling. A key design choice is to predict object motion relative to the initial frame, which enables effective pre-training on large-scale synthetic data and thereby improves generalization. Evaluated on the GRAB, OakInk, and DexYCB benchmarks, the method achieves state-of-the-art physical plausibility and motion diversity.
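To make "object motion relative to the initial frame" concrete, here is a minimal NumPy sketch. It assumes each object pose is a 3×3 rotation matrix plus a translation vector, and that "relative" means expressed in the coordinate frame of the first pose. The summary does not give HO-Flow's actual parameterization (rotation representation, whether translations are also rotated), and the function names `to_initial_frame`, `from_initial_frame`, and `rot_z` are hypothetical.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    """Rotation matrix for an angle `theta` (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def to_initial_frame(rotations: np.ndarray, translations: np.ndarray):
    """Re-express a world-frame object pose sequence relative to its first frame.

    rotations:    (T, 3, 3) rotation matrices R_t
    translations: (T, 3)    translation vectors t_t
    Returns (rel_rotations, rel_translations, (R0, t0)); frame 0 maps to (I, 0).
    """
    R0, t0 = rotations[0], translations[0]
    # R_rel_t = R0^T @ R_t: orientation of frame t as seen from frame 0.
    rel_rotations = np.einsum("ij,tjk->tik", R0.T, rotations)
    # t_rel_t = R0^T @ (t_t - t0), written row-wise as (t_t - t0) @ R0.
    rel_translations = (translations - t0) @ R0
    return rel_rotations, rel_translations, (R0, t0)


def from_initial_frame(rel_rotations, rel_translations, R0, t0):
    """Invert `to_initial_frame` given the initial world pose (R0, t0)."""
    rotations = np.einsum("ij,tjk->tik", R0, rel_rotations)
    translations = rel_translations @ R0.T + t0
    return rotations, translations


# Usage: a short trajectory that spins about z while sliding along x.
Rs = np.stack([rot_z(0.2 * t) for t in range(5)])
ts = np.array([[1.0 + 0.1 * t, 2.0, 0.5] for t in range(5)])
rel_R, rel_t, (R0, t0) = to_initial_frame(Rs, ts)
print(np.allclose(rel_R[0], np.eye(3)), np.allclose(rel_t[0], 0.0))  # True True
```

Under this parameterization, a global rotation G and offset c applied to the whole trajectory cancel out in R0^T R_t and R0^T (t_t - t0), so the relative sequence does not depend on where or how the object was initially placed. That placement invariance is one plausible reason such a representation pre-trains well on synthetic data, although the summary does not spell out the mechanism.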

📝 Abstract
Generating realistic 3D hand-object interactions (HOI) is a fundamental challenge in computer vision and robotics, requiring both temporal coherence and high-fidelity physical plausibility. Existing methods remain limited in their ability to learn expressive motion representations for generation and perform temporal reasoning. In this paper, we present HO-Flow, a framework for synthesizing realistic hand-object motion sequences from texts and canonical 3D objects. HO-Flow first employs an interaction-aware variational autoencoder to encode sequences of hand and object motions into a unified latent manifold by incorporating hand and object kinematics, enabling the representation to capture rich interaction dynamics. It then leverages a masked flow matching model that combines auto-regressive temporal reasoning with continuous latent generation, improving temporal coherence. To further enhance generalization, HO-Flow predicts object motions relative to the initial frame, enabling effective pre-training on large-scale synthetic data. Experiments on the GRAB, OakInk, and DexYCB benchmarks demonstrate that HO-Flow achieves state-of-the-art performance in both physical plausibility and motion diversity for interaction motion synthesis.
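The masked flow matching stage described in the abstract can be illustrated with a toy NumPy sketch. It assumes a straight-line (rectified-flow style) path from Gaussian noise to the per-frame latents, a random token mask during training, and chunk-by-chunk generation left to right in time at inference, integrated with explicit Euler steps. Reading "auto-regressive temporal reasoning" as temporal chunking is an assumption: masked autoregressive models can also fill tokens in random order, and the abstract does not say which HO-Flow uses. `velocity_fn` stands in for the learned, text- and object-conditioned network; it and every other name here are hypothetical, and zeroing masked context frames is a placeholder for a learned mask embedding.

```python
import numpy as np


def flow_matching_pair(z1: np.ndarray, rng: np.random.Generator):
    """Sample points on straight paths from noise z0 to data z1, plus target velocities."""
    z0 = rng.standard_normal(z1.shape)
    s = rng.uniform(size=(z1.shape[0], 1))  # one flow time per token, in [0, 1)
    zs = (1.0 - s) * z0 + s * z1            # linear interpolation path
    v_target = z1 - z0                      # constant velocity along that path
    return zs, s, v_target


def masked_fm_loss(latents, mask_ratio, velocity_fn, rng):
    """Flow-matching MSE on a random subset of masked per-frame latent tokens.

    latents: (T, D) latent sequence (assumed to come from a VAE encoder).
    velocity_fn(zs, s, context, mask) -> (n_masked, D) predicted velocities.
    """
    T = latents.shape[0]
    n_mask = max(1, int(round(mask_ratio * T)))
    mask = np.zeros(T, dtype=bool)
    mask[rng.choice(T, size=n_mask, replace=False)] = True
    # Placeholder for a learned mask token: hide masked frames by zeroing them.
    context = np.where(mask[:, None], 0.0, latents)
    zs, s, v_target = flow_matching_pair(latents[mask], rng)
    v_pred = velocity_fn(zs, s, context, mask)
    return float(np.mean((v_pred - v_target) ** 2))


def sample_chunked(T, D, velocity_fn, n_rounds, n_steps, rng):
    """Generate a (T, D) latent sequence chunk by chunk, left to right in time.

    Each round integrates the velocity field from noise (s=0) toward data (s=1)
    with explicit Euler steps, conditioned on all previously generated frames.
    """
    latents = np.zeros((T, D))
    known = np.zeros(T, dtype=bool)
    h = 1.0 / n_steps
    for chunk in np.array_split(np.arange(T), n_rounds):
        mask = np.zeros(T, dtype=bool)
        mask[chunk] = True
        context = np.where(known[:, None], latents, 0.0)
        x = rng.standard_normal((len(chunk), D))
        for k in range(n_steps):
            s = np.full((len(chunk), 1), k * h)
            x = x + h * velocity_fn(x, s, context, mask)
        latents[chunk] = x
        known[chunk] = True
    return latents
```

A real `velocity_fn` would also receive the text and canonical object embeddings. To sanity-check the sampling loop before plugging in a network, one can pass an oracle velocity `(target[mask] - x) / (1 - s)`, which makes the Euler steps land on a chosen `target` up to floating-point rounding.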
Problem

Research questions and friction points this paper is trying to address.

hand-object interaction
motion generation
temporal coherence
physical plausibility
3D interaction

Innovation

Methods, ideas, or system contributions that make the work stand out.

latent flow matching
hand-object interaction
temporal coherence
variational autoencoder
motion synthesis