Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes

📅 2025-06-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work introduces the first method for auditory grounding of hand-object interactions in 3D scenes: given a sequence of hand motion trajectories, it synthesizes realistic, material-aware, and action-consistent audio to endow static 3D reconstructions with interactive auditory capabilities. Methodologically, it pioneers direct mapping of hand poses to material-discriminative sounds; proposes an end-to-end cross-modal generative framework based on rectified flow, enabling controllable audio synthesis from arbitrarily parameterized hand poses; and integrates 3D hand pose estimation, multimodal action–sound alignment training, and physics-informed audio rendering priors. Experiments demonstrate high fidelity in both material identity and action type classification. In user studies, participants achieved ≈52% material–action discrimination accuracy—near chance level—indicating perceptual realism far surpassing existing baselines.

Technology Category

Application Category

📝 Abstract
We study the problem of making 3D scene reconstructions interactive by asking the following question: can we predict the sounds of human hands physically interacting with a scene? First, we record a video of a human manipulating objects within a 3D scene using their hands. We then use these action-sound pairs to train a rectified flow model to map 3D hand trajectories to their corresponding audio. At test time, a user can query the model for other actions, parameterized as sequences of hand poses, to estimate their corresponding sounds. In our experiments, we find that our generated sounds accurately convey material properties and actions, and that they are often indistinguishable to human observers from real sounds. Project page: https://www.yimingdou.com/hearing_hands/
Problem

Research questions and friction points this paper is trying to address.

Predict sounds from hand interactions in 3D scenes
Map 3D hand trajectories to corresponding audio
Generate realistic sounds for material and action properties
Innovation

Methods, ideas, or system contributions that make the work stand out.

Records video of hands manipulating 3D objects
Trains rectified flow model for audio mapping
Generates sounds from hand pose sequences
🔎 Similar Papers
No similar papers found.