🤖 AI Summary
This paper addresses the challenge of jointly modeling the geometry and dynamics of a single rigid body under severe occlusion, without tactile or force sensors and without pretrained models. The method introduces "physible geometry," a concept that inverts observed object motion (together with robot proprioception) into implicit contact constraints, enabling joint optimization of visible geometric structure (reconstructed via BundleSDF) and implicit contact dynamics (learned via the odometry-based Physics Learning Library) in signed distance function (SDF) space. By tightly integrating RGBD visual tracking with proprioceptive sensing, the method achieves physically consistent co-optimization of geometry and dynamics under an SDF representation. Experiments demonstrate significant improvements over vision-only baselines: a 37% reduction in geometric reconstruction error and a 52% reduction in dynamics prediction error, a substantial step toward occlusion-robust physical modeling.
📝 Abstract
We introduce Vysics, a vision-and-physics framework for a robot to build an expressive geometry and dynamics model of a single rigid body, using a seconds-long RGBD video and the robot's proprioception. While the computer vision community has built powerful visual 3D perception algorithms, cluttered environments with heavy occlusions can limit the visibility of objects of interest. However, observed motion of partially occluded objects can imply that physical interactions took place, such as contact with a robot or the environment. These inferred contacts can supplement the visible geometry with "physible geometry," which best explains the observed object motion through physics. Vysics uses a vision-based tracking and reconstruction method, BundleSDF, to estimate the trajectory and the visible geometry from an RGBD video, and an odometry-based model learning method, Physics Learning Library (PLL), to infer the "physible" geometry from the trajectory through implicit contact dynamics optimization. The visible and "physible" geometries jointly factor into optimizing a signed distance function (SDF) to represent the object shape. Vysics does not require pretraining, nor tactile or force sensors. Compared with vision-only methods, Vysics yields object models with higher geometric accuracy and better dynamics prediction in experiments where the object interacts with the robot and the environment under heavy occlusion. Project page: https://vysics-vision-and-physics.github.io/
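To make the "jointly factor into optimizing an SDF" idea concrete, below is a minimal, hypothetical PyTorch sketch of one way such a combined objective could look. This is not Vysics's actual implementation: the network `SDFNet`, the function `joint_sdf_loss`, the three point sets, and the loss weights are all illustrative assumptions. In the real system, the surface points would come from BundleSDF's reconstruction and the contact/free-space points from PLL's implicit contact dynamics optimization; here they are stand-in tensors.

```python
import torch
import torch.nn as nn

class SDFNet(nn.Module):
    """Tiny MLP standing in for the object's signed distance function phi(x)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):               # x: (N, 3) points in the object frame
        return self.net(x).squeeze(-1)  # (N,) signed distances

def joint_sdf_loss(sdf, surface_pts, contact_pts, free_pts,
                   w_vis=1.0, w_con=1.0, w_free=1.0):
    """Hypothetical joint objective over visible and 'physible' geometry.

    - Visible term:    phi(x) -> 0 on observed surface points.
    - Contact term:    phi(x) -> 0 at contacts inferred from object motion.
    - Free-space term: phi(x) >= 0 at points known to lie outside the object
      (e.g., swept by the robot without contact), penalizing penetration.
    """
    loss_vis = sdf(surface_pts).abs().mean()
    loss_con = sdf(contact_pts).abs().mean()
    loss_free = torch.relu(-sdf(free_pts)).mean()
    return w_vis * loss_vis + w_con * loss_con + w_free * loss_free

# Usage on random stand-in data:
sdf = SDFNet()
opt = torch.optim.Adam(sdf.parameters(), lr=1e-3)
surface_pts, contact_pts, free_pts = (torch.randn(128, 3) for _ in range(3))
loss = joint_sdf_loss(sdf, surface_pts, contact_pts, free_pts)
loss.backward()
opt.step()
```

The key design point the sketch illustrates is that contact evidence enters the shape estimate through the same SDF the vision pipeline optimizes, so occluded regions that were never seen can still be constrained by physics.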