🤖 AI Summary
Existing methods struggle to robustly extract structured 3D representations of deformable objects, and high-quality 3D datasets remain scarce because collection relies on expensive acquisition hardware or labor-intensive manual annotation. This work proposes a low-cost, fully automatic framework that requires only an RGB-D camera and leverages motion consistency constraints to enable unsupervised 3D keypoint detection and robust temporal tracking. The approach produces geometrically consistent and temporally smooth structured representations without any manual annotation, and it outperforms current tracking techniques with notable improvements in both geometric accuracy and stability. The authors also release a large-scale, high-quality 3D keypoint trajectory dataset comprising six categories of deformable objects with a total duration of 110 minutes.
📝 Abstract
Structured 3D representations such as keypoints and meshes offer compact, expressive descriptions of deformable objects, jointly capturing geometric and topological information useful for downstream tasks such as dynamics modeling and motion planning. However, robustly extracting such representations remains challenging, as current perception methods struggle to handle complex deformations. Moreover, large-scale 3D data collection remains a bottleneck: existing approaches either demand prohibitive effort, such as labor-intensive annotation or expensive motion-capture setups, or rely on simplifying assumptions that break down in unstructured environments. As a result, large-scale 3D datasets and benchmarks for deformable objects remain scarce. To address these challenges, this paper presents TrackDeform3D, an affordable and autonomous framework for collecting 3D datasets of deformable objects using only RGB-D cameras. The method identifies 3D keypoints and robustly tracks their trajectories, incorporating motion consistency constraints to produce temporally smooth and geometrically coherent data. TrackDeform3D is evaluated against several state-of-the-art tracking methods across diverse object categories and shows consistent improvements in both geometric and tracking accuracy. Using this framework, the paper releases a high-quality, large-scale dataset consisting of 6 deformable objects, totaling 110 minutes of trajectory data.
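The abstract does not spell out how the motion consistency constraints are formulated. Purely as an illustration of the general idea, and not of the paper's actual objective, the sketch below combines a temporal smoothness term with a geometric coherence term over tracked keypoint trajectories; the function name, the weights, and both penalty terms are assumptions made for this example.

```python
import torch

def motion_consistency_loss(traj, w_smooth=1.0, w_dist=1.0):
    """Hypothetical motion-consistency penalty on keypoint trajectories.

    traj: (T, K, 3) tensor of K 3D keypoints tracked over T frames (T >= 3).
    Combines (1) a temporal smoothness term penalizing large frame-to-frame
    accelerations and (2) a geometric coherence term penalizing rapid drift
    in pairwise keypoint distances between consecutive frames.
    """
    # (1) Temporal smoothness: finite-difference acceleration of each keypoint.
    vel = traj[1:] - traj[:-1]        # (T-1, K, 3) per-frame displacement
    acc = vel[1:] - vel[:-1]          # (T-2, K, 3) per-frame acceleration
    smooth = acc.pow(2).sum(-1).mean()

    # (2) Geometric coherence: pairwise keypoint distances should change
    # slowly over time, approximating local rigidity of the object.
    dists = torch.cdist(traj, traj)   # (T, K, K) pairwise distances per frame
    dist_drift = (dists[1:] - dists[:-1]).pow(2).mean()

    return w_smooth * smooth + w_dist * dist_drift

# Example usage on a random trajectory of 50 frames with 12 keypoints.
if __name__ == "__main__":
    traj = torch.randn(50, 12, 3, requires_grad=True)
    loss = motion_consistency_loss(traj)
    loss.backward()   # gradients could be used to refine noisy keypoint tracks
    print(float(loss))
```

In such a formulation, the smoothness term would discourage jittery tracks while the distance-drift term would keep the recovered structure geometrically coherent across frames; whether TrackDeform3D uses penalties of this form, hard constraints, or a different optimization scheme is not stated in the abstract.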