🤖 AI Summary
This work addresses the problem of estimating object mass from visual sensors alone, using RGB images together with sparse point clouds derived from depth images. We propose an end-to-end deep learning framework that fuses the two modalities. Because real training data is scarce, we build a synthetic RGB-D dataset from ShapeNetSem 3D models, rendered as if captured by a Kinect camera, and use it to train a GLPDepth-based depth estimator. The dense depth maps this estimator predicts are then used to augment an existing dataset of images paired with mass values. The approach achieves significant improvements over prior methods in terms of MAE and RMSE. The data-generation code and the training code for both the depth estimator and the mass estimator are publicly released, and the results support the use of synthetic data for estimating physical properties when real labeled data is limited.
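The summary reports results in terms of MAE and RMSE. The minimal sketch below shows how these two metrics compare predicted and ground-truth masses using their standard definitions. The helper names `mae` and `rmse` and the sample masses are hypothetical, and the paper's evaluation code may compute these metrics differently or report additional ones.

```python
import math
from typing import Sequence


def _check(pred: Sequence[float], target: Sequence[float]) -> None:
    # Both sequences must be non-empty and aligned element by element.
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and true masses."""
    _check(pred, target)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root mean squared error between predicted and true masses."""
    _check(pred, target)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.sqrt(mse)


# Hypothetical masses in kilograms, for illustration only.
predicted = [0.50, 1.20, 0.30]
true = [0.40, 1.00, 0.35]
print(f"MAE  = {mae(predicted, true):.4f} kg")
print(f"RMSE = {rmse(predicted, true):.4f} kg")
```

RMSE squares each error before averaging, so it weights large mass errors more heavily than MAE; on the same data, RMSE is always greater than or equal to MAE.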
📝 Abstract
Inertial mass plays a crucial role in robotic applications such as object grasping, manipulation, and simulation, providing a strong prior for planning and control. Accurately estimating an object's mass before interaction can significantly enhance the performance of various robotic tasks. However, mass estimation using only vision sensors is a relatively underexplored area. This paper proposes a novel approach that combines sparse point-cloud data from depth images with RGB images to estimate the mass of objects. We evaluate a range of point-cloud processing architectures, alongside RGB-only methods. To overcome the limited availability of training data, we create a synthetic dataset from ShapeNetSem 3D models, simulating RGB-D images as captured by a Kinect camera. This synthetic data is used to train a depth-estimation model that predicts dense depth maps, which we then use to augment an existing dataset of images paired with mass values. Our approach significantly outperforms existing benchmark results on all evaluated metrics. The code for data generation (https://github.com/RavineWindteer/ShapenetSem-to-RGBD), depth-estimator training (https://github.com/RavineWindteer/GLPDepth-Edited), and mass-estimator training (https://github.com/RavineWindteer/Depth-mass-estimator) is available online.
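The abstract does not say exactly how a depth image is turned into a sparse point cloud. A common approach, sketched here under stated assumptions, is pinhole back-projection of every valid depth pixel followed by random subsampling to a fixed number of points. The function name `depth_to_sparse_points`, the intrinsics (fx = fy = 525, cx = 319.5, cy = 239.5, typical defaults for a 640×480 Kinect v1), and `num_points=1024` are illustrative assumptions, not values taken from the paper or its repositories.

```python
import numpy as np


def depth_to_sparse_points(depth, fx, fy, cx, cy, num_points=1024, seed=0):
    """Back-project a depth map (meters) to 3D camera coordinates and
    randomly sample a fixed-size sparse subset of the valid points.

    Pixels with depth <= 0 are treated as missing and discarded.
    Returns an array of shape (num_points, 3), or (0, 3) if no pixel is valid.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # v: row index, u: column index
    z = depth.astype(np.float64)
    valid = z > 0

    # Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    x = (u[valid] - cx) * z[valid] / fx
    y = (v[valid] - cy) * z[valid] / fy
    points = np.stack([x, y, z[valid]], axis=1)  # (N, 3) in the camera frame

    if len(points) == 0:
        return points

    # Sample with replacement only if there are fewer valid points than requested.
    rng = np.random.default_rng(seed)
    replace = len(points) < num_points
    idx = rng.choice(len(points), size=num_points, replace=replace)
    return points[idx]


# Synthetic example: a flat surface 1 m from the camera with missing top rows.
depth = np.full((480, 640), 1.0)
depth[:100, :] = 0.0  # simulate missing depth readings
pts = depth_to_sparse_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(pts.shape)
```

Fixed-size random subsampling matches the fixed-size input that many point-cloud architectures expect; farthest-point sampling is a common alternative when more uniform spatial coverage is wanted.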