Estimating Object Physical Properties from RGB-D Vision and Depth Robot Sensors Using Deep Learning

📅 2025-07-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem of estimating an object's mass from vision sensors alone, specifically RGB images combined with sparse point clouds derived from depth images. We propose an end-to-end deep learning framework that fuses the two modalities and compare several point-cloud processing architectures against RGB-only methods. Because images paired with mass values are scarce, we build a synthetic RGB-D dataset from ShapeNetSem 3D models and use it to train a GLPDepth-based depth estimator; the dense depth maps it predicts are then used to augment an existing dataset of images paired with mass values. The approach is reported to improve significantly over prior methods in terms of MAE and RMSE. The code for data generation, depth-estimator training, and mass-estimator training is publicly available.

📝 Abstract

Inertial mass plays a crucial role in robotic applications such as object grasping, manipulation, and simulation, providing a strong prior for planning and control. Accurately estimating an object's mass before interaction can significantly enhance the performance of various robotic tasks. However, mass estimation using only vision sensors is a relatively underexplored area. This paper proposes a novel approach combining sparse point-cloud data from depth images with RGB images to estimate the mass of objects. We evaluate a range of point-cloud processing architectures, alongside RGB-only methods. To overcome the limited availability of training data, we create a synthetic dataset using ShapeNetSem 3D models, simulating RGBD images via a Kinect camera. This synthetic data is used to train an image generation model for estimating dense depth maps, which we then use to augment an existing dataset of images paired with mass values. Our approach significantly outperforms existing benchmarks across all evaluated metrics. The data generation (https://github.com/RavineWindteer/ShapenetSem-to-RGBD) as well as the training of the depth estimator (https://github.com/RavineWindteer/GLPDepth-Edited) and the mass estimator (https://github.com/RavineWindteer/Depth-mass-estimator) are available online.
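
The abstract's step of turning a depth image into a sparse point cloud can be illustrated with a standard pinhole back-projection. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `depth_to_point_cloud`, the intrinsics `fx`, `fy`, `cx`, `cy`, the `num_points` subsampling parameter, and the toy 4x4 depth image are all hypothetical and chosen only for illustration.

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy, num_points=None, seed=0):
    """Back-project a depth image into an (N, 3) point cloud in the camera frame.

    Assumes a pinhole camera model and that zero depth marks invalid pixels.
    If num_points is given, the cloud is randomly subsampled to that size.
    """
    h, w = depth.shape
    # u holds the column index and v the row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=1)

    # Optional random subsampling to obtain a sparse cloud (an assumption here).
    if num_points is not None and len(points) > num_points:
        rng = np.random.default_rng(seed)
        idx = rng.choice(len(points), size=num_points, replace=False)
        points = points[idx]
    return points


# Toy example: a 4x4 depth image with a 2x2 patch at 2.0 depth units.
depth = np.zeros((4, 4), dtype=np.float32)
depth[1:3, 1:3] = 2.0
cloud = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
sparse = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5, num_points=2)
```

In a pipeline like the one described, a cloud of this kind would be passed to a point-cloud network alongside the RGB image; how the released code samples and normalizes points is not stated in the abstract.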

Problem

Research questions and friction points this paper is trying to address.

Estimating object mass from RGB images and depth-derived point clouds

Overcoming limited training data with synthetic datasets

Improving robotic tasks via accurate mass prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines RGB and sparse point-cloud data

Uses a synthetic ShapeNetSem-based RGB-D dataset to train the depth estimator

Generates dense depth maps for augmentation
