🤖 AI Summary
This work addresses the susceptibility of IMU-provided gravity priors to interference from linear acceleration or vibration, a problem that existing methods struggle to mitigate using only a single image. To this end, we propose GravCal, an end-to-end feedforward network that fuses a prior-independent, image-based gravity estimate with a residual correction of the input prior. A learnable gating mechanism adaptively weights these two components, and the model jointly outputs a refined gravity direction and a confidence score correlated with prior quality. GravCal is the first method capable of correcting noisy IMU gravity priors using only a single RGB image. Evaluated on a newly curated dataset of over 148K frames, it reduces the mean angular error from 22.02° to 14.24°, with particularly pronounced improvements under severely degraded priors. The predicted confidence can serve as a useful signal for downstream tasks.
📝 Abstract
Gravity estimation is fundamental to visual-inertial perception, augmented reality, and robotics, yet gravity priors from IMUs are often unreliable under linear acceleration, vibration, and transient motion. Existing methods often estimate gravity directly from images or assume reasonably accurate inertial input, leaving the practical problem of correcting a noisy gravity prior from a single image largely unaddressed.
We present GravCal, a feedforward model for single-image gravity prior calibration. Given one RGB image and a noisy gravity prior, GravCal predicts a corrected gravity direction and a per-sample confidence score. The model combines two complementary predictions, a residual correction of the input prior and a prior-independent image estimate, and uses a learned gate to fuse them adaptively.
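The gated fusion described above can be illustrated with a minimal NumPy sketch. The abstract does not give the fusion formula, so this assumes a convex combination of the two unit vectors weighted by a sigmoid gate, followed by renormalization. The function name `fuse_gravity` and its arguments (`residual`, `image_estimate`, `gate_logit`) are hypothetical stand-ins for network outputs, not names from GravCal.

```python
import numpy as np


def normalize(v, eps=1e-8):
    """Scale a 3-vector to unit length (eps guards against division by zero)."""
    v = np.asarray(v, dtype=float)
    return v / max(np.linalg.norm(v), eps)


def fuse_gravity(prior, residual, image_estimate, gate_logit):
    """Blend a residual-corrected prior with an image-only estimate.

    ASSUMPTION: the fusion is a convex combination weighted by a sigmoid
    gate; the actual GravCal fusion rule may differ.
    """
    # Gate in (0, 1): values near 1 trust the corrected prior,
    # values near 0 trust the prior-independent image estimate.
    gate = 1.0 / (1.0 + np.exp(-gate_logit))

    # Branch 1: residual correction applied to the noisy prior.
    corrected_prior = normalize(np.asarray(prior, dtype=float)
                                + np.asarray(residual, dtype=float))

    # Branch 2: gravity direction predicted from the image alone.
    image_dir = normalize(image_estimate)

    # Weighted blend, renormalized so the output is a unit direction.
    fused = normalize(gate * corrected_prior + (1.0 - gate) * image_dir)
    return fused, gate


# Example: a tilted prior, a small correction, and an image-only estimate.
fused, gate = fuse_gravity(
    prior=[0.3, 0.0, -0.95],
    residual=[-0.2, 0.0, 0.0],
    image_estimate=[0.0, 0.05, -1.0],
    gate_logit=0.0,  # gate = 0.5, equal trust in both branches
)
```

Under this assumed form, the gate value itself can double as the per-sample confidence score: a gate near 1 indicates that the model relied mostly on the corrected prior.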
Extensive experiments show strong gains over raw inertial priors: GravCal reduces mean angular error from 22.02° (IMU prior) to 14.24°, with larger improvements when the prior is severely corrupted. We also introduce a dataset of over 148K frames with paired VIO-derived ground-truth gravity and Mahony-filter IMU priors across diverse scenes and arbitrary camera orientations. The learned gate correlates with prior quality, making it a useful confidence signal for downstream systems.
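For reference, the angular error between a predicted and a ground-truth gravity direction is commonly computed as the arccosine of the dot product of their unit vectors. The sketch below assumes that standard definition; the abstract does not spell out the exact evaluation protocol, and the helper name `angular_error_deg` is illustrative only.

```python
import numpy as np


def angular_error_deg(pred, target):
    """Angle in degrees between gravity directions.

    Accepts single 3-vectors or batches of shape (N, 3).
    ASSUMPTION: the standard arccos-of-dot-product definition.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)

    # Normalize both inputs so only direction matters, not magnitude.
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    target = target / np.linalg.norm(target, axis=-1, keepdims=True)

    # Clip to [-1, 1] to avoid NaN from floating-point rounding.
    cos = np.clip(np.sum(pred * target, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))


# Mean angular error over a small hypothetical batch.
preds = np.array([[0.0, 0.0, -1.0], [1.0, 0.0, 0.0]])
truth = np.array([[0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
mean_error = angular_error_deg(preds, truth).mean()
```

Averaging this per-frame quantity over a test set yields a mean angular error of the kind reported above.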