🤖 AI Summary
This work addresses the limitations of traditional radar-inertial odometry, which relies on handcrafted signal processing that often discards information in the raw IQ spectrum and whose velocity estimates degrade when point clouds are sparse, particularly during lateral motion. The paper presents the first end-to-end learning framework that estimates either the body-frame linear velocity or an angle-resolved Doppler velocity map directly from the 4-D IQ spectral cube of a millimeter-wave radar, and fuses these estimates with IMU preintegration in a sliding-window pose graph. Built on a GRT-based Transformer architecture, the network is trained in three stages: geometric pretraining on LiDAR-projected depth, velocity or Doppler fine-tuning, and uncertainty calibration with a negative log-likelihood loss. Evaluated on the IQ1M dataset, the method achieves the lowest relative pose error on most sequences and shows particularly large gains over conventional digital signal processing (DSP) baselines on lateral trajectories unseen during training.
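The summary does not say how an angle-resolved Doppler map becomes a velocity constraint. One common approach, assumed here, treats every angle bin as observing static scene content, so its radial velocity is the negative projection of the sensor velocity onto that bin's line of sight, and recovers the velocity by weighted least squares. The helpers `bin_directions` and `ego_velocity_from_doppler`, the x-forward/y-left/z-up radar frame, the Doppler sign convention, and the per-bin standard deviations (standing in for the network's uncertainty output) are all hypothetical; this is a sketch of the general technique, not UNRIO's implementation.

```python
import numpy as np


def bin_directions(azimuth, elevation):
    """Unit line-of-sight vectors per angle bin (assumed frame: x forward, y left, z up)."""
    return np.stack(
        [
            np.cos(elevation) * np.cos(azimuth),
            np.cos(elevation) * np.sin(azimuth),
            np.sin(elevation),
        ],
        axis=-1,
    )


def ego_velocity_from_doppler(azimuth, elevation, doppler, sigma):
    """Weighted least-squares sensor velocity from per-angle-bin Doppler values.

    Assumes static scene content in every bin, so doppler = -d . v
    (sign convention assumed; some radars report +d . v).
    """
    d = bin_directions(azimuth, elevation)
    w = 1.0 / np.asarray(sigma)          # whiten each row by its predicted std-dev
    A = -d * w[:, None]
    b = np.asarray(doppler) * w
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    cov = np.linalg.inv(A.T @ A)         # 3x3 covariance of the velocity estimate
    return v, cov


# Synthetic check: noise-free Doppler values from mostly lateral motion.
rng = np.random.default_rng(0)
az = rng.uniform(-np.pi / 3, np.pi / 3, 64)
el = rng.uniform(-np.pi / 12, np.pi / 12, 64)
v_true = np.array([0.2, 1.0, 0.05])
doppler = -(bin_directions(az, el) @ v_true)
v_hat, cov = ego_velocity_from_doppler(az, el, doppler, np.full(64, 0.1))
print(np.round(v_hat, 3))
```

In this formulation every bin of a dense angle-resolved map contributes a row, whereas a sparse point cloud may supply few rows with large-azimuth directions, which are the ones that constrain lateral velocity; the returned covariance shows how well each component is determined.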
📝 Abstract
We present UNRIO, an uncertainty-aware radar-inertial odometry system that estimates ego-velocity directly from raw mmWave radar IQ signals rather than processed point clouds. Existing radar-inertial odometry methods rely on handcrafted signal processing pipelines that discard latent information in the raw spectrum and require careful parameter tuning. To address this, we propose a transformer-based neural network built on the GRT architecture that processes the full 4-D spectral cube to predict body-frame velocity in two modes: a direct linear velocity estimate and a per-angle-bin Doppler velocity map. The network is trained in three stages: geometric pretraining on LiDAR-projected depth, velocity or Doppler fine-tuning, and uncertainty calibration via a negative log-likelihood loss, which enables it to produce uncertainty estimates alongside its predictions. These uncertainty estimates are propagated into a sliding-window pose graph that fuses radar velocity factors with IMU preintegration measurements. We train and evaluate UNRIO on the IQ1M dataset across diverse indoor environments with both forward and lateral motion patterns unseen during training. Our method achieves the lowest relative pose error on the majority of sequences, with particularly strong gains over classical DSP baselines on lateral-motion trajectories, where sparse point clouds degrade conventional velocity estimators.
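A minimal sketch of the uncertainty path, assuming the network outputs a mean and a log-variance per velocity component: calibration minimizes a heteroscedastic Gaussian negative log-likelihood, and the predicted standard deviation then whitens the residual of a body-frame velocity factor in the pose graph. The names `gaussian_nll` and `velocity_factor_residual`, the log-variance parameterization, and the diagonal noise model are assumptions rather than details given in the abstract; the IMU preintegration residuals that the real graph would also contain are omitted.

```python
import numpy as np


def gaussian_nll(v_pred, log_var, v_true):
    """Heteroscedastic Gaussian NLL per sample (constant 0.5*log(2*pi) terms dropped).

    Predicting log-variance (an assumed parameterization) keeps the variance positive.
    """
    var = np.exp(log_var)
    return 0.5 * np.sum(log_var + (v_true - v_pred) ** 2 / var, axis=-1)


def velocity_factor_residual(R_wb, v_w, v_radar, log_var):
    """Whitened residual of a hypothetical body-frame radar velocity factor.

    R_wb: 3x3 body-to-world rotation of the state; v_w: world-frame state
    velocity; v_radar: network velocity prediction in the body frame.
    Dividing by the predicted std-dev lets confident predictions pull harder.
    """
    v_b = R_wb.T @ v_w                   # state velocity expressed in the body frame
    sigma = np.exp(0.5 * log_var)
    return (v_b - v_radar) / sigma


# Body rotated 90 degrees about z, so the body x-axis points along world y.
R_wb = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
v_w = np.array([0.0, 1.0, 0.0])          # state moving along world y
v_radar = np.array([0.9, 0.0, 0.0])      # radar predicts 0.9 m/s forward
r = velocity_factor_residual(R_wb, v_w, v_radar, np.log(np.full(3, 0.01)))
print(r)
```

For a fixed error e, the per-component loss 0.5 (log σ² + e²/σ²) is minimized at σ² = e², which is why NLL training pushes predicted variances toward the actual squared errors. A sliding-window solver would then minimize the sum of squared whitened residuals of this kind together with the IMU preintegration residuals over the window.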