XIRVIO: Critic-guided Iterative Refinement for Visual-Inertial Odometry with Explainable Adaptive Weighting

📅 2025-03-01
🤖 AI Summary
To improve the interpretability of monocular visual-inertial odometry (VIO) in safety-critical applications, this paper proposes XIRVIO, a Transformer-based generative adversarial framework. The generator takes image sequences and 6-DoF inertial measurements as input and refines its pose trajectory prediction over several iterations; a critic scores each iteration and selects the best one. The model also learns sensor-specific weighting coefficients that emerge during training, which makes it possible to visualize how much the visual and inertial modalities each contribute in a given context. On the KITTI dataset, the approach matches well-known state-of-the-art learning-based VIO methods in both translation and rotation error, while additionally exposing how it attends to each sensor.

📝 Abstract
We introduce XIRVIO, a transformer-based Generative Adversarial Network (GAN) framework for monocular visual-inertial odometry (VIO). Taking sequences of images and 6-DoF inertial measurements as inputs, XIRVIO's generator predicts pose trajectories through an iterative refinement process; the critic then evaluates the prediction from each iteration and selects the best one. Additionally, the self-emergent adaptive sensor weighting reveals how XIRVIO attends to each sensory input based on contextual cues in the data, making it a promising approach for achieving explainability in safety-critical VIO applications. Evaluations on the KITTI dataset demonstrate that XIRVIO matches well-known state-of-the-art learning-based methods in terms of both translation and rotation errors.
Problem

Research questions and friction points this paper is trying to address.

Develops a transformer-based GAN for monocular visual-inertial odometry.
Uses iterative refinement and critic-guided evaluation for pose prediction.
Achieves explainable adaptive sensor weighting for safety-critical VIO applications.
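The critic-guided selection described in the abstract can be sketched roughly as follows. This is a toy illustration and not the paper's implementation: the names `refine_with_critic`, `refine_step`, and `critic_score` are hypothetical, the pose is a plain list of six numbers, and it assumes that a higher critic score means a better prediction and that the initial estimate is also a candidate.

```python
from typing import Callable, List, Tuple

Pose = List[float]  # hypothetical 6-DoF pose: [tx, ty, tz, rx, ry, rz]


def refine_with_critic(
    initial: Pose,
    refine_step: Callable[[Pose], Pose],
    critic_score: Callable[[Pose], float],
    num_iters: int = 4,
) -> Tuple[Pose, int, List[float]]:
    """Run num_iters refinement steps and return the candidate the critic rates highest."""
    candidates = [list(initial)]
    for _ in range(num_iters):
        # Each iteration refines the previous candidate (stand-in for the generator).
        candidates.append(refine_step(candidates[-1]))
    # Assumption: the critic returns one scalar per candidate, higher is better.
    scores = [critic_score(c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], best, scores


# Toy usage: the "generator" nudges tx by 0.3 per step and eventually overshoots
# a target of tx = 1.0; the "critic" prefers poses close to that target.
def toy_step(pose: Pose) -> Pose:
    return [pose[0] + 0.3] + pose[1:]


def toy_critic(pose: Pose) -> float:
    return -abs(pose[0] - 1.0)


best_pose, best_idx, scores = refine_with_critic([0.0] * 6, toy_step, toy_critic)
print(best_idx, best_pose[0])
```

In this toy run the last iteration overshoots the target, so the critic picks an earlier iteration rather than the final one, which is the point of scoring every iteration instead of always returning the last.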
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based GAN for monocular VIO
Iterative refinement with critic-guided optimization
Explainable adaptive sensor weighting mechanism
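One common way to obtain inspectable per-sensor weights is a softmax gate over the two modalities; the sketch below shows that idea only. The summary does not say how XIRVIO computes its weights, so the softmax gate, the function names `softmax` and `fuse`, and the feature vectors are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(
    visual: Sequence[float],
    inertial: Sequence[float],
    gate_logits: Tuple[float, float],
) -> Tuple[List[float], Tuple[float, float]]:
    """Blend visual and inertial features with weights that sum to 1.

    gate_logits stands in for what a learned gating network would output
    (hypothetical); the returned weights are what one would plot to see
    which sensor the model relies on.
    """
    w_v, w_i = softmax(list(gate_logits))
    fused = [w_v * v + w_i * i for v, i in zip(visual, inertial)]
    return fused, (w_v, w_i)


# Equal logits give an even blend of the two feature vectors.
fused_even, weights_even = fuse([1.0, 2.0], [3.0, 4.0], (0.0, 0.0))
# A larger visual logit shifts the weighting toward the camera features.
fused_vis, weights_vis = fuse([1.0, 2.0], [3.0, 4.0], (2.0, 0.0))
print(weights_even, weights_vis)
```

Because the weights are explicit scalars rather than hidden activations, they can be logged per frame and compared against scene conditions, which is the kind of inspection the explainability claim refers to.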