MCVO: A Generic Visual Odometry for Arbitrarily Arranged Multi-Cameras

📅 2024-12-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of unobservable scale, unstable initialization, and unreliable loop closure in multi-camera SLAM—stemming from arbitrary camera configurations—this paper proposes the first end-to-end visual odometry framework designed for generic multi-camera setups. Methodologically, it introduces a novel learning-driven framework that jointly models multi-stream feature extraction and inter-camera rigid-motion constraints, enabling online scale initialization and refinement. The approach integrates learned feature tracking, multi-camera rigid-body motion priors, multi-source feature map optimization, and multi-view loop closure detection. Evaluated on KITTI-360 and the newly introduced MultiCamData benchmark, the method significantly outperforms existing stereo and multi-camera SLAM systems in pose accuracy, robustness to wide-field-of-view and texture-deprived scenes, and configurational flexibility—requiring no predefined camera geometry. Code and an interactive online demo are publicly available.

📝 Abstract
Making multi-camera visual SLAM systems easier to set up and more robust to the environment has long been a focus of vision-based robotics. Existing monocular and stereo SLAM systems have narrow FoVs and are fragile in textureless environments, suffering degraded accuracy and limited robustness. Multi-camera SLAM systems are therefore gaining attention, since their wide combined FoV provides redundancy against texture degeneration. However, current multi-camera SLAM systems face heavy data-processing loads and require elaborately designed camera configurations, leading to estimation failures for arbitrarily arranged multi-camera rigs. To address these problems, we propose a generic visual odometry for arbitrarily arranged multi-cameras, which achieves metric-scale state estimation with high flexibility in camera arrangement. Specifically, we first design a learning-based feature extraction and tracking framework to shift the load of processing multiple video streams off the CPU. Then we use the rigid constraints between cameras to estimate metric-scale poses for robust SLAM initialization. Finally, we fuse the features of the multiple cameras in the SLAM back-end to achieve robust pose estimation and online scale optimization. Additionally, multi-camera features improve loop detection for pose-graph optimization. Experiments on the KITTI-360 and MultiCamData datasets validate the robustness of our method with arbitrarily placed cameras. Compared with other stereo and multi-camera visual SLAM systems, our method obtains higher pose-estimation accuracy with better generalization ability. Our code and online demos are available at https://github.com/JunhaoWang615/MCVO
Problem

Research questions and friction points this paper is trying to address.

Enhances multi-camera SLAM robustness and setup flexibility
Addresses pose scale estimation in arbitrary camera arrangements
Improves feature tracking and loop detection for accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU-based feature tracking for multi-camera streams
Metric-scale pose initialization under rigid constraints
Multi-camera feature fusion for robust estimation
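The "metric-scale pose initialization under rigid constraints" idea can be illustrated with a standard rig-rigidity argument (an illustrative sketch, not the paper's actual formulation; the extrinsics, motion values, and helper names `rot_z`/`cam_motion` below are invented for the example): each camera's monocular VO observes its rotation exactly but its translation only up to an unknown scale, while the known rigid mounting of the cameras on a common body ties those per-camera motions together, yielding a linear least-squares problem in the metric body translation and the per-camera scales.

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

# --- Hypothetical rig: two cameras rigidly mounted on a body frame ---
# Camera i maps body-frame points x_b to its frame: x_i = R_i @ x_b + p_i
R1, p1 = np.eye(3), np.zeros(3)
R2, p2 = rot_z(0.5), np.array([0.3, -0.1, 0.2])   # known extrinsics (assumed)

# Ground-truth body motion between two instants (rotation + metric translation)
R_b = rot_z(0.1)
t_b = np.array([1.0, 0.2, -0.05])

def cam_motion(R_i, p_i):
    # Motion induced on camera i by the rigid body motion (R_b, t_b)
    R_m = R_i @ R_b @ R_i.T
    t_m = R_i @ t_b + (np.eye(3) - R_m) @ p_i
    return R_m, t_m

# Monocular VO per camera observes the rotation and only the *direction*
# of the translation (scale is unobservable for a single moving camera).
obs = []
for R_i, p_i in [(R1, p1), (R2, p2)]:
    R_m, t_m = cam_motion(R_i, p_i)
    u = t_m / np.linalg.norm(t_m)       # unit-norm translation direction
    obs.append((R_i, p_i, R_m, u))

# Stack the rigidity constraint  s_i * u_i = R_i t_b + (I - R_m_i) p_i
# into one linear least-squares problem in (t_b, s_1, s_2).
n = len(obs)
A = np.zeros((3 * n, 3 + n))
b = np.zeros(3 * n)
for k, (R_i, p_i, R_m, u) in enumerate(obs):
    A[3 * k:3 * k + 3, 0:3] = R_i
    A[3 * k:3 * k + 3, 3 + k] = -u
    b[3 * k:3 * k + 3] = -(np.eye(3) - R_m) @ p_i

x, *_ = np.linalg.lstsq(A, b, rcond=None)
t_b_est, scales = x[:3], x[3:]
print("recovered metric body translation:", t_b_est)   # ≈ [1.0, 0.2, -0.05]
```

With two cameras the system has six equations and five unknowns, so the metric translation is recovered exactly in this noise-free setup; in a real pipeline the same constraints would be solved jointly over many frames and refined online, as the abstract describes.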
Huai Yu
Wuhan University
Robotics · Robot Vision · SLAM
Junhao Wang
School of Electronic Information, Wuhan University, Wuhan, China 430072
Yao He
Stanford University
Robotics · SLAM · Computer Vision
Wen Yang
School of Electronic Information, Wuhan University, Wuhan, China 430072
Gui-Song Xia
School of Artificial Intelligence, Wuhan University, China
Artificial Intelligence · Computer Vision · Photogrammetry · Remote Sensing · Robotics