SVII-3D: Advancing Roadside Infrastructure Inventory with Decimeter-level 3D Localization and Comprehension from Sparse Street Imagery

📅 2026-01-15
🤖 AI Summary
This work addresses the challenges of inaccurate localization, unstable recognition, and coarse-grained state understanding in digital twin construction of roadside infrastructure from sparse street-view imagery. To overcome these limitations, the authors propose SVII-3D, a unified framework that integrates open-set object detection fine-tuned with LoRA, a spatial attention-based matching network, geometry-guided optimization, and a multimodal prompt-driven vision-language model. This synergistic approach enables high-fidelity 3D reconstruction and fine-grained condition assessment directly from sparse image inputs. The method substantially improves asset recognition accuracy and reduces 3D localization error to the decimeter level, offering a cost-effective, scalable, and high-precision digital solution for intelligent infrastructure operation and maintenance.
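The summary mentions LoRA fine-tuning of the open-set detector. As a rough illustration of the general LoRA idea only (not the authors' implementation), the sketch below keeps a pretrained weight matrix frozen and adds a trainable low-rank update scaled by `alpha / rank`. The class name `LoRALinear`, the initialisation values, and the toy dimensions are assumptions made for this example; the page does not state which layers, rank, or scaling the paper uses.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical linear layer with a frozen weight and a low-rank adapter.

    Output: W x + (alpha / rank) * B (A x), where only A and B would be trained.
    """

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight              # pretrained weight, kept frozen
        self.scale = alpha / rank         # standard LoRA scaling factor
        # A (rank x d_in) starts small and random; B (d_out x rank) starts at zero,
        # so the adapted layer initially reproduces the pretrained layer exactly.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]


layer = LoRALinear(weight=[[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
print(layer.forward([1.0, 1.0]))  # equals the frozen layer's output before any training
```

Because `B` starts at zero, fine-tuning begins from the pretrained behaviour, and only the two small matrices need to be stored per adapted layer.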

📝 Abstract
The automated creation of digital twins and precise asset inventories is a critical task in smart city construction and facility lifecycle management. However, utilizing cost-effective sparse imagery remains challenging due to limited robustness, inaccurate localization, and a lack of fine-grained state understanding. To address these limitations, SVII-3D, a unified framework for holistic asset digitization, is proposed. First, LoRA fine-tuned open-set detection is fused with a spatial-attention matching network to robustly associate observations across sparse views. Second, a geometry-guided refinement mechanism is introduced to resolve structural errors, achieving precise decimeter-level 3D localization. Third, transcending static geometric mapping, a Vision-Language Model agent leveraging multi-modal prompting is incorporated to automatically diagnose fine-grained operational states. Experiments demonstrate that SVII-3D significantly improves identification accuracy and minimizes localization errors. Consequently, this framework offers a scalable, cost-effective solution for high-fidelity infrastructure digitization, effectively bridging the gap between sparse perception and automated intelligent maintenance.
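To make the localization step more concrete: once the same asset has been associated across several sparse views, each view contributes a bearing ray from the camera position toward the asset. The sketch below is a generic least-squares ray intersection, shown only to illustrate how multi-view observations can yield a 3D position. It is not the paper's geometry-guided refinement, whose details are not given on this page. The function names and the toy camera positions are assumptions for this example, and rays are treated as infinite lines.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _solve_3x3(a: List[List[float]], b: List[float]) -> Vec3:
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("rays are (nearly) parallel; position is not observable")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                f = m[row][col] / m[col][col]
                m[row] = [x - f * y for x, y in zip(m[row], m[col])]
    return (m[0][3] / m[0][0], m[1][3] / m[1][1], m[2][3] / m[2][2])


def triangulate_rays(origins: Sequence[Vec3], directions: Sequence[Vec3]) -> Vec3:
    """Return the point minimising the summed squared distance to all rays.

    Each ray i has origin o_i and unit direction u_i. With P_i = I - u_i u_i^T,
    the minimiser x satisfies (sum_i P_i) x = sum_i P_i o_i.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        norm = sum(c * c for c in d) ** 0.5
        u = [c / norm for c in d]
        for i in range(3):
            for j in range(3):
                p_ij = (1.0 if i == j else 0.0) - u[i] * u[j]
                a[i][j] += p_ij
                b[i] += p_ij * o[j]
    return _solve_3x3(a, b)


# Toy example: three camera positions along a road, all observing one asset.
target = (5.0, 8.0, 2.0)
cams = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 1.0)]
rays = [tuple(t - c for t, c in zip(target, cam)) for cam in cams]
print(triangulate_rays(cams, rays))  # close to the target for noise-free rays
```

With noisy detections the rays no longer meet in a single point, and the least-squares solution gives a compromise position. A refinement stage such as the one the abstract describes would operate on top of an initial estimate of this kind.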

Problem

Research questions and friction points this paper is trying to address.

3D localization
infrastructure inventory
sparse imagery
digital twin
asset digitization

Innovation

Methods, ideas, or system contributions that make the work stand out.

decimeter-level 3D localization
sparse street imagery
open-set detection
geometry-guided refinement
vision-language model

Chong Liu
Wuhan University
3D Computer Vision, Laser Scanning, Point Cloud Compression

Luxuan Fu
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan 430079, China

Yang Jia
Sichuan Highway Planning, Survey, Design and Research Institute Ltd, Chengdu 610000, China

Zhen Dong
Wuhan University
3D Computer Vision, Intelligent Transportation System, Urban Sustainable Development

Bisheng Yang
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS), Wuhan University, Wuhan 430079, China