SVII-3D: Advancing Roadside Infrastructure Inventory with Decimeter-level 3D Localization and Comprehension from Sparse Street Imagery

📅 2026-01-15



🤖 AI Summary

This work addresses three obstacles to building digital twins of roadside infrastructure from sparse street-view imagery: inaccurate localization, unstable recognition, and coarse-grained understanding of asset condition. To overcome them, the authors propose SVII-3D, a unified framework that combines LoRA-fine-tuned open-set object detection, a spatial-attention matching network, geometry-guided optimization, and a vision-language model driven by multimodal prompts. Together, these components enable high-fidelity 3D reconstruction and fine-grained condition assessment directly from sparse image inputs. The method substantially improves asset recognition accuracy and reduces 3D localization error to the decimeter level, offering a cost-effective, scalable, and high-precision digital solution for intelligent infrastructure operation and maintenance.
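LoRA keeps a pretrained weight matrix frozen and learns only a low-rank correction, which is what makes fine-tuning a large open-set detector comparatively cheap. The NumPy sketch below illustrates that mechanism on a single linear layer. The class name `LoRALinear`, the rank `r=4`, and the scale `alpha=8` are illustrative assumptions, not details of SVII-3D's detector or its training setup.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A.

    Illustrative sketch: r and alpha are assumed values, not SVII-3D settings.
    """

    def __init__(self, weight: np.ndarray, r: int = 4, alpha: float = 8.0, seed: int = 0):
        out_dim, in_dim = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight                              # frozen pretrained weight, (out_dim, in_dim)
        self.A = rng.normal(0.0, 0.01, size=(r, in_dim))  # trainable down-projection
        self.B = np.zeros((out_dim, r))                   # trainable up-projection, zero-initialised
        self.scale = alpha / r

    def delta(self) -> np.ndarray:
        # Low-rank correction to W; its rank is at most r.
        return self.scale * (self.B @ self.A)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, in_dim) -> (batch, out_dim)
        return x @ (self.weight + self.delta()).T

    def num_trainable(self) -> int:
        # Only A and B would receive gradient updates; W stays frozen.
        return self.A.size + self.B.size


# Usage on a toy 512 -> 256 layer
rng = np.random.default_rng(1)
W = rng.normal(size=(256, 512))
layer = LoRALinear(W, r=4)
x = rng.normal(size=(2, 512))
print(np.allclose(layer(x), x @ W.T))  # B is zero at init, so the output matches W alone
print(layer.num_trainable(), W.size)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and in this toy configuration only 3,072 parameters would be trained instead of the 131,072 in `W`.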


📝 Abstract

The automated creation of digital twins and precise asset inventories is a critical task in smart city construction and facility lifecycle management. However, utilizing cost-effective sparse imagery remains challenging due to limited robustness, inaccurate localization, and a lack of fine-grained state understanding. To address these limitations, SVII-3D, a unified framework for holistic asset digitization, is proposed. First, LoRA fine-tuned open-set detection is fused with a spatial-attention matching network to robustly associate observations across sparse views. Second, a geometry-guided refinement mechanism is introduced to resolve structural errors, achieving precise decimeter-level 3D localization. Third, transcending static geometric mapping, a Vision-Language Model agent leveraging multi-modal prompting is incorporated to automatically diagnose fine-grained operational states. Experiments demonstrate that SVII-3D significantly improves identification accuracy and minimizes localization errors. Consequently, this framework offers a scalable, cost-effective solution for high-fidelity infrastructure digitization, effectively bridging the gap between sparse perception and automated intelligent maintenance.
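The cross-view association step can be pictured as attention-based descriptor matching. The sketch below is a generic stand-in, not the paper's spatial-attention network: it applies one untrained round of cross-attention between the detection descriptors of two views, scores candidate pairs with a dual softmax, and keeps mutual nearest neighbours above a confidence threshold. The functions `cross_attend` and `match`, the temperature, and the threshold are hypothetical; a learned network would add trainable projections and encode spatial cues such as box positions, which this sketch omits.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def cross_attend(q: np.ndarray, kv: np.ndarray) -> np.ndarray:
    # Scaled dot-product attention: each query descriptor aggregates descriptors
    # from the other view, added back through a residual connection.
    d = q.shape[1]
    attn = softmax(q @ kv.T / np.sqrt(d), axis=1)
    return q + attn @ kv


def match(desc1, desc2, temperature=0.1, min_conf=0.2):
    """Return (i, j, confidence) for mutually best detection pairs across two views."""
    d1, d2 = l2_normalize(desc1), l2_normalize(desc2)
    # One untrained round of cross-view attention (hypothetical stand-in for a learned network).
    a1 = l2_normalize(cross_attend(d1, d2))
    a2 = l2_normalize(cross_attend(d2, d1))
    sim = a1 @ a2.T / temperature
    # Dual softmax: a pair scores high only if each side prefers the other.
    conf = softmax(sim, axis=0) * softmax(sim, axis=1)
    matches = []
    for i in range(conf.shape[0]):
        j = int(conf[i].argmax())
        if int(conf[:, j].argmax()) == i and conf[i, j] >= min_conf:
            matches.append((i, j, float(conf[i, j])))
    return matches


# Usage: view 2 sees the same five assets in a different order, with descriptor noise.
rng = np.random.default_rng(0)
desc1 = rng.normal(size=(5, 64))
perm = np.array([3, 0, 4, 1, 2])
desc2 = desc1[perm] + 0.05 * rng.normal(size=(5, 64))
for i, j, c in match(desc1, desc2):
    print(f"view1 #{i} <-> view2 #{j} (conf {c:.2f})")
```

The mutual-nearest-neighbour check is a common way to suppress one-sided matches, which matters when sparse views leave some assets visible in only one image.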

Problem

Research questions and friction points this paper addresses.

3D localization

infrastructure inventory

sparse imagery

digital twin

asset digitization

Innovation

Methods, ideas, or system contributions that make the work stand out.

decimeter-level 3D localization

sparse street imagery

open-set detection
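Decimeter-level localization ultimately depends on recovering a 3D position from detections matched across posed street-view images. The summary does not specify the paper's geometry-guided refinement, so the sketch below uses an assumed baseline instead: standard linear (DLT) triangulation from known camera matrices P = K[R | -RC], plus the reprojection error that a refinement stage would try to reduce. The intrinsics, the 1 m capture spacing, and the helper names are hypothetical.

```python
import numpy as np


def projection_matrix(K, R, C):
    # P = K [R | -R C] for a camera centred at C (world coordinates, metres)
    return K @ np.hstack([R, (-R @ C).reshape(3, 1)])


def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_dlt(Ps, uvs):
    # Each view contributes two linear constraints on the homogeneous point:
    # (u * P[2] - P[0]) X = 0 and (v * P[2] - P[1]) X = 0
    rows = []
    for P, (u, v) in zip(Ps, uvs):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    Xh = Vt[-1]  # right singular vector with the smallest singular value
    return Xh[:3] / Xh[3]


def reprojection_rmse(Ps, uvs, X):
    errs = [np.linalg.norm(project(P, X) - np.asarray(uv)) for P, uv in zip(Ps, uvs)]
    return float(np.sqrt(np.mean(np.square(errs))))


# Usage: three capture points 1 m apart along a street, all facing +Z
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
centres = [np.array([cx, 0.0, 0.0]) for cx in (0.0, 1.0, 2.0)]
Ps = [projection_matrix(K, R, C) for C in centres]
X_true = np.array([1.0, 0.5, 10.0])  # e.g. a roadside asset about 10 m ahead
uvs = [project(P, X_true) for P in Ps]
X_est = triangulate_dlt(Ps, uvs)
print(np.round(X_est, 3), reprojection_rmse(Ps, uvs, X_est))
```

In this synthetic geometry, a half-pixel error in one view shifts the estimate by only a few centimetres; larger pixel errors or shorter baselines grow that error quickly, which is one reason a geometric refinement stage is useful.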