Beyond Pixels: Vector-to-Graph Transformation for Reliable Schematic Auditing

📅 2026-02-12
🤖 AI Summary
This work addresses the structural blind spots of multimodal large language models (MLLMs) in interpreting engineering drawings, particularly their inability to capture topological structures and symbolic logic. To overcome this limitation, the authors propose Vector-to-Graph (V2G), a novel approach that explicitly models CAD vector drawings as attributed graphs, where nodes and edges precisely represent components and their interconnections. By moving beyond conventional pixel-driven paradigms, V2G endows MLLMs with engineering-level structural reasoning capabilities. The method is evaluated on an electrical compliance diagnosis benchmark, demonstrating significant performance gains across all error categories compared to existing MLLMs, whose accuracy remains near random levels.

📝 Abstract
Multimodal Large Language Models (MLLMs) have shown remarkable progress in visual understanding, yet they suffer from a critical limitation: structural blindness. Even state-of-the-art models fail to capture topology and symbolic logic in engineering schematics, as their pixel-driven paradigm discards the explicit vector-defined relations needed for reasoning. To overcome this, we propose a Vector-to-Graph (V2G) pipeline that converts CAD diagrams into property graphs where nodes represent components and edges encode connectivity, making structural dependencies explicit and machine-auditable. On a diagnostic benchmark of electrical compliance checks, V2G yields large accuracy gains across all error categories, while leading MLLMs remain near chance level. These results highlight the systemic inadequacy of pixel-based methods and demonstrate that structure-aware representations provide a reliable path toward practical deployment of multimodal AI in engineering domains. To facilitate further research, we release our benchmark and implementation at https://github.com/gm-embodied/V2G-Audit.
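
To make the graph idea concrete, here is a minimal sketch of how vector primitives might be turned into a property graph and then audited. It is not the authors' implementation: the `Component` and `Wire` records, the exact-coordinate matching of terminals to wire endpoints, and the "no load wired straight to a source" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations

Point = tuple[float, float]

@dataclass(frozen=True)
class Component:          # hypothetical record for a CAD symbol
    cid: str
    kind: str             # e.g. "source", "breaker", "load"
    terminals: tuple[Point, ...]

@dataclass(frozen=True)
class Wire:               # hypothetical record for a CAD line segment
    start: Point
    end: Point

def build_graph(components, wires):
    """Return (nodes, edges): nodes carry properties, edges link components on one net."""
    parent = {}

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]   # path halving
            p = parent[p]
        return p

    # Points joined by a wire belong to the same electrical net.
    for w in wires:
        parent[find(w.start)] = find(w.end)

    # Group components by the net each terminal sits on (exact match assumed).
    nets = {}
    for c in components:
        for t in c.terminals:
            nets.setdefault(find(t), set()).add(c.cid)

    nodes = {c.cid: {"kind": c.kind} for c in components}
    edges = set()
    for members in nets.values():
        edges.update(combinations(sorted(members), 2))
    return nodes, edges

def loads_wired_to_source(nodes, edges):
    """Illustrative rule: flag loads that share a net directly with a source."""
    flagged = set()
    for a, b in edges:
        kinds = {nodes[a]["kind"]: a, nodes[b]["kind"]: b}
        if "load" in kinds and "source" in kinds:
            flagged.add(kinds["load"])
    return sorted(flagged)

components = [
    Component("SRC", "source", ((0, 0),)),
    Component("BRK", "breaker", ((1, 0), (2, 0))),
    Component("L1", "load", ((3, 0),)),
    Component("L2", "load", ((3, 1),)),
]
wires = [
    Wire((0, 0), (1, 0)),   # source to breaker input
    Wire((2, 0), (3, 0)),   # breaker output to L1
    Wire((0, 0), (3, 1)),   # L2 tapped off the source side, bypassing the breaker
]
nodes, edges = build_graph(components, wires)
print(loads_wired_to_source(nodes, edges))
```

Once connectivity is an explicit edge set, the check reduces to a simple graph query over the explicit edges; a model working from rendered pixels would have to infer the same connectivity from the image.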
Problem

Research questions and friction points this paper is trying to address.

structural blindness
engineering schematics
topology
symbolic logic
multimodal LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vector-to-Graph
structural awareness
property graph
schematic auditing
multimodal LLMs
Authors

Chengwei Ma
Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China

Zhen Tian
Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China

Zhou Zhou
Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen, China

Zhixian Xu
Guangdong Power Grid Co., Ltd., Yangjiang Power Supply Bureau, Yangjiang, China

Xiaowei Zhu
Ant Research
Graph Database, Big Data Systems, Privacy-Preserving Computation, AI Infra

Xia Hua
Zhejiang University of Technology
Research, Mechanical Engineering

Si Shi
Macao Polytechnic University
Financial AI, Educational AI, Deep Learning

F. Richard Yu
Carleton University, FRSC, FCAE, MAE, FIEEE, FEIC
Intell. & Auto. Sys., ML & Embodied AI, IoT, Blockchain