DiagramNet: An End-to-End Recognition Framework and Dataset for Non-Standard System-Level Diagrams

📅 2026-05-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the recognition of non-standardized system-level chip diagrams, which existing multimodal large language models (MLLMs) handle poorly because symbols are inconsistent and structured training data is scarce. The authors introduce DiagramNet, described as the first multimodal dataset for system-level diagrams, comprising 10,977 connection annotations and 15,515 chain-of-thought question-answer pairs across four tasks: Listing, Localization, Connection, and Circuit QA. They also propose a progressive training pipeline and a decoupled multi-agent workflow that splits visual reasoning into Perception, Reasoning, and Knowledge stages. On the DiagramNet benchmark, their 3B-parameter model combined with this workflow surpasses the 2025 EDA Elite Challenge winner and outperforms GPT-5, Claude-Sonnet-4, and Gemini-2.5-Pro by more than 2x in end-to-end evaluation. With only 60 images used for detector adaptation, the method also transfers to AMSBench, where its zero-shot connectivity reasoning is on par with GPT-5 and Claude-Sonnet-4 and surpasses Netlistify.

📝 Abstract

System-level diagrams encode the architectural blueprint of chip design, specifying module functions, dataflows, and interface protocols. However, non-standardized symbols and the scarcity of structured training data hinder existing multimodal large language models (MLLMs) from recognizing these diagrams. To address this gap, we introduce DiagramNet, the first multimodal dataset for system-level diagrams, comprising 10,977 connection annotations and 15,515 chain-of-thought QA pairs across four tasks: Listing, Localization, Connection, and Circuit QA. Building on this dataset, we propose a progressive training pipeline together with a decoupled multi-agent workflow that decomposes complex visual reasoning into Perception, Reasoning, and Knowledge stages. On the DiagramNet benchmark, integrating our 3B-parameter model with the proposed workflow surpasses the 2025 EDA Elite Challenge winner and outperforms GPT-5, Claude-Sonnet-4, and Gemini-2.5-Pro by over 2x in end-to-end evaluation. Notably, the workflow generalizes beyond our model, boosting Task 1 performance by 128.7x for Gemini-2.5-Pro and 12.4x for GPT-5. Furthermore, with only 60 images for detector adaptation, the method transfers effectively to AMSBench, achieving zero-shot connectivity reasoning on par with GPT-5 and Claude-Sonnet-4 while surpassing the AMS state-of-the-art method Netlistify.
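
To make the idea of a decoupled workflow concrete, here is a minimal toy sketch of how Perception, Reasoning, and Knowledge stages might be chained. It is not the authors' implementation: the abstract gives no interfaces, so every name below (`Module`, `Connection`, `perceive`, `reason`, `answer`, `TOY_IMAGE`) is hypothetical, the "image" is a hand-written dictionary rather than pixels, and each stage is a simple stand-in for what would be a model or agent in the paper.

```python
from dataclasses import dataclass

Box = tuple[int, int, int, int]   # (x0, y0, x1, y1), hypothetical pixel coords
Point = tuple[int, int]


@dataclass(frozen=True)
class Module:
    name: str
    box: Box


@dataclass(frozen=True)
class Connection:
    source: str
    target: str


def contains(box: Box, point: Point) -> bool:
    """Return True if the point lies inside or on the edge of the box."""
    x0, y0, x1, y1 = box
    x, y = point
    return x0 <= x <= x1 and y0 <= y <= y1


def perceive(image: dict) -> list[Module]:
    """Perception stand-in: a real system would run a detector here."""
    return [Module(name, box) for name, box in image["boxes"].items()]


def reason(modules: list[Module], wires: list[tuple[Point, Point]]) -> list[Connection]:
    """Reasoning stand-in: map each wire's endpoints to the modules they touch."""
    connections = []
    for start, end in wires:
        src = next((m.name for m in modules if contains(m.box, start)), None)
        dst = next((m.name for m in modules if contains(m.box, end)), None)
        if src and dst and src != dst:
            connections.append(Connection(src, dst))
    return connections


def answer(connections: list[Connection], module_name: str) -> list[str]:
    """Knowledge stand-in: answer 'which modules does X drive?' from the netlist."""
    return sorted(c.target for c in connections if c.source == module_name)


def run_workflow(image: dict, module_name: str) -> list[str]:
    """Chain the three stages; each consumes only the previous stage's output."""
    modules = perceive(image)
    connections = reason(modules, image["wires"])
    return answer(connections, module_name)


# Hypothetical toy diagram: three blocks and three directed wires.
TOY_IMAGE = {
    "boxes": {"CPU": (0, 0, 10, 10), "DMA": (20, 0, 30, 10), "SRAM": (40, 0, 50, 10)},
    "wires": [((10, 5), (20, 5)), ((30, 5), (40, 5)), ((5, 10), (45, 10))],
}

print(run_workflow(TOY_IMAGE, "CPU"))
```

Because each stage only sees the structured output of the one before it, any stage can be swapped (for example, replacing the stub detector with a trained one) without touching the others, which is the general appeal of a decoupled design.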

Problem

Research questions and friction points this paper is trying to address.

system-level diagrams

non-standardized symbols

structured training data

diagram recognition

multimodal learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

DiagramNet

system-level diagrams

multimodal dataset