AMSbench: A Comprehensive Benchmark for Evaluating MLLM Capabilities in AMS Circuits

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
A systematic multimodal large language model (MLLM) benchmark for analog/mixed-signal (AMS) circuit design is currently lacking. Method: This work introduces AMSbench, the first comprehensive MLLM evaluation benchmark for the AMS domain, covering three core tasks (circuit schematic perception, circuit analysis, and circuit design) with approximately 8,000 questions spanning multiple difficulty levels. It formally defines AMS-specific MLLM capability dimensions and proposes a circuit-knowledge-aware, multi-granularity evaluation framework, built on a heterogeneous multimodal pipeline that integrates schematics, netlists, and textual descriptions. Contribution/Results: Evaluation of eight state-of-the-art MLLMs, spanning open-source and proprietary models such as Qwen 2.5-VL and Gemini 2.5 Pro, reveals accuracy below 35% on advanced design tasks, exposing fundamental bottlenecks in complex multimodal reasoning and generation. The full dataset is open-sourced to advance standardized assessment in the AMS+AI interdisciplinary field.

📝 Abstract
Analog/Mixed-Signal (AMS) circuits play a critical role in the integrated circuit (IC) industry. However, automating AMS circuit design has remained a longstanding challenge due to its difficulty and complexity. Recent advances in Multi-modal Large Language Models (MLLMs) offer promising potential for supporting AMS circuit analysis and design. However, current research typically evaluates MLLMs on isolated tasks within the domain, lacking a comprehensive benchmark that systematically assesses model capabilities across diverse AMS-related challenges. To address this gap, we introduce AMSbench, a benchmark suite designed to evaluate MLLM performance across critical tasks including circuit schematic perception, circuit analysis, and circuit design. AMSbench comprises approximately 8,000 test questions spanning multiple difficulty levels and assesses eight prominent models, encompassing both open-source and proprietary solutions such as Qwen 2.5-VL and Gemini 2.5 Pro. Our evaluation highlights significant limitations in current MLLMs, particularly in complex multi-modal reasoning and sophisticated circuit design tasks. These results underscore the necessity of advancing MLLMs' understanding and effective application of circuit-specific knowledge, thereby narrowing the existing performance gap relative to human expertise and moving toward fully automated AMS circuit design workflows. Our data is released at https://huggingface.co/datasets/wwhhyy/AMSBench.
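
The released dataset can be pulled straight from the Hugging Face Hub. Below is a minimal loading sketch: only the dataset id wwhhyy/AMSBench comes from the paper page, while the split name and the question/answer column names are assumptions that should be checked against the actual dataset card.

```python
# Minimal sketch: load AMSbench from the Hugging Face Hub and inspect it.
# Only the dataset id is taken from the paper page; the "test" split and the
# "question"/"answer" columns below are assumptions about the schema.
from datasets import load_dataset

ds = load_dataset("wwhhyy/AMSBench")
print(ds)  # lists the actual splits and their columns

# Hypothetical peek at one item; adjust field names to the real schema.
sample = ds["test"][0]
print(sample["question"], "->", sample["answer"])
```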
Problem

Research questions and friction points this paper is trying to address.

Evaluating MLLM capabilities for AMS circuit design automation
Lack of a comprehensive benchmark covering diverse AMS-related tasks
Assessing where current MLLMs fall short in multi-modal reasoning and circuit design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces AMSbench, a comprehensive benchmark for MLLM evaluation in the AMS domain
Covers circuit schematic perception, circuit analysis, and circuit design
Evaluates 8 open-source and proprietary models on ~8,000 questions (see the sketch after this list)
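
As a rough illustration of what evaluating models on benchmark questions like these involves, here is a hedged sketch of a scoring loop against one OpenAI-compatible vision model. The message format follows the standard chat-completions API, but the schematic-image prompt layout and the exact-match scoring are illustrative assumptions, not the paper's actual multi-granularity evaluation framework.

```python
# Illustrative evaluation loop for a vision-language model on schematic
# questions. Uses the OpenAI chat-completions API; the prompt layout and
# exact-match scoring are simplifying assumptions, not AMSbench's own harness.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(model: str, question: str, image_bytes: bytes) -> str:
    """Send one schematic image plus its question; return the model's answer."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()

def accuracy(preds: list[str], refs: list[str]) -> float:
    """Naive exact-match accuracy over predicted vs. reference answers."""
    hits = sum(p.strip().lower() == r.strip().lower() for p, r in zip(preds, refs))
    return hits / len(refs)
```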
👥 Authors
Yichen Shi (Eastern Institute of Technology, Ningbo)
Ze Zhang (Ph.D. Student, Chalmers)
Hongyang Wang (Eastern Institute of Technology, Ningbo)
Zhuofu Tao (University of California, Los Angeles)
Zhongyi Li (Eastern Institute of Technology, Ningbo)
Bingyu Chen (University of California, Los Angeles)
Yaxin Wang (University of California, Los Angeles)
Zhiping Yu (Tsinghua University)
Ting-Jung Lin (Eastern Institute of Technology, Ningbo)
Lei He (University of California, Los Angeles; Eastern Institute of Technology, Ningbo)