AMSbench: A Comprehensive Benchmark for Evaluating MLLM Capabilities in AMS Circuits

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
A systematic multimodal large language model (MLLM) benchmark for analog/mixed-signal (AMS) circuit design is currently lacking. Method: This work introduces AMSbench, the first comprehensive MLLM evaluation benchmark for the AMS domain, covering three core tasks (circuit schematic perception, circuit analysis, and circuit design) with approximately 8,000 questions spanning multiple difficulty levels. It formally defines AMS-specific MLLM capability dimensions and proposes a circuit-knowledge-aware, multi-granularity evaluation framework, built on a heterogeneous multimodal pipeline that integrates schematics, netlists, and textual descriptions. Contribution/Results: Evaluation of eight state-of-the-art MLLMs, spanning open-source and proprietary models such as Qwen 2.5-VL and Gemini 2.5 Pro, reveals accuracy below 35% on advanced design tasks, exposing fundamental bottlenecks in complex multimodal reasoning and generation. The full dataset is open-sourced to advance standardized assessment in the AMS+AI interdisciplinary field.

📝 Abstract
Analog/Mixed-Signal (AMS) circuits play a critical role in the integrated circuit (IC) industry. However, automating AMS circuit design has remained a longstanding challenge due to its difficulty and complexity. Recent advances in Multi-modal Large Language Models (MLLMs) offer promising potential for supporting AMS circuit analysis and design. However, current research typically evaluates MLLMs on isolated tasks within the domain, lacking a comprehensive benchmark that systematically assesses model capabilities across diverse AMS-related challenges. To address this gap, we introduce AMSbench, a benchmark suite designed to evaluate MLLM performance across critical tasks including circuit schematic perception, circuit analysis, and circuit design. AMSbench comprises approximately 8,000 test questions spanning multiple difficulty levels and assesses eight prominent models, encompassing both open-source and proprietary solutions such as Qwen 2.5-VL and Gemini 2.5 Pro. Our evaluation highlights significant limitations in current MLLMs, particularly in complex multi-modal reasoning and sophisticated circuit design tasks. These results underscore the necessity of advancing MLLMs' understanding and effective application of circuit-specific knowledge, thereby narrowing the existing performance gap relative to human expertise and moving toward fully automated AMS circuit design workflows. Our data is released at https://huggingface.co/datasets/wwhhyy/AMSBench.
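
The released dataset can be pulled straight from the Hugging Face Hub. Below is a minimal loading sketch: only the dataset id wwhhyy/AMSBench comes from the paper page, while the split name and the question/answer column names are assumptions that should be checked against the actual dataset card.

```python
# Minimal sketch: load AMSbench from the Hugging Face Hub and inspect it.
# Only the dataset id is taken from the paper page; the "test" split and the
# "question"/"answer" columns below are assumptions about the schema.
from datasets import load_dataset

ds = load_dataset("wwhhyy/AMSBench")
print(ds)  # lists the actual splits and their columns

# Hypothetical peek at one item; adjust field names to the real schema.
sample = ds["test"][0]
print(sample["question"], "->", sample["answer"])
```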
Problem

Research questions and friction points this paper is trying to address.

Evaluating MLLM capabilities for AMS circuit design automation
Lack of a comprehensive benchmark covering diverse AMS-related tasks
Assessing where current MLLMs fall short in multi-modal reasoning and circuit design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces AMSbench, a comprehensive benchmark for MLLM evaluation in the AMS domain
Covers circuit schematic perception, circuit analysis, and circuit design
Evaluates 8 open-source and proprietary models on ~8,000 questions (see the sketch after this list)
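
As a rough illustration of what evaluating models on benchmark questions like these involves, here is a hedged sketch of a scoring loop against one OpenAI-compatible vision model. The message format follows the standard chat-completions API, but the schematic-image prompt layout and the exact-match scoring are illustrative assumptions, not the paper's actual multi-granularity evaluation framework.

```python
# Illustrative evaluation loop for a vision-language model on schematic
# questions. Uses the OpenAI chat-completions API; the prompt layout and
# exact-match scoring are simplifying assumptions, not AMSbench's own harness.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(model: str, question: str, image_bytes: bytes) -> str:
    """Send one schematic image plus its question; return the model's answer."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()

def accuracy(preds: list[str], refs: list[str]) -> float:
    """Naive exact-match accuracy over predicted vs. reference answers."""
    hits = sum(p.strip().lower() == r.strip().lower() for p, r in zip(preds, refs))
    return hits / len(refs)
```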
👥 Authors
Yichen Shi (Eastern Institute of Technology, Ningbo)
Ze Zhang (Ph.D. Student, Chalmers)
Hongyang Wang (Eastern Institute of Technology, Ningbo)
Zhuofu Tao (University of California, Los Angeles)
Zhongyi Li (Eastern Institute of Technology, Ningbo)
Bingyu Chen (University of California, Los Angeles)
Yaxin Wang (University of California, Los Angeles)
Zhiping Yu (Tsinghua University)
Ting-Jung Lin (Eastern Institute of Technology, Ningbo)
Lei He (University of California, Los Angeles; Eastern Institute of Technology, Ningbo)