ElectroVizQA: How well do Multi-modal LLMs perform in Electronics Visual Question Answering?

📅 2024-11-27
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Current multimodal large language models (MLLMs) perform poorly on visual question answering (VQA) in engineering domains, particularly digital electronics, due to the absence of domain-specific benchmarks and high-quality, expert-annotated data. Method: We introduce ElectroVizQA, the first VQA benchmark tailored to digital electronics, comprising 626 undergraduate-level circuit problems with precisely annotated schematics, questions, and multi-step reasoning answers. We propose a joint circuit-diagram and text understanding evaluation framework, along with a standardized assessment protocol and fine-grained metrics. Contribution/Results: Our experiments are the first to systematically expose critical deficiencies of state-of-the-art MLLMs across core competencies, including logic gate identification, truth table derivation, and sequential circuit analysis. ElectroVizQA provides a reproducible, extensible, domain-specific benchmark that fills a key gap in multimodal model evaluation for engineering education, enabling rigorous domain adaptation and advancing trustworthy AI applications in technical pedagogy.

📝 Abstract
Multi-modal Large Language Models (MLLMs) are gaining significant attention for their ability to process multi-modal data, providing enhanced contextual understanding of complex problems. MLLMs have demonstrated exceptional capabilities in tasks such as Visual Question Answering (VQA); however, they often struggle with fundamental engineering problems, and there is a scarcity of specialized datasets for training on topics like digital electronics. To address this gap, we propose a benchmark dataset called ElectroVizQA specifically designed to evaluate MLLMs' performance on digital electronic circuit problems commonly found in undergraduate curricula. This dataset, the first of its kind tailored for the VQA task in digital electronics, comprises approximately 626 visual questions, offering a comprehensive overview of digital electronics topics. This paper rigorously assesses the extent to which MLLMs can understand and solve digital electronic circuit questions, providing insights into their capabilities and limitations within this specialized domain. By introducing this benchmark dataset, we aim to motivate further research and development in the application of MLLMs to engineering education, ultimately bridging the performance gap and enhancing the efficacy of these models in technical fields.
Problem

Research questions and friction points this paper is trying to address.

Evaluating MLLMs' performance on digital electronics visual questions
Addressing scarcity of specialized VQA datasets for engineering education
Assessing MLLMs' capabilities and limitations in technical circuit problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposed ElectroVizQA benchmark dataset
Evaluates MLLMs on digital electronics circuits
Contains 626 specialized visual questions
Pragati Shuddhodhan Meshram
University of Illinois Urbana-Champaign
Swetha Karthikeyan
University of Illinois Urbana-Champaign
Bhavya
Research Scientist, IBM Research
AI, Natural Language Processing, Text Mining
Suma Bhat
University of Illinois at Urbana-Champaign
natural language processing, educational applications of AI