ElectroVizQA: How well do Multi-modal LLMs perform in Electronics Visual Question Answering?

📅 2024-11-27
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Current multimodal large language models (MLLMs) perform poorly on visual question answering (VQA) in engineering domains, particularly digital electronics, due to the absence of domain-specific benchmarks and high-quality, expert-annotated data. Method: We introduce ElectroVizQA, the first VQA benchmark tailored to digital electronics, comprising 626 undergraduate-level circuit problems with precisely annotated schematics, questions, and multi-step reasoning answers. We propose a joint circuit-diagram and text understanding evaluation framework, along with a standardized assessment protocol and fine-grained metrics. Contribution/Results: Our experiments are the first to systematically expose critical deficiencies of state-of-the-art MLLMs across core competencies, including logic gate identification, truth table derivation, and sequential circuit analysis. ElectroVizQA provides a reproducible, extensible, domain-specific benchmark that fills a key gap in multimodal model evaluation for engineering education, enabling rigorous domain adaptation and advancing trustworthy AI applications in technical pedagogy.

📝 Abstract
Multi-modal Large Language Models (MLLMs) are gaining significant attention for their ability to process multi-modal data, providing enhanced contextual understanding of complex problems. MLLMs have demonstrated exceptional capabilities in tasks such as Visual Question Answering (VQA); however, they often struggle with fundamental engineering problems, and there is a scarcity of specialized datasets for training on topics like digital electronics. To address this gap, we propose a benchmark dataset called ElectroVizQA specifically designed to evaluate MLLMs' performance on digital electronic circuit problems commonly found in undergraduate curricula. This dataset, the first of its kind tailored for the VQA task in digital electronics, comprises approximately 626 visual questions, offering a comprehensive overview of digital electronics topics. This paper rigorously assesses the extent to which MLLMs can understand and solve digital electronic circuit questions, providing insights into their capabilities and limitations within this specialized domain. By introducing this benchmark dataset, we aim to motivate further research and development in the application of MLLMs to engineering education, ultimately bridging the performance gap and enhancing the efficacy of these models in technical fields.
Problem

Research questions and friction points this paper is trying to address.

Evaluating MLLMs' performance on digital electronics visual questions
Addressing scarcity of specialized VQA datasets for engineering education
Assessing MLLMs' capabilities and limitations in technical circuit problems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposed ElectroVizQA benchmark dataset
Evaluates MLLMs on digital electronics circuits
Contains 626 specialized visual questions
Pragati Shuddhodhan Meshram
University of Illinois Urbana-Champaign
Swetha Karthikeyan
University of Illinois Urbana-Champaign
Bhavya
Research Scientist, IBM Research
AI, Natural Language Processing, Text Mining
Suma Bhat
University of Illinois at Urbana-Champaign
natural language processing, educational applications of AI