🤖 AI Summary
To address the lack of high-quality, domain-specific evaluation benchmarks for large language models (LLMs) in computer architecture understanding, this paper introduces QuArch—the first fine-grained, human-expert-annotated, and multi-round-validated question-answering dataset tailored to architecture. Comprising 1,500 QA pairs, QuArch covers core topics including processor design, memory systems, and performance optimization. Methodologically, it employs a rigorous human-in-the-loop annotation protocol and adopts standard QA accuracy as the primary evaluation metric, supporting both supervised fine-tuning and zero-/few-shot assessment. Key contributions include: (1) establishing the first human-verified, architecture-specific QA benchmark; (2) revealing a substantial capability gap between the best closed-source model and the top small open-source model (84% vs. 72% accuracy), with particular weaknesses in memory systems, interconnection networks, and benchmarking; and (3) demonstrating that supervised fine-tuning improves small-model accuracy by up to 8 percentage points. The dataset and an online evaluation platform are publicly released.
📝 Abstract
We introduce QuArch, a dataset of 1,500 human-validated question-answer pairs designed to evaluate and enhance language models' understanding of computer architecture. The dataset covers areas including processor design, memory systems, and performance optimization. Our analysis highlights a significant performance gap: the best closed-source model achieves 84% accuracy, while the top small open-source model reaches 72%. Models struggle most with memory systems, interconnection networks, and benchmarking. Fine-tuning with QuArch improves small-model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research. The dataset and leaderboard are at https://harvard-edge.github.io/QuArch/.