HSCO-Bench: An Agent-Driven End-to-End Hardware-Software Co-design Benchmark for Systems-on-Chip

📅 2026-05-19

🤖 AI Summary

Existing benchmarks cannot evaluate the end-to-end capability of large language models (LLMs) for system-level hardware-software co-design, because they assess hardware and software components in isolation. This work introduces what the authors describe as the first benchmark covering the full co-design workflow. An LLM agent must analyze applications, design heterogeneous accelerators, map kernels onto those accelerators, and deploy a complete system-on-chip (SoC) prototype on an AMD VC707 FPGA. Built on an open-source SoC platform with a curated repository structure, the benchmark lets LLMs jointly reason about and modify both the hardware and software stacks. Of the five state-of-the-art models evaluated, only two produced functional prototypes. These reached a peak speedup of 16.22×, but the maximum additional resource utilization was only 23.67%, indicating that current LLMs do not yet exploit the available hardware capacity.
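The summary reports a speedup and an "additional resource utilization" but does not define either here. The following is a minimal sketch of how such metrics could be computed. It assumes that speedup is baseline runtime divided by accelerated runtime, and that additional utilization is the resources used beyond the baseline SoC divided by device capacity. The `Resources` type, its field names, and all numbers are hypothetical and are not taken from the benchmark.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical FPGA resource counts (field names are illustrative)."""
    lut: int
    ff: int
    bram: int
    dsp: int


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """End-to-end speedup: baseline runtime divided by accelerated runtime."""
    if accelerated_s <= 0:
        raise ValueError("accelerated runtime must be positive")
    return baseline_s / accelerated_s


def additional_utilization(baseline: Resources, design: Resources,
                           device: Resources) -> dict[str, float]:
    """Per-resource fraction of device capacity used beyond the baseline SoC.

    Assumed definition: (design - baseline) / device capacity.
    """
    fractions = {}
    for name in ("lut", "ff", "bram", "dsp"):
        extra = getattr(design, name) - getattr(baseline, name)
        fractions[name] = extra / getattr(device, name)
    return fractions


# Toy numbers for illustration only; these are not benchmark results.
base = Resources(lut=100, ff=200, bram=10, dsp=4)
soc = Resources(lut=150, ff=260, bram=12, dsp=8)
fpga = Resources(lut=1000, ff=2000, bram=100, dsp=40)
print(speedup(16.0, 2.0))  # 8.0
print(additional_utilization(base, soc, fpga))
```

Reporting utilization per resource type keeps a design that saturates one resource (for example DSPs) from being hidden by an average across types.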

📝 Abstract

Large language models (LLMs) are being adopted for both software and hardware design, yet the two domains are still evaluated separately. Software benchmarks typically assume fixed hardware targets, while hardware benchmarks focus on component-level optimization without considering the full hardware-software stack. Consequently, no existing benchmark evaluates whether an LLM agent can perform end-to-end, system-level hardware-software co-design. Such a process requires: 1) analyzing applications to identify kernels requiring acceleration, 2) designing and integrating heterogeneous accelerators into a System-on-Chip (SoC) under resource constraints, and 3) mapping kernels onto the generated accelerators. We present HSCO-Bench, an end-to-end hardware-software co-design benchmark for accelerator-rich heterogeneous SoC generation. Built upon an open-source SoC platform with a curated repository structure, HSCO-Bench evaluates the ability of LLMs to jointly optimize software and hardware stacks, producing SoC prototypes deployed on the AMD Virtex-7 FPGA VC707 Evaluation Kit. Experimental results show that end-to-end integration remains challenging for current models. Of the five frontier models evaluated, only two generated valid SoC prototypes. Even in these successful instances, the generated designs are far from optimal. While we observe a promising peak speedup of 16.22×, the maximum additional resource utilization reaches only 23.67%. This shows that state-of-the-art models have an emerging capability for hardware acceleration but still heavily underutilize the available hardware capacity, leaving room for future optimization. To the best of our knowledge, HSCO-Bench is the first benchmark targeting this complete co-design flow, enabling LLMs to jointly reason about and modify both the software and hardware stacks of heterogeneous SoCs.
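The three steps listed in the abstract (identifying kernels, selecting accelerators under resource constraints, mapping kernels onto them) can be illustrated with a toy model. This is not HSCO-Bench's actual flow. It assumes that each profiled kernel has a known runtime share, a fixed speedup on its accelerator, and a LUT cost, and that there is a single LUT budget. The end-to-end speedup is estimated with Amdahl's law. `Kernel`, `identify_kernels`, `select_accelerators`, `map_kernels`, and the `accN` accelerator names are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Kernel:
    """Hypothetical profiled kernel with an assumed accelerator cost model."""
    name: str
    time_fraction: float  # share of baseline runtime spent in this kernel
    accel_speedup: float  # assumed kernel-level speedup on its accelerator
    luts: int             # assumed LUT cost of the accelerator


def identify_kernels(profile, threshold=0.05):
    """Step 1: keep kernels whose runtime share is at least `threshold`."""
    return [k for k in profile if k.time_fraction >= threshold]


def estimated_speedup(selected):
    """Amdahl's law: end-to-end speedup when `selected` kernels are offloaded."""
    offloaded = sum(k.time_fraction for k in selected)
    new_time = (1.0 - offloaded) + sum(k.time_fraction / k.accel_speedup for k in selected)
    return 1.0 / new_time


def select_accelerators(kernels, lut_budget):
    """Step 2: exhaustively search for the subset that fits the LUT budget
    and maximizes estimated speedup (exponential, fine for a few kernels)."""
    best, best_speedup = (), 1.0
    for r in range(1, len(kernels) + 1):
        for subset in combinations(kernels, r):
            if sum(k.luts for k in subset) <= lut_budget:
                s = estimated_speedup(subset)
                if s > best_speedup:
                    best, best_speedup = subset, s
    return best, best_speedup


def map_kernels(selected):
    """Step 3: bind each selected kernel to a (hypothetically named) accelerator."""
    return {k.name: f"acc{i}" for i, k in enumerate(selected)}


profile = [Kernel("A", 0.50, 10.0, 100), Kernel("B", 0.30, 5.0, 80), Kernel("C", 0.02, 2.0, 10)]
hot = identify_kernels(profile)
chosen, s = select_accelerators(hot, lut_budget=150)
print(map_kernels(chosen), round(s, 3))  # {'A': 'acc0'} 1.818
```

Exhaustive search only scales to a handful of kernels. A realistic design space with several resource types would need a knapsack-style or heuristic search instead.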

Problem

hardware-software co-design

System-on-Chip

large language models

accelerator integration

end-to-end benchmark

Innovation

hardware-software co-design

large language models

System-on-Chip