AI Summary
This work addresses the challenge of jointly converging design rule check (DRC) violation repair and power-performance-area (PPA) optimization during electronic design automation (EDA) signoff. To this end, it introduces PostEDA-Bench, the first hierarchical benchmark enabling machine-verifiable evaluation across four key tasks spanning DRC correction and PPA tuning. Notably, this study is the first to integrate DRC repair into an LLM-driven EDA evaluation framework, leveraging vision-augmented large language models (LLMs) coupled with both commercial and open-source EDA toolchains to construct multi-architecture LLM agents for automated reasoning. Experimental results demonstrate that the best-performing agent achieves a 36.66% success rate on the DRC-Reasoning task and 20.00% on the PPA-Multi task, confirming the efficacy of vision enhancement for DRC repair and revealing that the core bottleneck in multi-objective PPA optimization lies in trade-off reasoning capability.
Abstract
LLM-based agents are increasingly applied to the "last mile" of Electronic Design Automation (EDA): repairing residual sign-off Design Rule Check (DRC) violations and converging Power-Performance-Area (PPA) targets after tool runs. Existing EDA-LLM benchmarks, however, omit DRC fixing entirely and rely on flat hierarchies tied to a single toolchain. We introduce PostEDA-Bench, a hierarchical benchmark with 145 tasks across DRC-Essential, DRC-Reasoning, PPA-Mono, and PPA-Multi, supported by EDA toolchains with machine-checkable evaluation. Across eight commercial and open-source LLMs under multiple agent scaffolds, we find that agents handle synthetic DRC-Essential and single-objective PPA-Mono reasonably well but degrade sharply on the more practical DRC-Reasoning, where the best success rate is 36.66%, and PPA-Multi, where the best success rate is 20.00%; vision augmentation consistently enhances DRC-Bench; and trade-off reasoning, rather than knob knowledge, is the dominant PPA-Multi bottleneck.