AI Summary
This work addresses the frequent failures of electronic design automation (EDA) code generated by large language models (LLMs), which often arise from violations of implicit structural dependencies among design objects, such as invalid acquisition paths, missing prerequisites, or incompatible API usage. To overcome the high latency and poor scalability of existing tool-in-the-loop debugging approaches, the authors propose a framework for reliable code generation that does not rely on runtime feedback for repair. The key idea is to model structural dependencies explicitly as an execution contract, represented as a structural dependency graph, and to use that graph to guide a verifier-guided synthesis process. The approach combines graph-conditioned retrieval, constrained generation, and staged pre-execution verification with diagnosis-driven repair. Reported results include a single-step pass rate of 82.5%, an improvement in multi-step pass rate from 30.0% to 84.0%, a more than twofold reduction in tool calls, and a verifier precision of 93.3% (6.7% false positive rate).
Abstract
Large language models (LLMs) have enabled natural-language-driven automation of electronic design automation (EDA) workflows, but reliable execution of generated scripts remains a fundamental challenge. In LLM-based EDA tasks, failures arise not from syntax errors but from violations of implicit structural dependencies over design objects, including invalid acquisition paths, missing prerequisites, and incompatible API usage. Existing approaches address these failures through tool-in-the-loop debugging, repeatedly executing and repairing programs using runtime feedback. While effective, this paradigm couples correctness to repeated tool invocation, leading to high latency and poor scalability in multi-step settings. We propose to eliminate tool-in-the-loop debugging by enforcing structural correctness prior to execution. Each task is represented as a structural dependency graph that serves as an explicit execution contract, and a verifier-guided synthesis framework enforces this contract through graph-conditioned retrieval, constrained generation, and staged pre-execution verification with diagnosis-driven repair. On single-step tasks, our method improves pass rate from 73.0% (LLM+RAG) and 76.0% (tool-in-loop) to 82.5%, while requiring exactly one tool call per task and reducing total tool calls by more than 2x. On multi-step tasks, pass rate improves from 30.0% to 70.0%, and further to 84.0% with trajectory-level reflection. Uncertainty-aware filtering further reduces verifier false positives from 20.0% to 6.7% and improves precision from 80.0% to 93.3%. These results show that enforcing structural consistency prior to execution decouples correctness from tool interaction, improving both reliability and efficiency in long-horizon EDA code generation.
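To make the idea of a structural dependency graph acting as an execution contract more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the contract can be approximated as a map from each design object to the objects that must be acquired before it, and that a generated script can be summarized as an ordered list of object acquisitions. The class `DependencyGraph`, the function `verify_plan`, and the object names (`design`, `block`, `instances`, `pins`) are all hypothetical and do not correspond to any particular EDA tool API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DependencyGraph:
    """Hypothetical contract: requires[obj] = objects needed before obj."""
    requires: Dict[str, Set[str]] = field(default_factory=dict)

    def add(self, obj: str, *prereqs: str) -> None:
        # Register obj and make sure every prerequisite is also a known node.
        self.requires.setdefault(obj, set()).update(prereqs)
        for p in prereqs:
            self.requires.setdefault(p, set())


def verify_plan(graph: DependencyGraph, plan: List[str]) -> List[str]:
    """Check an ordered acquisition plan against the contract without
    running any tool. Returns diagnostics; an empty list means the plan
    satisfies the contract."""
    acquired: Set[str] = set()
    diagnostics: List[str] = []
    for step in plan:
        if step not in graph.requires:
            diagnostics.append(f"unknown object: {step}")
            continue
        missing = graph.requires[step] - acquired
        if missing:
            diagnostics.append(
                f"{step}: missing prerequisite(s) {sorted(missing)}"
            )
        # Mark as acquired even on failure to avoid cascading diagnostics.
        acquired.add(step)
    return diagnostics


# Assumed object hierarchy, for illustration only.
graph = DependencyGraph()
graph.add("design")
graph.add("block", "design")
graph.add("instances", "block")
graph.add("pins", "instances")

valid = verify_plan(graph, ["design", "block", "instances", "pins"])
invalid = verify_plan(graph, ["design", "instances"])
print(valid)    # no diagnostics for a plan that follows the contract
print(invalid)  # reports that 'instances' was requested before 'block'
```

In this sketch, the diagnostics returned for the second plan play the role of the signal that a diagnosis-driven repair step could consume before any tool call is made. How the actual framework encodes acquisition paths, API compatibility, and verification stages is not specified in the abstract.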