🤖 AI Summary
This work addresses the growing susceptibility of digital circuits to soft errors at advanced technology nodes. Full-chip hardening incurs prohibitive area and power overhead, while existing selective approaches rely on time-consuming fault-injection simulation and manual RTL modification, which hinders efficient reliability optimization at the RTL stage. To overcome these limitations, the paper proposes FT-Pilot, a framework that, for the first time, integrates graph neural networks (GNNs) and large language models (LLMs) in an end-to-end automated pipeline. A GNN identifies vulnerable assets at the RTL level, and an LLM-driven analyzer and rewriter, supported by dual-knowledge-base retrieval-augmented generation (RAG) and an automatic repair mechanism, synthesizes fault-tolerant RTL code. Experimental results across multiple benchmark circuits show that the framework automatically generates hardened RTL that is syntactically correct, functionally correct, and synthesizable, while significantly reducing output error rates under soft-error scenarios. These results support the feasibility of shift-left reliability optimization in early design phases.
📝 Abstract
As integrated circuit technologies scale toward advanced process nodes, the steady reduction in node capacitance and supply voltage has made digital systems increasingly vulnerable to soft errors. Although traditional full-chip hardening methods can improve reliability, they often incur unacceptable area and power overhead, making selective hardening a more practical engineering solution. However, existing approaches typically rely on time-consuming fault-injection simulation for the vulnerability analysis that determines hardening locations. They also depend heavily on manual strategy selection and RTL modification during the hardening stage, which makes them ill-suited for efficient automated reliability optimization at early design stages. To address these challenges, this paper proposes FT-Pilot, a GNN-guided LLM framework for automatic RTL soft-error hardening. The framework first employs a GNN to identify critical vulnerable assets directly at the RTL level. It then introduces an LLM-driven rewriting engine, composed of an analyzer and a rewriter, which performs RTL-level fault-tolerant code rewriting with the support of dual-knowledge-base retrieval-augmented generation and an automatic repair mechanism. Experimental results show that the proposed framework can automatically generate hardened RTL designs that are syntactically correct, functionally correct, and synthesizable across multiple benchmark circuits, while significantly reducing output error rates under soft-error scenarios. This work provides a practical automated path toward shift-left reliability optimization at the RTL level.
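The two-stage flow described in the abstract (score modules, then rewrite the most vulnerable ones with a repair loop) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not FT-Pilot's implementation: the names `select_vulnerable`, `harden_with_repair`, `toy_rewrite`, and `toy_check` are hypothetical, the vulnerability scores are made-up placeholders standing in for GNN output, and the toy rewriter and checker stand in for the LLM rewriter and the syntax/function checks that the abstract mentions without detailing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# All names below are hypothetical; none are taken from FT-Pilot itself.
Rewriter = Callable[[str, Optional[str]], str]   # (rtl, feedback) -> new rtl
Checker = Callable[[str], Tuple[bool, str]]      # rtl -> (ok, diagnostic)


@dataclass
class HardeningResult:
    module: str
    rtl: str
    attempts: int
    passed: bool


def select_vulnerable(scores: Dict[str, float], budget: int) -> List[str]:
    """Return the `budget` modules with the highest predicted vulnerability."""
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:max(budget, 0)]


def harden_with_repair(module: str, rtl: str, rewrite: Rewriter,
                       check: Checker, max_attempts: int = 3) -> HardeningResult:
    """Rewrite one module, feeding checker diagnostics back until it passes."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = rewrite(rtl, feedback)
        ok, feedback = check(candidate)
        if ok:
            return HardeningResult(module, candidate, attempt, True)
    # Give up and keep the original RTL rather than ship an unchecked rewrite.
    return HardeningResult(module, rtl, max_attempts, False)


# Toy stand-ins for the model-backed stages, used only to exercise the loop.
def toy_rewrite(rtl: str, feedback: Optional[str]) -> str:
    body = rtl.replace("endmodule", "").rstrip()
    hardened = body + "\n  // hardened (placeholder)"
    # The first draft "forgets" endmodule; the repair pass restores it.
    return hardened if feedback is None else hardened + "\nendmodule"


def toy_check(rtl: str) -> Tuple[bool, str]:
    if rtl.rstrip().endswith("endmodule"):
        return True, ""
    return False, "missing endmodule"


if __name__ == "__main__":
    scores = {"alu": 0.91, "decoder": 0.35, "fsm": 0.78}  # placeholder scores
    designs = {name: f"module {name};\nendmodule" for name in scores}
    for name in select_vulnerable(scores, budget=2):
        result = harden_with_repair(name, designs[name], toy_rewrite, toy_check)
        print(result.module, result.attempts, result.passed)
```

The sketch falls back to the original RTL when the repair budget is exhausted, so that an unchecked rewrite never replaces a working module. That fallback is a design assumption of this example and is not stated in the abstract.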