AI Summary
RTL program repair remains a critical bottleneck in hardware design and verification: conventional template-based approaches suffer from limited bug coverage, while pure large language models (LLMs) are hindered by randomness and by context corruption when inputs such as RTL code and waveforms grow long. This work proposes Clover, a neurosymbolic agent framework that formulates repair as a structured search over code manipulations, dynamically dispatching subtasks to specialized LLM agents or symbolic solvers. To balance exploration and exploitation, the framework manages the main agent's context as a search tree through a stochastic tree-of-thoughts mechanism. Combined with an RTL-specific toolbox for interacting with the debugging environment, the method fixes 96.8% of bugs on the RTL-repair benchmark within a fixed time limit, covering 94% and 63% more bugs than traditional and LLM-based baselines, respectively, with an average pass@1 rate of 87.5%.
Abstract
RTL program repair remains a critical bottleneck in hardware design and verification. Traditional automatic program repair (APR) methods rely on predefined templates and synthesis, which limits their bug coverage. Large language models (LLMs) and the coding agents built on them offer flexibility but suffer from randomness and context corruption when handling long RTL code and waveforms. We present Clover, a neural-symbolic agentic harness that orchestrates RTL repair as a structured search over code manipulations to find a validated fix for each bug. Recognizing that different repair operations favor distinct strategies, Clover dynamically dispatches tasks to specialized LLM agents or symbolic solvers. At its core, Clover introduces stochastic tree-of-thoughts, a test-time scaling mechanism that manages the main agent's context as a search tree, balancing exploration and exploitation for reliable outcomes. An RTL-specific toolbox further enables agents to interact with the debugging environment. Evaluated on the RTL-repair benchmark, Clover fixes 96.8% of bugs within a fixed time limit, covering 94% and 63% more bugs than purely traditional and LLM-based baselines, respectively, while achieving an average pass@1 rate of 87.5%, demonstrating high reliability and effectiveness.
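The abstract does not spell out how stochastic tree-of-thoughts selects or expands nodes, so the following is only a minimal sketch of the general idea of a stochastic search tree over code edits, not Clover's implementation. Everything in it is an assumption made for illustration: the `Node` class, softmax sampling as the selection rule, the hypothetical helpers `propose_swaps` (standing in for an LLM agent or symbolic solver) and `score_against_spec` (standing in for simulation against a testbench), and the toy one-line design.

```python
import math
import operator
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Toy "RTL": one assign of the fixed shape (a OP1 b) OP2 c. Purely illustrative.
OPS = {"&": operator.and_, "|": operator.or_, "^": operator.xor}


@dataclass
class Node:
    """One search state: a candidate version of the code and its score."""
    code: str
    score: float  # fraction of test vectors passed, in [0, 1]
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def score_against_spec(code: str) -> float:
    """Hypothetical validator: compare against y = (a & b) | c on all inputs."""
    op1, op2 = [ch for ch in code if ch in OPS]
    hits = 0
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                got = OPS[op2](OPS[op1](a, b), c)
                hits += got == ((a & b) | c)
    return hits / 8


def propose_swaps(code: str) -> List[str]:
    """Hypothetical repair operator: every single-operator substitution."""
    out = []
    for i, ch in enumerate(code):
        if ch in OPS:
            out += [code[:i] + alt + code[i + 1:] for alt in OPS if alt != ch]
    return out


def softmax_pick(nodes: List[Node], temperature: float, rng: random.Random) -> Node:
    """Sample a node with probability proportional to exp(score / temperature)."""
    top = max(n.score for n in nodes)
    weights = [math.exp((n.score - top) / temperature) for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def stochastic_tree_search(
    root_code: str,
    propose: Callable[[str], List[str]],
    evaluate: Callable[[str], float],
    budget: int = 50,
    temperature: float = 0.2,
    seed: int = 0,
) -> Optional[Node]:
    """Grow a tree of candidate edits; return the first fully validated node."""
    rng = random.Random(seed)
    frontier = [Node(root_code, evaluate(root_code))]
    for _ in range(budget):
        # Stochastic selection: high scorers are favored (exploitation),
        # but low scorers keep a nonzero chance (exploration).
        node = softmax_pick(frontier, temperature, rng)
        for candidate in propose(node.code):
            child = Node(candidate, evaluate(candidate), parent=node)
            node.children.append(child)
            if child.score >= 1.0:
                return child
            frontier.append(child)
    return None


if __name__ == "__main__":
    fixed = stochastic_tree_search(
        "assign y = (a | b) ^ c;", propose_swaps, score_against_spec
    )
    print(fixed.code if fixed else "no fix found within budget")
```

In this toy setting the buggy line needs two operator changes, so no single edit from the root validates; the search has to expand a partially correct intermediate node first. Lowering `temperature` makes selection greedier, while raising it spreads the budget across more branches, which is one simple way to trade exploitation against exploration.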