🤖 AI Summary
This work addresses the vulnerability of large language models (LLMs) to data poisoning during fine-tuning for RTL code generation, where unverified training data can lead a model to generate hardware modules containing hardware Trojans. To mitigate this risk, the authors propose SafeTune, a framework that leaves the underlying model architecture unchanged and combines two checks: a graph neural network that models structural properties of the circuit to flag anomalous circuitry patterns, and a semantic verification module that uses text embeddings with an XGBoost classifier to assess the security of input prompts. Coupling the structural and semantic signals lets SafeTune filter poisoned training samples while preserving legitimate data. The authors report that SafeTune significantly improves the robustness and reliability of LLM fine-tuning for RTL generation.
📝 Abstract
As large language models (LLMs) are increasingly fine-tuned for hardware tasks like RTL code generation, the scarcity of high-quality datasets often leads to the use of rapidly assembled or generated training data. These datasets frequently lack security verification and are highly susceptible to data poisoning attacks. Such poisoning can cause models to generate syntactically valid but insecure hardware modules that bypass standard functionality checks. To address this, we present SafeTune, a framework designed to harden LLM-based RTL generation against poisoning, specifically focusing on hardware Trojan (HT) insertion. SafeTune integrates two core components: (i) a Graph Neural Network (GNN) that models structural properties to identify anomalous circuitry patterns during fine-tuning, and (ii) a semantic verification module using text embeddings and an XGBoost classifier to assess prompt security. By coupling structural and semantic knowledge, SafeTune effectively filters poisoned inputs without sacrificing legitimate data. Experimental results demonstrate that SafeTune significantly enhances the robustness and reliability of LLM fine-tuning without requiring modifications to the underlying model architecture.
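The filtering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Sample` type, the scorer interface, the 0.5 threshold, and the rule of rejecting a sample when either check flags it are all assumptions made for illustration, and the two toy scorers merely stand in for the trained GNN and the embedding-plus-XGBoost classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    prompt: str  # natural-language instruction
    rtl: str     # RTL code paired with the prompt


# Hypothetical interface: a scorer maps a sample to a suspicion score in [0, 1].
Scorer = Callable[[Sample], float]


def filter_samples(
    samples: List[Sample],
    structural_score: Scorer,  # stand-in for the GNN over the circuit graph
    semantic_score: Scorer,    # stand-in for text embeddings + XGBoost on the prompt
    threshold: float = 0.5,    # assumed cutoff, not taken from the paper
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (kept, rejected); reject if either score is high."""
    kept: List[Sample] = []
    rejected: List[Sample] = []
    for s in samples:
        flagged = structural_score(s) >= threshold or semantic_score(s) >= threshold
        (rejected if flagged else kept).append(s)
    return kept, rejected


# Toy scorers for demonstration only; real ones would be trained models.
def toy_structural(s: Sample) -> float:
    # Pretend a rare magic-constant comparison is a Trojan trigger.
    return 0.9 if "32'hDEADBEEF" in s.rtl else 0.1


def toy_semantic(s: Sample) -> float:
    # Pretend an explicit request for a backdoor is an unsafe prompt.
    return 0.8 if "backdoor" in s.prompt.lower() else 0.2


data = [
    Sample("Write a 2-to-1 mux", "assign y = sel ? a : b;"),
    Sample("Write a counter", "if (cnt == 32'hDEADBEEF) leak <= key;"),
    Sample("Add a hidden backdoor to the ALU", "assign y = a + b;"),
]
kept, rejected = filter_samples(data, toy_structural, toy_semantic)
print(len(kept), len(rejected))
```

In this sketch only the clean mux sample survives; the other two are rejected by the structural and the semantic check respectively. How SafeTune actually combines the two signals is not specified in the abstract, so the OR rule here is only one plausible choice.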