🤖 AI Summary
To address semantic loss, limited graph neural network (GNN) receptive fields, and coarse-grained localization in register-transfer level (RTL) hardware Trojan (HT) detection, this paper proposes the first fine-grained HT detection framework based on RTL-finetuned large language models (LLMs). The method directly extracts module- and line-level semantic features from RTL source code and integrates them with dataflow graphs to preserve both global context and local structural information. The paper also introduces TrojanInS, a large-scale, synthetically generated dataset with fine-grained annotations that enables multi-class, effect-oriented HT detection. Experiments show state-of-the-art performance: 0.99 F1-score for module-level detection (up to 0.68 higher than baselines), 0.84 macro-F1 for HT type classification, and 0.93 macro-F1 for line-level localization. This work adapts LLMs to RTL for hardware security analysis and supports semantic-aware, multi-granularity HT detection.
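The macro-F1 scores quoted above weight every class equally: F1 is computed per class and then averaged without regard to class frequency. The following is a minimal sketch of that metric in plain Python. The function name `macro_f1` and the toy labels are illustrative assumptions, not the paper's evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for four modules (not data from the paper).
truth = ["clean", "clean", "trojan", "trojan"]
guess = ["clean", "trojan", "trojan", "trojan"]
print(round(macro_f1(truth, guess), 4))  # 0.7333
```

Here the "clean" class scores 2/3 and the "trojan" class scores 4/5, so the macro average is 11/15.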
📝 Abstract
Hardware Trojans (HTs) are a persistent threat to integrated circuits, especially when inserted at the register-transfer level (RTL). Existing methods typically first convert the design into a graph, such as a gate-level netlist or an RTL-derived dataflow graph (DFG), and then use a graph neural network (GNN) to obtain an embedding of that graph. This pipeline (i) loses compact RTL semantics, (ii) relies on shallow GNNs with a limited receptive field, and (iii) is largely restricted to coarse, module-level binary HT detection. We propose TrojanLoC, an LLM-based framework for RTL-level HT localization. We use an RTL-finetuned LLM to derive module-level and line-level embeddings directly from RTL code, capturing both global design context and local semantics. Next, we train task-specific classifiers on these embeddings to perform module-level Trojan detection, type prediction, and fine-grained line-level localization. We also introduce TrojanInS, a large synthetic dataset of RTL designs with systematically injected Trojans from four effect-based categories, each accompanied by precise line-level annotations. Our experiments show that TrojanLoC achieves strong module-level performance, reaching a 0.99 F1-score for Trojan detection, up to 0.68 higher than the baseline, and 0.84 macro-F1 for Trojan-type classification. At the line level, TrojanLoC further achieves up to 0.93 macro-F1, enabling fine-grained localization of Trojan-relevant RTL lines.
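The embed-then-classify flow described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: a normalized bag-of-tokens vector stands in for the RTL-finetuned LLM embedding, a 1-nearest-neighbour lookup stands in for the task-specific classifiers, line features are formed by concatenating the line vector with the module vector as one plausible way to combine local and global context, and the two Verilog snippets and their line labels are hypothetical. It is not the TrojanLoC implementation.

```python
import re


def tokenize(text):
    # Split into identifiers/numbers and single punctuation characters.
    return re.findall(r"\w+|[^\w\s]", text)


def build_vocab(texts):
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab


def embed(text, vocab):
    """Toy stand-in for an LLM embedding: L2-normalized token counts."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def nearest_label(train, query):
    """1-nearest-neighbour stand-in for a trained classifier."""
    def sq_dist(u):
        return sum((a - b) ** 2 for a, b in zip(u, query))
    return min(train, key=lambda item: sq_dist(item[0]))[1]


# Hypothetical RTL modules; the second adds a trigger and a payload.
CLEAN = """module mux(input a, input b, input sel, output y);
  assign y = sel ? a : b;
endmodule"""
TROJAN = """module mux(input a, input b, input sel, input [31:0] cnt, output y);
  wire trig = (cnt == 32'hDEADBEEF);
  assign y = trig ? ~a : (sel ? a : b);
endmodule"""
TROJAN_LINE_LABELS = [0, 1, 1, 0]  # 1 marks a Trojan-relevant line

vocab = build_vocab([CLEAN, TROJAN])


def line_features(module_src):
    # Local line vector concatenated with the global module vector.
    context = embed(module_src, vocab)
    return [embed(line, vocab) + context for line in module_src.splitlines()]


module_train = [(embed(CLEAN, vocab), "clean"), (embed(TROJAN, vocab), "trojan")]
line_train = (list(zip(line_features(CLEAN), [0, 0, 0]))
              + list(zip(line_features(TROJAN), TROJAN_LINE_LABELS)))


def analyze(module_src):
    """Return (module-level verdict, 1-based numbers of flagged lines)."""
    verdict = nearest_label(module_train, embed(module_src, vocab))
    flagged = [i + 1 for i, feat in enumerate(line_features(module_src))
               if nearest_label(line_train, feat) == 1]
    return verdict, flagged


print(analyze(TROJAN))  # ('trojan', [2, 3])
print(analyze(CLEAN))   # ('clean', [])
```

Because the sketch is queried on its own training examples, the output only shows how data moves through the two granularities and says nothing about accuracy. Trojan-type prediction would follow the same pattern, with multi-class type labels in place of the binary module labels.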