TrojanGYM: A Detector-in-the-Loop LLM for Adaptive RTL Hardware Trojan Insertion

📅 2026-01-23
🤖 AI Summary
This work addresses a weakness of existing learning-based hardware Trojan detectors: they often overfit to limited trigger patterns and small-scale benchmarks, which leaves them ineffective against stealthy Trojans in real-world RTL designs. To expose this weakness, the authors propose a large language model (LLM)-driven agent framework built around an adaptive "detector-in-the-loop" generation mechanism. Multiple LLM agents collaboratively refine Trojan insertion strategies, using constraint-aware syntactic validation to keep the modified designs correct and feedback from a graph neural network (GNN)-based detector to iteratively evade detection. In experiments on SRAM, AES-128, and UART benchmark designs, the generated RTL Trojans reach evasion rates of up to 83.33% against state-of-the-art GNN detectors. The authors' enhanced Robust-GNN4TJ detector raises the detection rate on the most challenging generated benchmarks from 0% to 60%.

📝 Abstract
Hardware Trojans (HTs) remain a critical threat because learning-based detectors often overfit to narrow trigger/payload patterns and small, stylized benchmarks. We introduce TrojanGYM, an agentic, LLM-driven framework that automatically curates HT insertions to expose detector blind spots while preserving design correctness. Given high-level HT specifications, a suite of cooperating LLM agents (instantiated with GPT-4, LLaMA-3.3-70B, and Gemini-2.5Pro) proposes and refines RTL modifications that realize diverse triggers and payloads without impacting normal functionality. TrojanGYM implements a feedback-driven benchmark generation loop co-designed with HT detectors, in which constraint-aware syntactic checking and GNN-based HT detectors provide feedback that iteratively refines HT specifications and insertion strategies to better surface detector blind spots. We further propose Robust-GNN4TJ, a new implementation of GNN4TJ with improved graph extraction, training robustness, and prediction reliability, especially on LLM-generated HT designs. On the most challenging TrojanGYM-generated benchmarks, Robust-GNN4TJ raises HT detection rates from 0% to 60% relative to a prior GNN-based detector. We instantiate TrojanGYM on SRAM, AES-128, and UART designs at the RTL level, and show that it systematically produces diverse, functionally correct HTs that reach evasion rates of up to 83.33% against modern GNN-based detectors, revealing robustness gaps that are not apparent when these detectors are evaluated solely on existing TrustHub-style benchmarks. After peer review, we will release all code and artifacts.
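The feedback-driven loop and the reported rates can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: `Candidate`, `refine_with_feedback`, `evasion_rate`, and the three callables are hypothetical names, the callables are stubs the caller must supply, and the round limit is arbitrary. The six verdicts in the demo are made-up values chosen only to show how a figure such as 83.33% can arise (5 of 6 samples unflagged); the abstract does not state the actual sample counts.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """Hypothetical container for one proposed RTL variant (not from the paper)."""
    rtl: str


def refine_with_feedback(
    propose: Callable[[List[str]], Candidate],
    check: Callable[[Candidate], Optional[str]],
    detector_flags: Callable[[Candidate], bool],
    max_rounds: int = 5,
) -> Optional[Candidate]:
    """Generic propose/check/detect loop; every callable is a caller-supplied stub."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = propose(feedback)      # new proposal, conditioned on feedback so far
        error = check(candidate)           # None means the validity check passed
        if error is not None:
            feedback.append(f"check failed: {error}")
            continue
        if detector_flags(candidate):      # detector verdict is fed back as well
            feedback.append("flagged by detector")
            continue
        return candidate                   # passed the check and was not flagged
    return None                            # round budget exhausted


def evasion_rate(flagged: List[bool]) -> float:
    """Percentage of samples the detector did NOT flag."""
    if not flagged:
        raise ValueError("need at least one verdict")
    missed = sum(1 for verdict in flagged if not verdict)
    return 100.0 * missed / len(flagged)


if __name__ == "__main__":
    # Illustrative verdicts only: True = flagged, False = missed.
    verdicts = [False, False, False, False, False, True]
    print(f"{evasion_rate(verdicts):.2f}%")  # prints "83.33%"
```

Under this definition the detection rate over the same samples is simply 100 minus the evasion rate, so a detector that flags none of the samples scores 0% and one that flags 3 of 5 scores 60%.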
Problem

Research questions and friction points this paper is trying to address.

Hardware Trojans
LLM-driven framework
detector blind spots
RTL modifications
benchmark generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hardware Trojan
LLM-driven framework
Adaptive benchmark generation
GNN-based detection
RTL security
Saideep Sreekumar
NYU Abu Dhabi, Abu Dhabi, UAE
Zeng Wang
New York University
Hardware Security, Logic Locking
Akashdeep Saha
NYU Abu Dhabi, Abu Dhabi, UAE
Weihua Xiao
NYU Tandon School of Engineering, New York, USA
Minghao Shao
NYU Tandon School of Engineering, New York, USA
Muhammad Shafique
Professor, ECE, New York University (AD-UAE, Tandon-USA), Director eBRAIN Lab
Embedded Machine Learning, Brain-Inspired Computing, Robust & Energy-Efficient System Design, Smart
Ozgur Sinanoglu
Professor of Electrical and Computer Engineering, New York University Abu Dhabi
Hardware Security
Ramesh Karri
NYU Tandon School of Engineering, New York, USA
J. Knechtel
NYU Abu Dhabi, Abu Dhabi, UAE