LockForge: Automating Paper-to-Code for Logic Locking with Multi-Agent Reasoning LLMs

📅 2025-11-23
🤖 AI Summary
Logic locking (LL) suffers from persistent reproducibility challenges and a severe lack of open-source implementations. This paper introduces the first multi-agent large language model (LLM) framework specifically designed for LL, enabling end-to-end automated generation of executable and verifiable code directly from methodological descriptions in research papers. Methodologically, the authors propose a holistic pipeline comprising forward-looking modeling, iterative refinement, and multi-stage verification, and pioneer a dual-layer evaluation mechanism: LLM-as-Judge (for semantic correctness assessment) and LLM-as-Examiner (for functional equivalence verification). The framework integrates reasoning-based LLMs, behavioral comparison, structural analysis, and benchmarking to establish a closed-loop feedback optimization pipeline. The authors generate implementations for 10 mainstream LL schemes—most of which are open-sourced for the first time—and release all code alongside a standardized benchmark suite, substantially enhancing reproducibility and enabling fair, rigorous evaluation across the field.
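For readers unfamiliar with the domain: the schemes LockForge reproduces revolve around inserting key-controlled gates into a netlist so the circuit only behaves correctly under the right key. A minimal sketch of XOR-based locking in that spirit (akin to classic random logic locking; every name below is illustrative and not taken from the paper's generated code):

```python
# Toy illustration of XOR key-gate logic locking.
# A correct key makes each inserted XOR transparent; a wrong key
# inverts internal wires and corrupts the circuit's function.

def lock_wire(value: int, key_bit: int) -> int:
    """An inserted XOR key gate: passes the signal when key_bit is 0,
    inverts it when key_bit is 1."""
    return value ^ key_bit

def original_circuit(a: int, b: int, c: int) -> int:
    # Toy combinational function: (a AND b) OR c.
    return (a & b) | c

def locked_circuit(a: int, b: int, c: int, key) -> int:
    # Insert one XOR key gate on the internal wire (a AND b)...
    w = lock_wire(a & b, key[0])
    # ...and another on the primary output.
    return lock_wire(w | c, key[1])

correct_key = [0, 0]  # both XOR gates transparent
wrong_key = [1, 0]    # first key gate inverts its wire

inputs = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
ok = all(locked_circuit(a, b, c, correct_key) == original_circuit(a, b, c)
         for a, b, c in inputs)
corrupted = any(locked_circuit(a, b, c, wrong_key) != original_circuit(a, b, c)
                for a, b, c in inputs)
print(ok, corrupted)  # correct key restores function; wrong key corrupts it
```

Real schemes lock gate-level netlists (e.g. in Verilog) rather than Python functions, but this captures the behavioral check a verifier must perform: exhaustive or sampled equivalence under the correct key, and output corruption under wrong keys.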

📝 Abstract
Despite rapid progress in logic locking (LL), reproducibility remains a challenge as code is rarely made public. We present LockForge, a first-of-its-kind, multi-agent large language model (LLM) framework that turns LL descriptions in papers into executable and tested code. LockForge provides a carefully crafted pipeline realizing forethought, implementation, iterative refinement, and multi-stage validation, all to systematically bridge the gap between prose and practice for complex LL schemes. For validation, we devise (i) an LLM-as-Judge stage with a scoring system considering behavioral checks, conceptual mechanisms, structural elements, and reproducibility on benchmarks, and (ii) an independent LLM-as-Examiner stage for ground-truth assessment. We apply LockForge to 10 seminal LL schemes, many of which lack reference implementations. Our evaluation on multiple SOTA LLMs, including ablation studies, reveals the significant complexity of the task: an advanced reasoning model and a sophisticated, multi-stage framework like LockForge are required. We release all implementations and benchmarks, providing a reproducible and fair foundation for evaluating further LL research.
Problem

Research questions and friction points this paper is trying to address.

Automating conversion of logic locking descriptions into executable code
Addressing reproducibility challenges in logic locking research
Validating implementations through multi-stage LLM assessment frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent LLM framework automating paper-to-code conversion
Pipeline with forethought, implementation, refinement, and validation stages
LLM-as-Judge and LLM-as-Examiner for multi-stage validation
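The refinement-plus-validation loop above can be sketched as follows. This is a hypothetical outline only: the function names, the equal weighting of the four judging criteria, and the passing threshold are assumptions for illustration, not the paper's actual rubric or interface.

```python
# Hypothetical sketch of a closed-loop implement -> judge -> refine cycle,
# in the spirit of LockForge's pipeline. Rubric weights and threshold
# are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class JudgeReport:
    behavioral: float    # does simulated behavior match the scheme?
    conceptual: float    # is the locking mechanism itself captured?
    structural: float    # are the expected structural elements present?
    reproducible: float  # does it run on the benchmark suite?

    def score(self) -> float:
        # Equal weighting is an assumption; the paper defines its own scoring.
        return (self.behavioral + self.conceptual +
                self.structural + self.reproducible) / 4

def refine_until_valid(draft, judge, refine, threshold=0.9, max_rounds=5):
    """Iterate implementation -> LLM-as-Judge -> refinement until the
    judge's score clears the threshold or the round budget is spent."""
    report = judge(draft)
    for _ in range(max_rounds):
        if report.score() >= threshold:
            break
        draft = refine(draft, report)  # feed the critique back to the agent
        report = judge(draft)
    return draft, report
```

In the paper's design, a draft that passes this inner loop would still face the independent LLM-as-Examiner stage for ground-truth assessment, keeping the judging and examining roles separate.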