Feedback-Driven Execution for LLM-Based Binary Analysis

📅 2026-04-16

🤖 AI Summary

Current large language model (LLM)-driven binary analysis approaches predominantly rely on a single-pass execution paradigm, which cannot adapt the analysis strategy to intermediate results and struggles to sustain long-horizon, multi-path reasoning. This work proposes FORGE, a feedback-driven execution framework that interleaves LLM reasoning with analysis-tool calls through a “reason–act–observe” loop. It further introduces a Dynamic Forest of Agents (FoA), which bounds each agent's context length while coordinating parallel, incremental exploration and evidence construction. Evaluated on 3,457 real-world firmware binaries, FORGE identifies 1,274 vulnerabilities across 591 unique binaries with a precision of 72.3% and covers a broader range of vulnerability types than existing methods.
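The reason–act–observe loop can be illustrated with a minimal sketch. This is not FORGE's implementation: the policy function stands in for an LLM call, and the tool `decompile`, the toy function table `FAKE_BINARY`, and the helpers `run_loop` and `toy_policy` are all hypothetical. On each iteration the policy reasons over the history so far and picks a tool action. The tool's output is then appended as an observation that conditions the next step, so exploration adapts to intermediate results instead of operating over a fixed, precomputed representation.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-in for a binary: function name -> decompiled summary.
FAKE_BINARY = {
    "main": "calls: parse_request",
    "parse_request": "calls: strcpy(buf, user_input)",
}


@dataclass
class Step:
    thought: str       # reasoning produced before acting
    action: str        # tool name, or "finish"
    arg: str           # tool argument (or the final answer)
    observation: str   # tool output fed back into the next step


def decompile(fn: str) -> str:
    """Hypothetical tool: return the decompiled summary of a function."""
    return FAKE_BINARY.get(fn, "not found")


def toy_policy(goal: str, history: list[Step]) -> tuple[str, str, str]:
    """Stand-in for the LLM: choose the next action from the history so far."""
    if not history:
        return ("start at the entry point", "decompile", "main")
    last = history[-1].observation
    if "strcpy" in last:
        return ("unbounded copy of user input", "finish", history[-1].arg)
    if last.startswith("calls: "):
        callee = last.removeprefix("calls: ").split("(")[0]
        return (f"follow call to {callee}", "decompile", callee)
    return ("nothing suspicious found", "finish", "")


def run_loop(policy, tools: dict[str, Callable[[str], str]], goal: str,
             max_steps: int = 8) -> list[Step]:
    """Interleave reasoning and tool use until the policy finishes."""
    history: list[Step] = []
    for _ in range(max_steps):
        thought, action, arg = policy(goal, history)           # reason
        if action == "finish":
            history.append(Step(thought, action, arg, ""))
            break
        tool = tools.get(action)
        obs = tool(arg) if tool else f"error: unknown tool {action}"  # act
        history.append(Step(thought, action, arg, obs))        # observe
    return history


trace = run_loop(toy_policy, {"decompile": decompile}, goal="find memory-safety bugs")
for step in trace:
    print(step.action, step.arg, "->", step.observation)
```

The `max_steps` cap is an assumed safeguard so that a policy which never emits "finish" still terminates.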

📝 Abstract

Binary analysis increasingly relies on large language models (LLMs) to perform semantic reasoning over complex program behaviors. However, existing approaches largely adopt a one-pass execution paradigm, where reasoning operates over a fixed program representation constructed by static analysis tools. This formulation limits the ability to adapt exploration based on intermediate results and makes it difficult to sustain long-horizon, multi-path analysis under constrained context. We present FORGE, a system that rethinks LLM-based analysis as a feedback-driven execution process. FORGE interleaves reasoning and tool interaction through a reasoning-action-observation loop, enabling incremental exploration and evidence construction. To address the instability of long-horizon reasoning, we introduce a Dynamic Forest of Agents (FoA), a decomposed execution model that dynamically coordinates parallel exploration while bounding per-agent context. We evaluate FORGE on 3,457 real-world firmware binaries. FORGE identifies 1,274 vulnerabilities across 591 unique binaries, achieving 72.3% precision while covering a broader range of vulnerability types than prior approaches. These results demonstrate that structuring LLM-based analysis as a decomposed, feedback-driven execution system enables both scalable reasoning and high-quality outcomes in long-horizon tasks.
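A Dynamic Forest of Agents can be approximated as follows, under assumptions the abstract does not spell out. The sketch uses a toy call graph and has each agent explore a single function. When an agent finds callees, it spawns child agents instead of growing its own context, so each agent sees only a bounded handoff of at most `MAX_CONTEXT` entries. The coordinator keeps the full evidence chain outside any agent's context. There is one tree per root (for example, per entry point), and agents at the same depth run in parallel on a thread pool. `CALL_GRAPH`, `SINKS`, `MAX_CONTEXT`, `run_agent`, and `explore_forest` are hypothetical names, not the paper's actual scheduling or context policy.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical toy call graph: function -> callees.
CALL_GRAPH = {
    "main": ["parse_args", "handle_http"],
    "parse_args": ["atoi"],
    "handle_http": ["read_header", "exec_cmd"],
    "read_header": ["strcpy"],
    "exec_cmd": ["system"],
}
SINKS = {"strcpy", "system"}  # hypothetical dangerous sinks
MAX_CONTEXT = 2               # hypothetical per-agent context bound (entries)


@dataclass
class Agent:
    target: str           # function this agent explores
    context: list[str]    # bounded handoff the agent "sees" (its prompt)
    evidence: list[str]   # full chain kept by the coordinator, not in the prompt


@dataclass
class Finding:
    sink: str
    evidence: list[str]


def run_agent(agent: Agent) -> tuple[list[Finding], list[Agent]]:
    """Explore one function; report a finding or spawn child agents."""
    evidence = agent.evidence + [agent.target]
    context = (agent.context + [agent.target])[-MAX_CONTEXT:]  # enforce the bound
    if agent.target in SINKS:
        return [Finding(agent.target, evidence)], []
    children = [Agent(c, context, evidence) for c in CALL_GRAPH.get(agent.target, [])]
    return [], children


def explore_forest(roots: list[str], max_workers: int = 4) -> list[Finding]:
    """Run one tree per root; agents at the same depth execute in parallel."""
    findings: list[Finding] = []
    frontier = [Agent(r, [], []) for r in roots]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            next_frontier: list[Agent] = []
            # pool.map yields results in input order, so the output is deterministic.
            for found, children in pool.map(run_agent, frontier):
                findings.extend(found)
                next_frontier.extend(children)
            frontier = next_frontier
    return findings


for f in explore_forest(["main"]):
    print(" -> ".join(f.evidence))
```

The key design choice in this sketch is bounding the handoff rather than the evidence. An agent deep in the tree never receives the whole path in its context, yet every reported finding still carries the complete chain from root to sink.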

Problem

Research questions and challenges this paper addresses.

LLM-based binary analysis

long-horizon reasoning

multi-path analysis

context constraint

adaptive exploration

Innovation

Methods, ideas, or system contributions that make the work stand out.

feedback-driven execution

Dynamic Forest of Agents

LLM-based binary analysis