DebugHarness: Emulating Human Dynamic Debugging for Autonomous Program Repair

📅 2026-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of current large language models (LLMs) in repairing deep, low-level vulnerabilities such as use-after-free errors, which stem from their reliance on static code generation and lack of dynamic execution context. To overcome this, the paper introduces DebugHarness, the first LLM-driven autonomous debugging agent that integrates an interactive, dynamic debugging mechanism. By querying program state at runtime, generating hypotheses, and iteratively validating them in a closed loop, DebugHarness emulates the reasoning process of human engineers. This approach transcends conventional static repair paradigms by establishing a novel framework that bridges static reasoning with dynamic system behavior. Evaluated on SEC-bench—a real-world benchmark of C/C++ vulnerabilities—DebugHarness achieves a repair success rate of approximately 90%, representing a relative improvement of over 30% compared to the current state-of-the-art baseline.
📝 Abstract
Patching severe security flaws in complex software remains a major challenge. While automated tools like fuzzers efficiently discover bugs, fixing deep-rooted low-level faults (e.g., use-after-free and memory corruption) still requires labor-intensive manual analysis by experts. Emerging Large Language Model (LLM) agents attempt to automate this pipeline, but they typically treat bug fixing as a purely static code-generation task. Relying solely on static artifacts, these methods miss the dynamic execution context strictly necessary for diagnosing intricate memory safety violations. To overcome these limitations, we introduce DebugHarness, an autonomous LLM-powered debugging agent harness that resolves complex vulnerabilities by emulating the interactive debugging practices of human systems engineers. Instead of merely examining static code, DebugHarness actively queries the live runtime environment. Driven by a reproducible crash, it utilizes a pattern-guided investigation strategy to formulate hypotheses, interactively probes program memory states and execution paths, and synthesizes patches via a closed-loop validation cycle. We evaluate DebugHarness on SEC-bench, a rigorous dataset of real-world C/C++ security vulnerabilities. DebugHarness successfully patches approximately 90% of the evaluated bugs. This yields a relative improvement of over 30% compared to state-of-the-art baselines, demonstrating that dynamic debugging significantly enhances LLM diagnostic capabilities. Overall, DebugHarness establishes a novel paradigm for automated program repair, bridging the gap between static LLM reasoning and the dynamic intricacies of low-level systems programming.
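The abstract describes a closed-loop cycle: formulate hypotheses about the root cause, probe the live runtime to confirm or refute each one, then patch and re-validate. As an illustration only, the sketch below shows the shape of such a hypothesize–probe loop in Python. All names here (`Hypothesis`, `run_probe`, `debug_loop`) and the toy "runtime facts" table are hypothetical; the paper does not specify DebugHarness's actual interfaces, and a real harness would drive a live debugger session rather than a lookup table.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    description: str      # e.g. "object freed in handler(), reused in flush()"
    probe: str            # runtime query that would confirm or refute it
    confirmed: bool = False

def run_probe(probe: str, runtime_facts: dict) -> bool:
    # Stand-in for querying a live debugger (e.g. a gdb/lldb session,
    # a watchpoint, or a sanitizer report). Here we just consult a table
    # of pre-recorded runtime observations.
    return runtime_facts.get(probe, False)

def debug_loop(hypotheses, runtime_facts, max_rounds=5):
    """One pass of the closed loop: probe each hypothesis against the
    runtime and keep only those the evidence confirms."""
    confirmed = []
    for h in hypotheses[:max_rounds]:
        if run_probe(h.probe, runtime_facts):
            h.confirmed = True
            confirmed.append(h)
    return confirmed

# Toy run: two candidate root causes, one supported by runtime evidence.
facts = {
    "watchpoint hit after free(obj)": True,
    "buffer index exceeds capacity": False,
}
candidates = [
    Hypothesis("use-after-free of obj", "watchpoint hit after free(obj)"),
    Hypothesis("out-of-bounds write", "buffer index exceeds capacity"),
]
survivors = debug_loop(candidates, facts)
print([h.description for h in survivors])  # ['use-after-free of obj']
```

In the actual system, the surviving hypothesis would drive patch synthesis, and the reproducible crash would serve as the validation oracle: a candidate patch is accepted only if the crash no longer reproduces.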
Problem

Research questions and friction points this paper is trying to address.

automated program repair
dynamic debugging
memory safety
LLM agents
security vulnerabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic debugging
LLM agent
automated program repair
memory safety
interactive debugging