Agentic Code Reasoning

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling large language model agents to understand and reason about code semantics across a codebase without executing the code. It proposes semi-formal reasoning, a structured prompting approach that requires agents to explicitly construct premises, trace execution paths, and derive formal conclusions. Because the structured output serves as a verifiable reasoning certificate, the agent cannot skip cases or make unsupported claims. The method substantially improves the reliability of execution-free code reasoning: 93% accuracy on patch equivalence verification over real-world agent-generated patches, a 5-percentage-point gain in Top-5 accuracy for fault localization, and 87% accuracy on code question answering, approaching the reliability needed for practical uses such as execution-free RL reward signals.

📝 Abstract
Can LLM agents explore codebases and reason about code semantics without executing the code? We study this capability, which we call agentic code reasoning, and introduce semi-formal reasoning: a structured prompting methodology that requires agents to construct explicit premises, trace execution paths, and derive formal conclusions. Unlike unstructured chain-of-thought, semi-formal reasoning acts as a certificate: the agent cannot skip cases or make unsupported claims. We evaluate across three tasks (patch equivalence verification, fault localization, and code question answering) and show that semi-formal reasoning consistently improves accuracy on all of them. For patch equivalence, accuracy improves from 78% to 88% on curated examples and reaches 93% on real-world agent-generated patches, approaching the reliability needed for execution-free RL reward signals. For code question answering on RubberDuckBench Mohammad et al. (2026), semi-formal reasoning achieves 87% accuracy. For fault localization on Defects4J Just et al. (2014), semi-formal reasoning improves Top-5 accuracy by 5 percentage points over standard reasoning. These results demonstrate that structured agentic reasoning enables meaningful semantic code analysis without execution, opening practical applications in RL training pipelines, code review, and static program analysis.
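The abstract describes a certificate with three stages: explicit premises, a traced execution path, and a derived conclusion. The paper publishes no code, so the following is only a minimal sketch of what checking such a certificate's well-formedness might look like; the class name `Certificate`, its field names, and the validity rules are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Certificate:
    """Hypothetical container mirroring the three stages named in the abstract."""
    premises: list[str]   # explicit facts the agent commits to up front
    trace: list[str]      # step-by-step execution path through the code
    conclusion: str       # formal claim derived at the end

def is_well_formed(cert: Certificate) -> bool:
    """Reject certificates that skip a stage or make an unsupported leap.
    Every stage must be non-empty, and the conclusion must not simply
    restate a premise verbatim (a crude proxy for 'derived, not asserted')."""
    if not cert.premises or not cert.trace or not cert.conclusion:
        return False
    return cert.conclusion not in cert.premises

# Example: a certificate for a patch-equivalence judgment.
cert = Certificate(
    premises=["both patches guard the same null check"],
    trace=["trace input x=None through patch A",
           "trace input x=None through patch B"],
    conclusion="the two patches are semantically equivalent",
)
print(is_well_formed(cert))  # prints True
```

In the paper's setting the check is enforced through the structure of the prompt itself rather than a validator function; the sketch only illustrates why a staged, explicit format is mechanically checkable in a way unstructured chain-of-thought is not.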
Problem

Research questions and friction points this paper is trying to address.

agentic code reasoning
code semantics
execution-free reasoning
static program analysis
LLM agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic code reasoning
semi-formal reasoning
execution-free code analysis
structured prompting
LLM agents