RepoAudit: An Autonomous LLM-Agent for Repository-Level Code Auditing

📅 2025-01-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Applying large language models (LLMs) directly to repository-level code auditing suffers from low accuracy, poor scalability, and high computational cost. This paper introduces RepoAudit, an autonomous auditing agent for real-world software repositories that goes beyond single-function analysis by tracking data flow across functions and checking whether the path conditions of a suspected bug are satisfiable. The approach explores code on demand rather than loading the whole repository, and is evaluated with Claude 3.5 Sonnet as the underlying model. Key contributions include: (1) an autonomous LLM agent with memory that retains per-function analysis results across iterative reasoning steps; and (2) a validator that checks the data-flow facts reported by the model to mitigate hallucinations and discards false positives whose path conditions cannot hold. Evaluated on 15 real-world projects, the system found 38 true bugs, averaging 0.44 hours and $2.54 per project.
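To put the reported averages in perspective, the short calculation below derives approximate totals from the figures stated above. It is only a back-of-the-envelope sketch: the per-project numbers are rounded averages, so the totals and the cost per bug are estimates, and the variable names are illustrative rather than taken from the paper.

```python
# Figures as reported in the summary (per-project values are averages).
projects = 15
true_bugs = 38
hours_per_project = 0.44
cost_per_project_usd = 2.54

# Approximate totals implied by the averages.
total_hours = projects * hours_per_project
total_cost_usd = projects * cost_per_project_usd

# Rough cost and time per confirmed bug across the whole evaluation.
cost_per_bug_usd = total_cost_usd / true_bugs
hours_per_bug = total_hours / true_bugs

print(f"total time : {total_hours:.1f} h")
print(f"total cost : ${total_cost_usd:.2f}")
print(f"per bug    : ${cost_per_bug_usd:.2f}, {hours_per_bug * 60:.0f} min")
```

Under these assumptions the whole evaluation comes to roughly 6.6 hours and about $38, or on the order of one dollar per true bug.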

📝 Abstract

Code auditing is a code review process with the goal of finding bugs. Large Language Models (LLMs) have shown substantial potential in this task, offering the ability to analyze programs without compilation and enabling customized bug detection following specified prompts. However, applying LLMs to repository-level code auditing presents notable challenges. The inherent context limits and hallucinations of LLMs can lead to low-quality bug reports. Meanwhile, the large size of software repositories introduces substantial time and token costs, hindering efficiency and scalability in real-world scenarios. This work introduces an autonomous LLM-agent, RepoAudit, designed to enable precise and efficient repository-level code auditing. Equipped with agent memory, RepoAudit explores the code repository on demand, analyzing data-flow facts along different feasible program paths in individual functions. It also introduces a validator that checks the data-flow facts to mitigate hallucinations and examines the satisfiability of the path conditions of potentially buggy paths, which enables RepoAudit to discard false positives. Our experiment shows that RepoAudit powered by Claude 3.5 Sonnet successfully finds 38 true bugs in 15 real-world systems, consuming 0.44 hours and $2.54 per project on average.
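The workflow described in the abstract (on-demand exploration, an agent memory of per-function data-flow facts, and a validator that filters ungrounded facts and infeasible paths) can be illustrated with a toy sketch. This is not RepoAudit's implementation: the LLM is replaced by a canned lookup table, and every name here (`Fact`, `REPO`, `CANNED`, `fact_is_grounded`, `conds_satisfiable`, `audit`) is hypothetical. The satisfiability check is deliberately naive and only spots a direct `==` / `!=` contradiction.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    """A claimed data flow in one function: `src` reaches `dst` under `conds`."""
    func: str
    src: str
    dst: str  # a variable name, "call:<callee>:<param>", or the sink "deref"
    conds: frozenset = frozenset()

# Toy repository: function name -> identifiers that appear in its body.
REPO = {"main": {"p", "q", "use"}, "use": {"x"}}

# Stand-in for the LLM: canned per-function answers (an assumption for the demo).
CANNED = {
    ("main", "p"): [
        Fact("main", "p", "call:use:x", frozenset({("p", "==", "NULL")})),
        Fact("main", "p", "deref",
             frozenset({("p", "==", "NULL"), ("p", "!=", "NULL")})),
        Fact("main", "p", "ghost"),  # hallucinated: 'ghost' is not in main
    ],
    ("use", "x"): [Fact("use", "x", "deref")],
}

def fact_is_grounded(fact):
    """Validator step 1: reject facts that mention names absent from the code."""
    names = REPO.get(fact.func, set())
    if fact.src not in names:
        return False
    if fact.dst == "deref":
        return True
    if fact.dst.startswith("call:"):
        _, callee, param = fact.dst.split(":")
        return callee in names and param in REPO.get(callee, set())
    return fact.dst in names

def conds_satisfiable(conds):
    """Validator step 2 (naive): a path needing both v == c and v != c is infeasible."""
    return not any((v, "!=", c) in conds for (v, op, c) in conds if op == "==")

def audit(func, src, memory, conds=frozenset(), path=()):
    """Follow `src` from `func`, visiting callees only when a fact leads there."""
    if len(path) > 8:  # crude guard against unbounded recursion
        return []
    key = (func, src)
    if key not in memory:  # agent memory: each function is analyzed once
        memory[key] = [f for f in CANNED.get(key, []) if fact_is_grounded(f)]
    reports = []
    for fact in memory[key]:
        all_conds = conds | fact.conds
        if not conds_satisfiable(all_conds):
            continue  # discard infeasible path (would be a false positive)
        step = path + (f"{func}:{fact.src}->{fact.dst}",)
        if fact.dst == "deref":
            reports.append(step)
        elif fact.dst.startswith("call:"):
            _, callee, param = fact.dst.split(":")
            reports += audit(callee, param, memory, all_conds, step)
    return reports

memory = {}
for report in audit("main", "p", memory):
    print(" => ".join(report))
```

In this toy run the hallucinated `ghost` fact is dropped by the grounding check, the self-contradictory path is dropped by the satisfiability check, and a single cross-function report (the value `p` flowing from `main` into `use` and being dereferenced there) remains. A real validator would need to map conditions across caller and callee variable names and use a proper constraint solver; the sketch skips both.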

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Code Repository Auditing

Efficiency and Cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

RepoAudit

Code Audit

Error Detection

🔎 Similar Papers

GenAudit: Fixing Factual Errors in Language Model Outputs with Evidence