Cerberus: Multi-Agent Reasoning and Coverage-Guided Exploration for Static Detection of Runtime Errors

📅 2025-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Detecting runtime errors without code execution remains challenging due to the difficulty of identifying error-prone code paths statically. Method: This paper proposes a coverage-guided static exception prediction framework leveraging a large language model (LLM)-based multi-agent collaborative system. It integrates three core components: static coverage prediction, error-triggering input generation, and a two-stage adaptive feedback mechanism—first optimizing for coverage, then dynamically shifting focus to exception detection. This design overcomes the limitation of conventional LLM prompting, which struggles to jointly optimize coverage and error identification in a single pass. Contribution/Results: The framework introduces the first pre-execution coverage-guided two-stage switching strategy. Experiments demonstrate significant improvements in both static coverage and exception detection rates without code execution, outperforming traditional and learning-based testing approaches—especially on incomplete or partial code fragments.

📝 Abstract
In several software development scenarios, it is desirable to detect runtime errors and exceptions in code snippets without actual execution. A typical example is detecting runtime exceptions in online code snippets before integrating them into a codebase. In this paper, we propose Cerberus, a novel predictive, execution-free, coverage-guided testing framework. Cerberus uses LLMs to generate inputs that trigger runtime errors and to perform code coverage prediction and error detection without executing the code. With a two-phase feedback loop, Cerberus first aims both to increase code coverage and to detect runtime errors, then shifts its focus solely to detecting runtime errors once coverage reaches 100% or its maximum, enabling it to outperform prompting the LLMs for both purposes at once. Our empirical evaluation demonstrates that Cerberus outperforms conventional and learning-based testing frameworks on both complete and incomplete code snippets by generating high-coverage test cases more efficiently, leading to the discovery of more runtime errors.
Problem

Research questions and friction points this paper is trying to address.

Detecting runtime errors without code execution
Generating high-coverage test cases with LLMs
Improving error detection in incomplete code snippets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs to generate error-triggering inputs and detect errors without execution
Employs a two-phase feedback loop for coverage and error detection
Shifts focus to error detection once maximum coverage is reached
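The two-phase switching strategy described above can be sketched as a simple loop. This is a minimal, hypothetical illustration, not the paper's implementation: the LLM-backed steps (`predict_coverage`, `predict_errors`) are stubbed with deterministic placeholders, and all names are invented for clarity.

```python
# Hypothetical sketch of a Cerberus-style two-phase feedback loop.
# In the actual system, coverage prediction, error prediction, and input
# generation are performed by an LLM-based multi-agent system; here they
# are stand-in stubs so the control flow is runnable.

def predict_coverage(code, inputs):
    """Stub: statically predict the fraction of lines the inputs cover.

    Placeholder heuristic: assume each input covers one more line.
    """
    total_lines = len(code.splitlines())
    return min(len(inputs), total_lines) / total_lines

def predict_errors(code, inputs):
    """Stub: statically predict which inputs would raise at runtime.

    Placeholder heuristic: treat 0 as triggering a ZeroDivisionError.
    """
    return [x for x in inputs if x == 0]

def two_phase_loop(code, candidate_inputs, max_rounds=10):
    """Phase 1 optimizes coverage and errors jointly; once predicted
    coverage saturates, phase 2 focuses only on error detection."""
    inputs, errors = [], []
    phase = "coverage"
    for x in candidate_inputs[:max_rounds]:
        inputs.append(x)
        errors.extend(predict_errors(code, [x]))
        if phase == "coverage" and predict_coverage(code, inputs) >= 1.0:
            phase = "errors"  # coverage saturated: switch phases
    return phase, errors

snippet = "def f(x):\n    return 10 // x\n"
phase, errors = two_phase_loop(snippet, [3, 1, 0, 5])
# After two inputs the stub predicts full coverage, the loop switches to
# the error-detection phase, and the input 0 is flagged as error-triggering.
```

The key design point the sketch mirrors is that the switch happens *before any execution*: both coverage and error signals come from static predictions, so the loop can run on incomplete snippets that would not even compile or import.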