InfCode-C++: Intent-Guided Semantic Retrieval and AST-Structured Search for C++ Issue Resolution

📅 2025-11-19
🤖 AI Summary
Problem: Existing LLM-based agents perform markedly worse at repairing C++ projects than Python ones, primarily because they rely on lexical retrieval and shallow code navigation. These approaches are ill-suited to C++ complexities such as overloaded identifiers, nested namespaces, template instantiations, and intricate control flow.

Method: We propose the first end-to-end automated issue-resolution system tailored to C++. It integrates intent-guided semantic retrieval with AST-structured querying to build precise, language-aware context and to localize defects accurately. The approach explicitly models C++'s static typing, multi-namespace scoping, and template semantics, supported by fault-reproduction analysis and collaborative LLM reasoning.

Contribution/Results: Evaluated on MultiSWE-bench-CPP, the system achieves a 25.58% resolution rate, outperforming the strongest baseline by 10.85 percentage points and more than doubling the performance of MSWE-agent.

📝 Abstract
Large language model (LLM) agents have recently shown strong performance on repository-level issue resolution, but existing systems are almost exclusively designed for Python and rely heavily on lexical retrieval and shallow code navigation. These approaches transfer poorly to C++ projects, where overloaded identifiers, nested namespaces, template instantiations, and deep control-flow structures make context retrieval and fault localization substantially more difficult. As a result, state-of-the-art Python-oriented agents show a drastic performance drop on the C++ subset of MultiSWE-bench. We introduce INFCODE-C++, the first C++-aware autonomous system for end-to-end issue resolution. The system combines two complementary retrieval mechanisms, semantic code-intent retrieval and deterministic AST-structured querying, to construct accurate, language-aware context for repair. These components enable precise localization and robust patch synthesis in large, statically typed C++ repositories. Evaluated on the MultiSWE-bench-CPP benchmark, INFCODE-C++ achieves a resolution rate of 25.58%, outperforming the strongest prior agent by 10.85 percentage points and more than doubling the performance of MSWE-agent. Ablation and behavioral studies further demonstrate the critical role of semantic retrieval, structural analysis, and accurate reproduction in C++ issue resolution. INFCODE-C++ highlights the need for language-aware reasoning in multi-language software agents and establishes a foundation for future research on scalable, LLM-driven repair for complex, statically typed ecosystems.
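The two headline comparisons can be unpacked with simple arithmetic. The values below are derived from the reported 25.58% and 10.85-point margin, not quoted from the paper: the strongest prior agent's rate follows by subtraction, and "more than doubling" only bounds MSWE-agent's rate from above.

$$
r_{\text{strongest prior}} = 25.58\% - 10.85\ \text{pp} = 14.73\%, \qquad r_{\text{MSWE-agent}} < \frac{25.58\%}{2} = 12.79\%
$$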
Problem

Research questions and friction points this paper is trying to address.

Addressing poor performance of Python-oriented LLM agents on C++ issue resolution
Overcoming challenges with overloaded identifiers and complex C++ code structures
Improving context retrieval and fault localization in statically typed C++ repositories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intent-guided semantic retrieval for C++ code
AST-structured search for precise fault localization
Combining semantic and structural analysis for repair
Qingao Dong
Beihang University, China and Beijing Tokfinity Technology Co., Ltd., China
Mengfei Wang
Beijing Tokfinity Technology Co., Ltd., China
Hengzhi Zhang
Beijing Tokfinity Technology Co., Ltd., China
Zhichao Li
Beijing Tokfinity Technology Co., Ltd., China
Yuan Yuan
Beihang University, China
Mu Li
Beihang University, China
Xiang Gao
Beihang University, China
Hailong Sun
Professor of Computer Science, Beihang University
Software Engineering, Artificial Intelligence, Software Systems
Chunming Hu
Beihang University, China
Weifeng Lv
Beihang University, China