SweRank+: Multilingual, Multi-Turn Code Ranking for Software Issue Localization

2025-12-23
AI Summary
To localize buggy functions from natural-language issue descriptions in large multilingual codebases, this paper proposes SweRank+, a cross-lingual issue localization framework that supports multi-turn, interactive retrieval. Methodologically, it combines a cross-lingual code embedding retriever with a listwise LLM-based reranker, and adds a memory-augmented agentic search loop that refines the function ranking over successive turns. Key contributions include: (1) a large-scale issue localization dataset spanning multiple popular programming languages, used to train the ranking models; (2) a hybrid architecture combining cross-lingual embedding retrieval with LLM-based reranking; and (3) an iterative search agent equipped with a memory buffer. Experiments show that SweRankMulti achieves new state-of-the-art performance on multilingual issue localization, while SweRankAgent improves average Top-1 accuracy by 12.7% over prior approaches.
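The first stage of this hybrid design, embedding-based retrieval, can be sketched in a few lines. This is a minimal illustration under loud assumptions: SweRankMulti uses a trained cross-lingual code encoder, whereas the hypothetical `embed` below is a toy bag-of-words counter, so it only matches functions that share identifier tokens with the issue and has no real cross-lingual ability. The names `embed`, `cosine`, and `retrieve` and the example functions are illustrative, not the paper's API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (hypothetical stand-in for a trained code encoder)."""
    # Lowercase and split on anything that is not a letter or digit,
    # so identifiers like parse_config become "parse" and "config".
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(issue: str, functions: dict[str, str], k: int = 3) -> list[str]:
    """Return the ids of the k functions whose source is most similar to the issue text."""
    query = embed(issue)
    ranked = sorted(functions, key=lambda fid: cosine(query, embed(functions[fid])), reverse=True)
    return ranked[:k]


# Hypothetical mini-repository mixing three languages.
functions = {
    "utils.py::parse_config": "def parse_config(path): return yaml.safe_load(open(path))",
    "server.go::StartServer": "func StartServer(port int) error { return http.ListenAndServe(addr, nil) }",
    "Auth.java::checkToken": "boolean checkToken(String token) { return token.equals(secret); }",
}
issue = "Crash when parse_config reads an empty config path"
print(retrieve(issue, functions, k=1))  # ['utils.py::parse_config']
```

In the full pipeline, the short list of ids returned by the retriever would be handed to the listwise LLM reranker, which only has to reorder these few candidates rather than the whole repository.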

πŸ“ Abstract
Maintaining large-scale, multilingual codebases hinges on accurately localizing issues, which requires mapping natural-language error descriptions to the relevant functions that need to be modified. However, existing ranking approaches are often Python-centric and perform a single-pass search over the codebase. This work introduces SweRank+, a framework that couples SweRankMulti, a cross-lingual code ranking tool, with SweRankAgent, an agentic search setup, for iterative, multi-turn reasoning over the code repository. SweRankMulti comprises a code embedding retriever and a listwise LLM reranker, and is trained using a carefully curated large-scale issue localization dataset spanning multiple popular programming languages. SweRankAgent adopts an agentic search loop that moves beyond single-shot localization with a memory buffer to reason and accumulate relevant localization candidates over multiple turns. Our experiments on issue localization benchmarks spanning various languages demonstrate new state-of-the-art performance with SweRankMulti, while SweRankAgent further improves localization over single-pass ranking.
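The multi-turn loop described above can be sketched as follows. Everything here is an assumption for illustration: the `Memory` class, the rule of keeping each function's best score, the stopping conditions, and the `rank` and `next_query` callables are hypothetical stand-ins. In SweRankAgent an LLM agent decides what to search next and the ranking step would be served by SweRankMulti; the toy functions at the end merely script that behavior.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Memory:
    """Hypothetical memory buffer: candidates found so far plus queries already issued."""
    candidates: dict[str, float] = field(default_factory=dict)
    queries: list[str] = field(default_factory=list)

    def add(self, ranked: list[tuple[str, float]]) -> None:
        # Keep the best score observed for each function across all turns.
        for fid, score in ranked:
            self.candidates[fid] = max(score, self.candidates.get(fid, float("-inf")))

    def top(self, k: int) -> list[str]:
        return sorted(self.candidates, key=self.candidates.__getitem__, reverse=True)[:k]


def agent_search(
    issue: str,
    rank: Callable[[str], list[tuple[str, float]]],
    next_query: Callable[[str, Memory], Optional[str]],
    max_turns: int = 3,
    k: int = 5,
) -> list[str]:
    """Multi-turn localization: rank, store results in memory, let a policy refine the query."""
    memory = Memory()
    query: Optional[str] = issue
    for _ in range(max_turns):
        if query is None or query in memory.queries:
            break  # the policy chose to stop, or would repeat an earlier query
        memory.queries.append(query)
        memory.add(rank(query))
        query = next_query(issue, memory)
    return memory.top(k)


# Toy stand-ins (hypothetical): a fixed ranking table instead of SweRankMulti,
# and a scripted policy instead of an LLM deciding the next query.
def toy_rank(query: str) -> list[tuple[str, float]]:
    table = {
        "request times out": [("net/client.go::Dial", 0.9), ("net/retry.go::Backoff", 0.4)],
        "Dial retry backoff": [("net/retry.go::Backoff", 0.95), ("net/client.go::Dial", 0.7)],
    }
    return table.get(query, [])


def toy_next_query(issue: str, memory: Memory) -> Optional[str]:
    # Refine the query once after the first turn, then stop.
    return "Dial retry backoff" if len(memory.queries) == 1 else None


print(agent_search("request times out", toy_rank, toy_next_query))
# ['net/retry.go::Backoff', 'net/client.go::Dial']
```

Keying the memory by function id lets a later, more targeted query promote a candidate (here `Backoff`) without discarding what earlier turns found, which is the practical difference from a single-pass search.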

Problem

Research questions and friction points this paper is trying to address.

Multilingual code ranking for issue localization
Iterative multi-turn reasoning over code repositories
Improving localization accuracy beyond single-pass search
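Localization results are usually reported as top-k accuracy (the summary above quotes Top-1). The helper below is a hypothetical sketch of one common strict definition, in which an issue counts as localized at k only if every ground-truth function appears among the top k predictions; `require_all=False` gives the lenient any-hit variant. Which definition the paper uses is an assumption here, and `accuracy_at_k` is an illustrative name.

```python
def accuracy_at_k(
    predictions: dict[str, list[str]],
    gold: dict[str, set[str]],
    k: int,
    require_all: bool = True,
) -> float:
    """Fraction of issues localized within the top-k ranked functions.

    require_all=True (strict): every gold function must appear in the top k.
    require_all=False (lenient): at least one gold function must appear in the top k.
    """
    if not gold:
        raise ValueError("gold must contain at least one issue")
    hits = 0
    for issue_id, targets in gold.items():
        top_k = set(predictions.get(issue_id, [])[:k])
        found = targets <= top_k if require_all else bool(targets & top_k)
        hits += int(found)
    return hits / len(gold)


# Hypothetical predictions and ground truth for two issues.
predictions = {
    "issue-1": ["a.py::f", "b.py::g", "c.py::h"],
    "issue-2": ["x.rs::p", "y.rs::q"],
}
gold = {"issue-1": {"a.py::f", "c.py::h"}, "issue-2": {"y.rs::q"}}
print(accuracy_at_k(predictions, gold, k=1))                     # 0.0
print(accuracy_at_k(predictions, gold, k=1, require_all=False))  # 0.5
print(accuracy_at_k(predictions, gold, k=3))                     # 1.0
```

The strict variant is harder to satisfy for issues whose fix touches several functions, so the two definitions can diverge noticeably on multi-function patches.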

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual code embedding retriever and LLM reranker
Agentic search loop with memory for multi-turn reasoning
Iterative ranking framework combining retrieval and agentic reasoning
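A listwise reranker orders candidates jointly rather than scoring them one at a time, but an LLM's context window limits how many function bodies fit in one prompt. A common workaround, popularized by RankGPT, is to slide an overlapping window from the bottom of the retrieved list to the top so that strong candidates bubble upward. The sketch below assumes this strategy; whether SweRankMulti uses windowing, and with which window and step sizes, is not stated in this summary. `keyword_ranker` is a hypothetical stand-in for the LLM, which in practice would return a permutation of the window.

```python
from typing import Callable

# Hypothetical signature: given the issue text and a window of candidate ids,
# return the same ids reordered from most to least relevant.
ListwiseRanker = Callable[[str, list[str]], list[str]]


def sliding_window_rerank(
    issue: str,
    candidates: list[str],
    ranker: ListwiseRanker,
    window: int = 4,
    step: int = 2,
) -> list[str]:
    """Rerank a long list by reordering overlapping windows, moving from bottom to top."""
    if not 0 < step < window:
        raise ValueError("need 0 < step < window so that windows overlap")
    ranked = list(candidates)
    end = len(ranked)
    while True:
        start = max(0, end - window)
        block = ranked[start:end]
        reordered = ranker(issue, block)
        if sorted(reordered) != sorted(block):
            raise ValueError("ranker must return a permutation of its window")
        ranked[start:end] = reordered
        if start == 0:
            return ranked
        end -= step


def keyword_ranker(issue: str, window: list[str]) -> list[str]:
    """Toy stand-in for the LLM: sort by how many issue words occur in each id."""
    words = set(issue.lower().split())
    return sorted(window, key=lambda c: sum(w in c.lower() for w in words), reverse=True)


candidates = ["a::init", "b::render", "c::log", "d::flush", "e::cache_evict", "f::cache_get"]
result = sliding_window_rerank("cache get returns stale value", candidates, keyword_ranker)
print(result)
# ['f::cache_get', 'e::cache_evict', 'a::init', 'b::render', 'c::log', 'd::flush']
```

Requiring `step < window` makes consecutive windows overlap, so the best item in each window is carried into the next one; without overlap, a relevant function retrieved near the bottom of the list could never reach the top.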