SweRank+: Multilingual, Multi-Turn Code Ranking for Software Issue Localization

📅 2025-12-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the challenge of precisely localizing buggy functions from natural-language error descriptions in large-scale multilingual codebases, this paper proposes SweRank+, a cross-lingual issue localization framework that supports multi-turn, interactive retrieval. Methodologically, it pairs a cross-lingual code embedding retriever with a listwise LLM-based reranker (together, SweRankMulti) and adds a memory-augmented agentic search loop (SweRankAgent) for progressive, context-aware function ranking. Key contributions include: (1) a carefully curated, large-scale issue localization dataset spanning multiple popular programming languages, used to train the ranking models; (2) a hybrid architecture combining cross-lingual embedding retrieval with LLM-based reranking; and (3) an iterative search agent equipped with memory. Experiments show that SweRankMulti sets a new state of the art in multilingual issue localization, while SweRankAgent improves average Top-1 accuracy by 12.7% over prior approaches.
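
The summary reports results as average Top-1 accuracy. As a minimal sketch of how a Top-k localization metric is commonly computed, the hypothetical `top_k_accuracy` helper below scores ranked function lists against sets of ground-truth functions. It assumes a "hit" means at least one ground-truth function appears among the top k; the paper's exact definition may differ (some benchmarks require every modified function to appear). The function IDs in the example are made up.

```python
from typing import Sequence, Set


def top_k_accuracy(
    rankings: Sequence[Sequence[str]],
    gold: Sequence[Set[str]],
    k: int = 1,
) -> float:
    """Fraction of issues with at least one gold function in the top k.

    Assumption: a "hit" needs only one ground-truth function in the top k.
    A stricter variant would require every modified function to appear.
    """
    if len(rankings) != len(gold):
        raise ValueError("rankings and gold must have the same length")
    if not rankings:
        return 0.0
    hits = sum(1 for ranked, truth in zip(rankings, gold) if truth & set(ranked[:k]))
    return hits / len(rankings)


# Hypothetical rankings for three issues (function IDs are made up).
rankings = [
    ["pkg/a.go::Parse", "pkg/b.go::Load"],
    ["src/x.rs::run", "src/y.rs::init"],
    ["lib/m.py::save", "lib/n.py::open"],
]
gold = [{"pkg/a.go::Parse"}, {"src/y.rs::init"}, {"lib/q.py::close"}]
print(top_k_accuracy(rankings, gold, k=1))  # 1 of 3 issues hit
print(top_k_accuracy(rankings, gold, k=2))  # 2 of 3 issues hit
```

Widening k from 1 to 2 credits the second issue, whose gold function was ranked second, which is why Top-1 is the strictest and most commonly highlighted setting.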

📝 Abstract

Maintaining large-scale, multilingual codebases hinges on accurately localizing issues, which requires mapping natural-language error descriptions to the relevant functions that need to be modified. However, existing ranking approaches are often Python-centric and perform a single-pass search over the codebase. This work introduces SweRank+, a framework that couples SweRankMulti, a cross-lingual code ranking tool, with SweRankAgent, an agentic search setup, for iterative, multi-turn reasoning over the code repository. SweRankMulti comprises a code embedding retriever and a listwise LLM reranker, and is trained using a carefully curated large-scale issue localization dataset spanning multiple popular programming languages. SweRankAgent adopts an agentic search loop that moves beyond single-shot localization, using a memory buffer to reason about and accumulate relevant localization candidates over multiple turns. Our experiments on issue localization benchmarks spanning various languages demonstrate new state-of-the-art performance with SweRankMulti, while SweRankAgent further improves localization over single-pass ranking.
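
The abstract describes SweRankMulti as a two-stage pipeline: an embedding retriever shortlists candidate functions, then a listwise LLM reranker reorders the shortlist. The sketch below illustrates that retrieve-then-rerank flow under loud assumptions: a bag-of-words cosine similarity stands in for the learned code embedding model, a token-overlap sort stands in for the LLM, and the sliding-window pass (reranking overlapping windows from the bottom of the list upward, as in common LLM listwise reranking setups) is an assumed design, not the paper's documented procedure. All names (`retrieve`, `sliding_window_rerank`, `overlap_ranker`) and the corpus are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]
# A listwise ranker gets the query and a window of candidate texts and
# returns a permutation of window indices, best first.
ListwiseRanker = Callable[[str, List[str]], List[int]]

TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def tokens(text: str) -> List[str]:
    # Split camelCase / snake_case identifiers into lowercase words.
    return [t.lower() for t in TOKEN.findall(text)]


def embed(text: str) -> Vector:
    # Stand-in for a learned code embedding model: a term-count vector.
    return {t: float(c) for t, c in Counter(tokens(text)).items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, corpus: Dict[str, str], top_n: int) -> List[str]:
    # Stage 1: score every function against the query and keep the top_n.
    q = embed(query)
    ranked = sorted(corpus, key=lambda fid: cosine(q, embed(corpus[fid])), reverse=True)
    return ranked[:top_n]


def sliding_window_rerank(query: str, candidates: List[str], corpus: Dict[str, str],
                          rank_fn: ListwiseRanker, window: int = 4, step: int = 2) -> List[str]:
    # Stage 2: rerank overlapping windows from the bottom of the list to the
    # top, so strong candidates can bubble up past the first window.
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    order = list(candidates)
    start = max(len(order) - window, 0)
    while True:
        chunk = order[start:start + window]
        perm = rank_fn(query, [corpus[fid] for fid in chunk])
        if sorted(perm) != list(range(len(chunk))):
            raise ValueError("ranker must return a permutation of window indices")
        order[start:start + window] = [chunk[i] for i in perm]
        if start == 0:
            return order
        start = max(start - step, 0)


def overlap_ranker(query: str, texts: List[str]) -> List[int]:
    # Placeholder for the listwise LLM reranker: sort by shared-token count.
    q = set(tokens(query))
    return sorted(range(len(texts)), key=lambda i: len(q & set(tokens(texts[i]))), reverse=True)


# Hypothetical multilingual corpus: function ID -> code/docstring text.
CORPUS = {
    "config/loader.py::load_yaml": "def load_yaml(path): parse yaml config file",
    "server/http.go::HandleRequest": "func HandleRequest(w, r) parse http request headers",
    "core/cache.rs::evict": "fn evict(cache) remove stale cache entries",
    "util/strings.ts::trimPath": "function trimPath(p) strip trailing slash from path",
}
query = "Crash when parsing YAML config file with trailing slash in path"
shortlist = retrieve(query, CORPUS, top_n=3)
ranked = sliding_window_rerank(query, shortlist, CORPUS, overlap_ranker, window=2, step=1)
print(ranked[0])  # config/loader.py::load_yaml
```

The split mirrors the usual cost trade-off: the cheap first stage scores the whole repository, while the expensive listwise stage only sees a short list, a few candidates at a time.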

Problem

Multilingual code ranking for issue localization

Iterative multi-turn reasoning over code repositories

Improving localization accuracy beyond single-pass search

Innovation

Multilingual code embedding retriever and LLM reranker

Agentic search loop with memory for multi-turn reasoning

Iterative ranking framework combining retrieval and agentic reasoning
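
The agentic search loop with memory can be sketched as follows. This is a hypothetical illustration, not SweRankAgent's implementation: the `search` tool, the `reformulate` step, the stopping rule (stop when a turn surfaces no new candidates), and the max-score memory are all assumptions. The toy search tool ranks functions by shared tokens, and the toy reformulation appends the code of remembered candidates to the query, mimicking an agent that reads what it found and follows references.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float]  # (function ID, relevance score)
SearchTool = Callable[[str], List[Hit]]
Reformulate = Callable[[str, List[str]], str]


@dataclass
class Memory:
    # Best score seen so far for every candidate function, across turns.
    scores: Dict[str, float] = field(default_factory=dict)

    def add(self, hits: List[Hit]) -> int:
        """Merge hits into memory; return how many candidates are new."""
        new = 0
        for fid, score in hits:
            if fid not in self.scores:
                new += 1
            self.scores[fid] = max(score, self.scores.get(fid, score))
        return new

    def top(self, k: int) -> List[str]:
        return sorted(self.scores, key=self.scores.__getitem__, reverse=True)[:k]


def agent_search(issue: str, search: SearchTool, reformulate: Reformulate,
                 max_turns: int = 3, k: int = 5) -> List[str]:
    memory = Memory()
    query = issue
    for _ in range(max_turns):
        if memory.add(search(query)) == 0:
            break  # this turn surfaced nothing new, so stop early
        # The next query is conditioned on what has been accumulated so far.
        query = reformulate(issue, memory.top(k))
    return memory.top(k)


# Toy repository: function ID -> code text (all hypothetical).
INDEX = {
    "api/routes.py::upload": "upload handler calls save_file",
    "storage/disk.py::save_file": "save_file writes bytes calls sanitize_name",
    "storage/names.py::sanitize_name": "sanitize_name strips unsafe characters",
}


def toy_search(query: str) -> List[Hit]:
    q = set(query.lower().split())
    hits = [(fid, float(len(q & set(text.lower().split())))) for fid, text in INDEX.items()]
    return [h for h in hits if h[1] > 0]


def toy_reformulate(issue: str, remembered: List[str]) -> str:
    # Mimic an agent that reads the code it found and follows its references.
    return " ".join([issue] + [INDEX[fid] for fid in remembered])


ISSUE = "upload handler crashes"
print(agent_search(ISSUE, toy_search, toy_reformulate, max_turns=1))
# ['api/routes.py::upload']
print(agent_search(ISSUE, toy_search, toy_reformulate, max_turns=3))
# ['storage/disk.py::save_file', 'api/routes.py::upload', 'storage/names.py::sanitize_name']
```

With `max_turns=1` the loop reduces to single-pass search and finds only the route handler; with three turns it follows the `save_file` and `sanitize_name` references and accumulates all three functions. A real agent would more plausibly rescore the memory buffer with the reranker than keep raw per-turn maxima.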

🔎 Similar Papers

BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning