Multi-CoLoR: Context-Aware Localization and Reasoning across Multi-Language Codebases

📅 2026-02-22
🤖 AI Summary
This work addresses a limitation of large language models: they struggle to locate relevant code in multi-language repositories because they lack organizational context and an understanding of cross-language structural relationships. To this end, we propose Multi-CoLoR, a framework that combines organizational knowledge, such as historical issue-fix patterns, with graph-based reasoning for context-aware, cross-language code localization. A Similar Issue Context (SIC) module retrieves semantically and organizationally related historical issues, and an extended version of LocAgent serves as a code graph traversal agent for structural reasoning. Evaluated on a real-world enterprise dataset, Multi-CoLoR improves Acc@5 over lexical and graph-based baselines while reducing the number of tool calls.
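The Acc@5 metric mentioned above is not defined on this page, so the following is only a minimal sketch under stated assumptions. It assumes that each issue has a ranked list of predicted files and a set of gold files, and that an issue counts as a hit when the top-k predictions contain at least one gold file. Some localization work instead requires every gold file to appear in the top k, which the hypothetical `require_all` flag covers. The function name and file names are illustrative, not taken from the paper.

```python
from typing import Sequence, Set


def acc_at_k(
    ranked: Sequence[Sequence[str]],
    gold: Sequence[Set[str]],
    k: int = 5,
    require_all: bool = False,
) -> float:
    """Fraction of issues counted as hits within the top-k predictions.

    Assumed definition (not confirmed by the source): a hit needs at least
    one gold file in the top k, or all gold files if require_all is True.
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not ranked:
        return 0.0
    hits = 0
    for preds, truth in zip(ranked, gold):
        top = set(preds[:k])
        if require_all:
            hit = bool(truth) and truth <= top
        else:
            hit = bool(truth & top)
        hits += int(hit)
    return hits / len(ranked)


# Hypothetical predictions for two issues.
predictions = [["a.cpp", "b.qml"], ["x.cpp"]]
ground_truth = [{"b.qml"}, {"y.cpp"}]
score = acc_at_k(predictions, ground_truth, k=5)  # 0.5: one of two issues is a hit
```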

📝 Abstract
Large language models demonstrate strong capabilities in code generation but struggle to navigate complex, multi-language repositories to locate relevant code. Effective code localization requires understanding both organizational context (e.g., historical issue-fix patterns) and structural relationships within heterogeneous codebases. Existing methods either (i) focus narrowly on single-language benchmarks, (ii) retrieve code across languages via shallow textual similarity, or (iii) assume no prior context. We present Multi-CoLoR, a framework for Context-aware Localization and Reasoning across Multi-Language codebases, which integrates organizational knowledge retrieval with graph-based reasoning to traverse complex software ecosystems. Multi-CoLoR operates in two stages: (i) a similar issue context (SIC) module retrieves semantically and organizationally related historical issues to prune the search space, and (ii) a code graph traversal agent (an extended version of LocAgent, a state-of-the-art localization framework) performs structural reasoning within C++ and QML codebases. Evaluations on a real-world enterprise dataset show that incorporating SIC reduces the search space and improves localization accuracy, and graph-based reasoning generalizes effectively beyond Python-only repositories. Combined, Multi-CoLoR improves Acc@5 over both lexical and graph-based baselines while reducing tool calls on an AMD codebase.
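The two-stage flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the actual SIC module and the LocAgent-based agent are not specified here, so token-overlap (Jaccard) similarity stands in for issue retrieval and a bounded breadth-first search stands in for agent-driven graph traversal. All function names, fields (`text`, `fixed_files`), file paths, and graph edges are hypothetical.

```python
from collections import deque


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for semantic retrieval."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def similar_issue_context(query, history, top_n=1):
    """Stage 1 (sketch): files fixed for the top_n most similar past issues."""
    ranked = sorted(history, key=lambda h: jaccard(query, h["text"]), reverse=True)
    seeds = []
    for issue in ranked[:top_n]:
        for path in issue["fixed_files"]:
            if path not in seeds:
                seeds.append(path)
    return seeds


def traverse(graph, seeds, max_hops=1):
    """Stage 2 (sketch): breadth-first expansion from the seed files."""
    order, seen = [], set()
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, depth = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        if depth < max_hops:
            for neighbor in graph.get(node, []):
                queue.append((neighbor, depth + 1))
    return order


# Hypothetical history and cross-language code graph (QML -> C++ edges).
history = [
    {"text": "dialog button crashes on click", "fixed_files": ["ui/Dialog.qml"]},
    {"text": "memory leak in renderer", "fixed_files": ["render/renderer.cpp"]},
]
graph = {
    "ui/Dialog.qml": ["ui/dialog_controller.cpp"],
    "ui/dialog_controller.cpp": ["core/events.cpp"],
}

seeds = similar_issue_context("crash when clicking dialog button", history)
candidates = traverse(graph, seeds, max_hops=1)
# candidates == ["ui/Dialog.qml", "ui/dialog_controller.cpp"]
```

Seeding the traversal from files fixed in similar past issues is what narrows the search space in this sketch: the traversal never visits the renderer code because no retrieved issue points to it.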

Problem

Research questions and friction points this paper is trying to address.

code localization
multi-language codebases
context-aware reasoning
software repository navigation
cross-language code retrieval

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-language code localization
context-aware reasoning
code graph traversal
organizational knowledge retrieval
heterogeneous codebases

Authors
Indira Vats
University of Toronto
Sanjukta De
Advanced Micro Devices, Inc. (AMD)
Subhayan Roy
Advanced Micro Devices, Inc. (AMD)
Saurabh Bodhe
Advanced Micro Devices, Inc. (AMD)
Lejin Varghese
Advanced Micro Devices, Inc. (AMD)
Max Kiehn
Advanced Micro Devices, Inc. (AMD)
Yonas Bedasso
Advanced Micro Devices, Inc. (AMD)
Marsha Chechik
Professor of Computer Science, University of Toronto
software engineering, software verification, modeling, formal methods, assurance and dependability