🤖 AI Summary
This work addresses the difficulty large language models have in locating relevant code across multi-language repositories, which stems from limited understanding of organizational context and cross-language structural relationships. To this end, we propose Multi-CoLoR, a framework that integrates organizational knowledge (such as historical issue-fix patterns) with graph-based reasoning to enable context-aware, cross-language code localization. Our approach introduces a Similar Issue Context (SIC) module that retrieves semantically and organizationally related historical issues to prune the search space, and extends LocAgent into a graph-traversal agent that performs structural reasoning over C++ and QML codebases. Evaluated on a real-world enterprise dataset, Multi-CoLoR improves Acc@5 over both lexical and graph-based baselines while reducing the number of tool calls.
📝 Abstract
Large language models demonstrate strong capabilities in code generation but struggle to navigate complex, multi-language repositories to locate relevant code. Effective code localization requires understanding both organizational context (e.g., historical issue-fix patterns) and structural relationships within heterogeneous codebases. Existing methods either (i) focus narrowly on single-language benchmarks, (ii) retrieve code across languages via shallow textual similarity, or (iii) assume no prior context. We present Multi-CoLoR, a framework for Context-aware Localization and Reasoning across Multi-Language codebases, which integrates organizational knowledge retrieval with graph-based reasoning to traverse complex software ecosystems. Multi-CoLoR operates in two stages: (i) a similar issue context (SIC) module retrieves semantically and organizationally related historical issues to prune the search space, and (ii) a code graph traversal agent (an extended version of LocAgent, a state-of-the-art localization framework) performs structural reasoning within C++ and QML codebases. Evaluations on a real-world enterprise dataset show that incorporating SIC reduces the search space and improves localization accuracy, and that graph-based reasoning generalizes effectively beyond Python-only repositories. On an AMD codebase, the combined Multi-CoLoR pipeline improves Acc@5 over both lexical and graph-based baselines while reducing tool calls.
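The abstract reports results in terms of Acc@5 without defining it. As a minimal sketch, assuming the strict variant in which an issue counts as localized only when every ground-truth location appears among the top k ranked predictions, the metric could be computed as below. The function names and the file paths in the usage example are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Sequence, Tuple


def acc_at_k(ranked: Sequence[str], relevant: Iterable[str], k: int = 5) -> bool:
    """Return True if all ground-truth locations are in the top-k predictions.

    Assumption: a hit requires *every* ground-truth location, not just one.
    """
    gold = set(relevant)
    if not gold:
        raise ValueError("each issue needs at least one ground-truth location")
    return gold.issubset(ranked[:k])


def mean_acc_at_k(
    runs: Sequence[Tuple[Sequence[str], Iterable[str]]], k: int = 5
) -> float:
    """Average the per-issue hit indicator over a set of issues."""
    if not runs:
        raise ValueError("no issues to evaluate")
    hits = sum(acc_at_k(ranked, relevant, k) for ranked, relevant in runs)
    return hits / len(runs)


if __name__ == "__main__":
    # Hypothetical predictions for two issues in a mixed C++/QML repository.
    runs = [
        (["ui/Main.qml", "src/frame.cpp", "src/io.cpp"], ["src/frame.cpp"]),
        (["src/io.cpp", "ui/Panel.qml"], ["ui/Main.qml", "src/io.cpp"]),
    ]
    print(mean_acc_at_k(runs, k=5))
```

In this usage example the first issue is a hit and the second is a miss, because one of its two ground-truth files is never ranked. A looser variant that accepts any single ground-truth match would replace `issubset` with a non-empty intersection check.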