CoIR: A Comprehensive Benchmark for Code Information Retrieval Models

📅 2024-07-03
🏛️ arXiv.org
📈 Citations: 10
Influential: 4
🤖 AI Summary
Existing code information retrieval (IR) research suffers from a lack of comprehensive benchmarks, task monotony, and insufficient domain coverage, hindering holistic evaluation of model capabilities. To address this, we introduce CoIR—the first comprehensive, multi-task, cross-domain benchmark for code IR—encompassing 10 datasets, 8 distinct retrieval tasks, and 7 programming languages/domains. CoIR is the first to systematically formalize diversity-oriented evaluation dimensions for code retrieval and ensures full compatibility with MTEB and BEIR standards. We release an open-source, installable Python framework enabling fair, reproducible evaluation. Standardized assessment across nine state-of-the-art IR models reveals substantial performance degradation on code-specific tasks compared to natural-language IR. CoIR has been widely adopted by the community, catalyzing the development of multiple novel code IR models and bridging a critical gap in extending general-purpose IR methodologies to code.

📝 Abstract
Despite the substantial success of Information Retrieval (IR) in various NLP tasks, most IR systems predominantly handle queries and corpora in natural language, neglecting the domain of code retrieval. Code retrieval is critically important yet remains under-explored, with existing methods and benchmarks inadequately representing the diversity of code in various domains and tasks. Addressing this gap, we present COIR (Code Information Retrieval Benchmark), a robust and comprehensive benchmark specifically designed to assess code retrieval capabilities. COIR comprises ten meticulously curated code datasets, spanning eight distinctive retrieval tasks across seven diverse domains. We first discuss the construction of COIR and its diverse dataset composition. Further, we evaluate nine widely used retrieval models using COIR, uncovering significant difficulties in performing code retrieval tasks even with state-of-the-art systems. To facilitate easy adoption and integration within existing research workflows, COIR has been developed as a user-friendly Python framework, readily installable via pip. It shares the same data schema as other popular benchmarks like MTEB and BEIR, enabling seamless cross-benchmark evaluations. Through COIR, we aim to invigorate research in the code retrieval domain, providing a versatile benchmarking tool that encourages further development and exploration of code retrieval systems: https://github.com/CoIR-team/coir.
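To make concrete what these retrieval tasks involve, the sketch below scores a natural-language query against candidate code snippets with an off-the-shelf dense encoder, the same query-to-code setup that CoIR standardizes across its datasets. The model name and example data are illustrative assumptions only; CoIR's own pip-installable framework (see the repository linked above) handles dataset loading, batching, and metric computation.

# Minimal, framework-agnostic sketch of dense query-to-code retrieval, the core
# operation CoIR benchmarks. Model and data are illustrative assumptions; the
# CoIR framework itself provides the datasets and the standardized metrics.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

queries = ["reverse a singly linked list in place"]
corpus = [
    "def reverse(head):\n    prev = None\n    while head:\n        head.next, prev, head = prev, head, head.next\n    return prev",
    "def binary_search(a, x):\n    lo, hi = 0, len(a)\n    while lo < hi:\n        mid = (lo + hi) // 2\n        ...",
]

# Embed the query and the code candidates, then rank candidates by cosine similarity.
q_emb = model.encode(queries, normalize_embeddings=True)
c_emb = model.encode(corpus, normalize_embeddings=True)
scores = util.cos_sim(q_emb, c_emb)  # shape: (num_queries, num_docs)
print(scores)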
Problem

Research questions and friction points this paper is trying to address.

Existing code IR research lacks comprehensive benchmarks and covers only a narrow range of tasks and domains
State-of-the-art retrieval models are built for natural-language queries and corpora, leaving code retrieval under-explored
No standardized, easy-to-use framework exists for fair, reproducible evaluation of code retrieval models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed COIR, the first comprehensive multi-task benchmark for code retrieval
Ten curated datasets spanning eight retrieval tasks across seven domains
Pip-installable Python framework sharing the MTEB/BEIR data schema for seamless integration
Authors
Xiangyang Li (Huawei Noah's Ark Lab)
Kuicai Dong (Huawei Noah's Ark Lab, Nanyang Technological University)
Yi Quan Lee (Huawei Noah's Ark Lab)
Wei Xia (Huawei Noah's Ark Lab)
Yichun Yin (Huawei Noah's Ark Lab)
Hao Zhang (Huawei Noah's Ark Lab)
Yong Liu (Huawei Noah's Ark Lab)
Yasheng Wang (Tencent)
Ruiming Tang (Huawei Noah's Ark Lab)