CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Repetitive and semantically similar problems in competitive programming distort evaluation fidelity and inflate estimates of model capability. Method: the paper introduces CPRet, the first retrieval-oriented benchmark for competitive programming, with four fine-grained tasks: text-to-code, code-to-code, problem-to-duplicate, and simplified-problem-to-original. It proposes a problem-level similarity-retrieval paradigm, constructs a temporally separated test set to keep evaluation authentic, designs a Group-InfoNCE loss for fine-grained problem–code alignment, and formalizes two new core tasks: duplicate-problem identification and simplified-problem provenance tracing. Using fine-tuned pre-trained language models and hybrid-annotated data (human-curated plus web-crawled), the authors release two open-source models, CPRetriever-Code and CPRetriever-Prob, alongside a high-quality dataset. Results: empirical analysis shows that high-similarity problems artificially inflate model pass rates by up to 12.7%, demonstrating CPRet's role in enabling fair, rigorous, and realistic model evaluation.
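The summary names a Group-InfoNCE loss for problem–code alignment but gives no formula. A minimal sketch of one plausible reading, in which each problem's group of accepted solutions is treated jointly as positives against all other codes in the batch (the paper's exact formulation may differ; the function name, tensor shapes, and temperature below are assumptions):

```python
import torch
import torch.nn.functional as F

def group_infonce(problem_emb, code_emb, group_ids, temperature=0.05):
    """Hypothetical Group-InfoNCE sketch.

    problem_emb: (P, d), one embedding per problem
    code_emb:    (C, d), embeddings of candidate solution codes
    group_ids:   (C,), index of the problem each solution solves
    """
    p = F.normalize(problem_emb, dim=-1)
    c = F.normalize(code_emb, dim=-1)
    logits = p @ c.T / temperature  # (P, C) scaled cosine similarities
    # positives mask: solution j is a positive for problem i iff group_ids[j] == i
    pos = group_ids.unsqueeze(0) == torch.arange(problem_emb.size(0)).unsqueeze(1)
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # average the log-likelihood over each problem's whole group of positives
    per_problem = -(log_prob * pos).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return per_problem.mean()
```

Compared with plain InfoNCE (one positive per anchor), averaging over a group keeps problems with many reference solutions from dominating the gradient.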

📝 Abstract
Competitive programming benchmarks are widely used in scenarios such as programming contests and large language model assessments. However, the growing presence of duplicate or highly similar problems raises concerns not only about competition fairness, but also about the validity of competitive programming as a benchmark for model evaluation. In this paper, we propose a new problem -- similar question retrieval -- to address this issue. Due to the lack of both data and models, solving this problem is challenging. To this end, we introduce CPRet, a retrieval-oriented benchmark suite for competitive programming, covering four retrieval tasks: two code-centric (i.e., Text-to-Code and Code-to-Code) and two newly proposed problem-centric tasks (i.e., Problem-to-Duplicate and Simplified-to-Full), built from a combination of automatically crawled problem-solution data and manually curated annotations. Our contribution includes both high-quality training data and temporally separated test sets for reliable evaluation. In addition, we develop two task-specialized retrievers based on this dataset: CPRetriever-Code, trained with a novel Group-InfoNCE loss for problem-code alignment, and CPRetriever-Prob, fine-tuned for identifying problem-level similarity. Both models achieve strong results and are open-sourced for local use. Finally, we analyze LiveCodeBench and find that high-similarity problems inflate model pass rates and reduce differentiation, underscoring the need for similarity-aware evaluation in future benchmarks. Code and data are available at: https://github.com/coldchair/CPRet
Problem

Research questions and friction points this paper is trying to address.

Addressing duplicate or similar problems in competitive programming benchmarks
Introducing CPRet for retrieval tasks in competitive programming
Developing models to identify problem-code and problem-level similarity
Innovation

Methods, ideas, or system contributions that make the work stand out.

CPRet benchmark suite for competitive programming retrieval
Group-InfoNCE loss for problem-code alignment
Task-specialized retrievers for code and problem similarity
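All four retrieval tasks reduce to nearest-neighbor search in an embedding space. A minimal sketch with cosine similarity over stand-in vectors (in practice an encoder such as the released CPRetriever-Prob would supply the embeddings; `top_k_duplicates` and the toy data here are illustrative, not the paper's code):

```python
import numpy as np

def top_k_duplicates(query_vec, corpus_vecs, k=3):
    """Rank corpus problems by cosine similarity to a query problem."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity of each corpus item
    order = np.argsort(-sims)[:k]     # indices of the k most similar problems
    return [(int(i), float(sims[i])) for i in order]
```

A benchmark curator could flag any pair above a similarity threshold for manual duplicate review, which is how the paper's LiveCodeBench contamination analysis could be reproduced at small scale.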
Authors
Han Deng (Houston Methodist Research Institute; Machine Learning)
Yuan Meng (Tsinghua University)
Shixiang Tang (Shanghai Artificial Intelligence Laboratory)
Wanli Ouyang (Shanghai Artificial Intelligence Laboratory; The Chinese University of Hong Kong)
Xinzhu Ma (Associate Professor, Beihang University; deep learning, computer vision, 3D scene understanding, AI4Science)