AlgoSimBench: Identifying Algorithmically Similar Problems for Competitive Programming

๐Ÿ“… 2025-07-21
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿค– AI Summary
Large language models (LLMs) struggle to identify algorithmically similar problems (ASPs), i.e., problems that can be solved with similar algorithmic approaches, because they over-rely on superficial textual cues rather than the underlying algorithmic logic. Method: AlgoSimBench is a benchmark designed for ASP evaluation, emphasizing similarity of algorithmic structure over surface-level text matching. The paper proposes attempted solution matching (ASM), which generates candidate solutions for each problem and compares them, so similarity is judged on algorithmic content rather than narrative; code embedding models and retrieval methods such as BM25 are also evaluated. Contribution/Results: Experiments show that the best state-of-the-art LLM reaches only 65.9% accuracy on the multiple-choice ASP task. ASM yields up to an 11.7-percentage-point absolute improvement, and combining ASM with BM25 attains up to 52.2% accuracy on similar-problem retrieval. The work also quantifies how narrative problem descriptions interfere with ASP judgment, an effect largely removed by summarizing problems to strip narrative elements. AlgoSimBench provides a reproducible, algorithm-centric framework for assessing algorithmic reasoning in LLMs.

๐Ÿ“ Abstract
Recent progress in LLMs, such as reasoning models, has demonstrated strong abilities to solve complex competitive programming problems, often rivaling top human competitors. However, it remains underexplored whether these abilities generalize to relevant domains that are less seen during training. To address this, we introduce AlgoSimBench, a new benchmark designed to assess LLMs' ability to identify algorithmically similar problems (ASPs): problems that can be solved using similar algorithmic approaches. AlgoSimBench consists of 1317 problems, annotated with 231 distinct fine-grained algorithm tags, from which we curate 402 multiple-choice questions (MCQs), where each question presents one algorithmically similar problem alongside three textually similar but algorithmically dissimilar distractors. Our evaluation reveals that LLMs struggle to identify ASPs, with the best-performing model (o3-mini) achieving only 65.9% accuracy on the MCQ task. To address this challenge, we propose attempted solution matching (ASM), a novel method for improving problem similarity detection. On our MCQ task, ASM yields an absolute accuracy improvement of 6.7% to 11.7% across different models. We also evaluated code embedding models and retrieval methods on similar problem identification. While the adversarial selection of problems degrades the performance to be less than random, we found that simply summarizing the problem to remove narrative elements eliminates the effect, and combining ASM with a keyword-prioritized method, BM25, can yield up to 52.2% accuracy. Code and data are available at github.com
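The core idea of attempted solution matching, as described above, can be sketched in a few lines: rather than comparing problem statements directly, generate a candidate solution for each problem and compare the solutions. The sketch below is illustrative only; the LLM prompting step is stubbed out, and the token-overlap similarity is a crude stand-in for whatever comparison the paper actually uses.

```python
# Illustrative sketch of attempted solution matching (ASM).
# Assumption: the paper's actual generation and comparison steps may differ.

def attempt_solution(problem_text: str) -> str:
    """Placeholder: in practice, prompt an LLM for a candidate solution sketch."""
    raise NotImplementedError("LLM call goes here")

def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over tokens: a crude proxy for algorithmic similarity."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def pick_similar(query_solution: str, choice_solutions: list[str]) -> int:
    """Return the index of the choice whose attempted solution best matches the query's."""
    scores = [token_overlap(query_solution, s) for s in choice_solutions]
    return max(range(len(scores)), key=scores.__getitem__)
```

The point of the design is that two problems with very different stories (e.g., knights vs. delivery trucks) produce similar solution code if they share an algorithm, so comparing attempted solutions sidesteps the narrative distractors the benchmark is built around.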
Problem

Research questions and friction points this paper is trying to address.

Assess LLMs' ability to identify algorithmically similar problems
Improve problem similarity detection with attempted solution matching
Evaluate code embedding models on similar problem identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces AlgoSimBench for algorithmic similarity assessment
Proposes attempted solution matching (ASM) for detection
Combines ASM with BM25 for improved accuracy
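The keyword-prioritized retrieval component paired with ASM above is standard Okapi BM25. A minimal, self-contained scoring function (stdlib only, with the conventional k1 and b defaults; not the paper's exact implementation) looks like this:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)  # average doc length
    n = len(docs)
    # Document frequency: number of docs containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            f = tf[term]
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(s)
    return scores
```

Because BM25 rewards exact keyword matches, it pairs naturally with problem summaries (which strip narrative filler) and with ASM's generated solutions, where algorithm-specific terms like "knapsack" or "dijkstra" carry most of the signal.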