AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?

📅 2025-07-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether large language models (LLMs) can move beyond imitation-based programming to algorithm-level innovation aimed at computational efficiency. Method: the authors introduce AlgoTune, the first general-purpose numerical program acceleration benchmark, comprising 155 programming tasks spanning computer science, physics, and mathematics, and propose AlgoTuner, an LLM-based autonomous agent that combines code generation, solution validation, and precise wall-clock timing. Synthesized solvers are benchmarked against reference implementations built on SciPy, scikit-learn, and CVXPY. Contribution/Results: AlgoTuner achieves a 1.72× average speedup over the reference baselines, demonstrating the feasibility of LLM-assisted algorithmic optimization. However, analysis reveals that current models predominantly perform local, implementation-level optimizations, such as loop unrolling or vectorization, rather than structural, paradigm-level algorithmic invention.

📝 Abstract
Despite progress in language model (LM) capabilities, evaluations have thus far focused on models' performance on tasks that humans have previously solved, including in programming (Jimenez et al., 2024) and mathematics (Glazer et al., 2024). We therefore propose testing models' ability to design and implement algorithms in an open-ended benchmark: We task LMs with writing code that efficiently solves computationally challenging problems in computer science, physics, and mathematics. Our AlgoTune benchmark consists of 155 coding tasks collected from domain experts and a framework for validating and timing LM-synthesized solution code, which is compared to reference implementations from popular open-source packages. In addition, we develop a baseline LM agent, AlgoTuner, and evaluate its performance across a suite of frontier models. AlgoTuner achieves an average 1.72x speedup against our reference solvers, which use libraries such as SciPy, scikit-learn, and CVXPY. However, we find that current models fail to discover algorithmic innovations, instead preferring surface-level optimizations. We hope that AlgoTune catalyzes the development of LM agents exhibiting creative problem solving beyond state-of-the-art human performance.
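The validate-and-time setup the abstract describes can be pictured with a small harness: a candidate solver is accepted only if its output matches the reference numerically, and its score is the ratio of wall-clock times. The sketch below is illustrative only and is not AlgoTune's actual harness; the SPD linear-system task, the Cholesky-based "optimization", and the tolerance are all assumptions made for the example.

```python
import time

import numpy as np
from scipy.linalg import solve, cho_factor, cho_solve

def reference_solver(A, b):
    # Baseline: general-purpose dense solve from SciPy.
    return solve(A, b)

def candidate_solver(A, b):
    # Hypothetical LM-proposed variant: exploit the (assumed) symmetric
    # positive-definite structure of A via a Cholesky factorization.
    return cho_solve(cho_factor(A), b)

def validate(candidate, reference, rtol=1e-6):
    # A solution only counts if it matches the reference numerically.
    return np.allclose(candidate, reference, rtol=rtol)

def time_solver(solver, A, b, repeats=10):
    # Best wall-clock time over several runs, to reduce timing noise.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        solver(A, b)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((500, 500))
    A = M @ M.T + 500 * np.eye(500)   # symmetric positive definite
    b = rng.standard_normal(500)

    ref_out = reference_solver(A, b)
    cand_out = candidate_solver(A, b)
    assert validate(cand_out, ref_out), "candidate output rejected"

    t_ref = time_solver(reference_solver, A, b)
    t_cand = time_solver(candidate_solver, A, b)
    print(f"speedup: {t_ref / t_cand:.2f}x")
```

Exploiting problem structure that the general-purpose reference ignores is exactly the kind of implementation-level optimization the paper reports models favoring over paradigm-level algorithmic invention.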
Problem

Research questions and friction points this paper is trying to address.

Evaluate LMs' ability to design efficient algorithms
Test LMs on open-ended computational problems
Assess whether LMs can outperform human-written reference solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

LM-based code synthesis for numerical programs
Benchmark with 155 expert-collected coding tasks
AlgoTuner agent achieves an average 1.72x speedup over reference solvers
👥 Authors

Ori Press
University of Tübingen & International Max Planck Research School for Intelligent Systems (IMPRS-IS)
Machine Learning

Brandon Amos
Meta
machine learning, optimization, deep learning

Haoyu Zhao
Princeton University

Yikai Wu
PhD student in Computer Science, Princeton University
Optimization, Differential Privacy

Samuel K. Ainsworth

Dominik Krupke
TU Braunschweig

Patrick Kidger
"ML Wizard" @ Cradle.bio
BioML, neural differential equations, open source, numerical methods

Touqir Sajed
LG Electronics Canada

Bartolomeo Stellato
Princeton University
Optimization, Control, Machine Learning

Jisun Park
Princeton University, Seoul National University

Nathanael Bosch
Tübingen AI Center, University of Tübingen

Eli Meril
Tel Aviv University

Albert Steppi
Quansight PBC

Arman Zharmagambetov
Research Scientist, FAIR, Meta
machine learning, optimization, AI safety

Fangzhao Zhang
Stanford University

David Perez-Pineiro
Norwegian University of Science and Technology

Alberto Mercurio
EPFL

Ni Zhan
Postdoc in CS, Princeton University

Talor Abramovich
Tel Aviv University

Kilian Lieret
Princeton University

Hanlin Zhang
Harvard University

Shirley Huang
Harvard University

Matthias Bethge
Tübingen University & Maddox Co-Founder
Computational Neuroscience, Machine Learning, Vision

Ofir Press
Princeton University
Deep Learning, Natural Language Processing