AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?

📅 2025-07-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether large language models (LLMs) can move beyond imitation-based programming to algorithm-level innovation aimed at computational efficiency. Method: the authors introduce AlgoTune, the first general-purpose numerical program acceleration benchmark, comprising 155 programming tasks spanning computer science, physics, and mathematics, and propose AlgoTuner, an LLM-based autonomous agent that combines code generation, solution validation, and precise wall-clock timing. Synthesized solvers are benchmarked against reference implementations built on SciPy, scikit-learn, and CVXPY. Contribution/Results: AlgoTuner achieves a 1.72× average speedup over the reference baselines, demonstrating the feasibility of LLM-assisted algorithmic optimization. However, analysis reveals that current models predominantly perform local, implementation-level optimizations, such as loop unrolling or vectorization, rather than structural, paradigm-level algorithmic invention.

📝 Abstract
Despite progress in language model (LM) capabilities, evaluations have thus far focused on models' performance on tasks that humans have previously solved, including in programming (Jimenez et al., 2024) and mathematics (Glazer et al., 2024). We therefore propose testing models' ability to design and implement algorithms in an open-ended benchmark: We task LMs with writing code that efficiently solves computationally challenging problems in computer science, physics, and mathematics. Our AlgoTune benchmark consists of 155 coding tasks collected from domain experts and a framework for validating and timing LM-synthesized solution code, which is compared to reference implementations from popular open-source packages. In addition, we develop a baseline LM agent, AlgoTuner, and evaluate its performance across a suite of frontier models. AlgoTuner achieves an average 1.72x speedup against our reference solvers, which use libraries such as SciPy, scikit-learn, and CVXPY. However, we find that current models fail to discover algorithmic innovations, instead preferring surface-level optimizations. We hope that AlgoTune catalyzes the development of LM agents exhibiting creative problem solving beyond state-of-the-art human performance.
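The validate-and-time setup the abstract describes can be pictured with a small harness: a candidate solver is accepted only if its output matches the reference numerically, and its score is the ratio of wall-clock times. The sketch below is illustrative only and is not AlgoTune's actual harness; the SPD linear-system task, the Cholesky-based "optimization", and the tolerance are all assumptions made for the example.

```python
import time

import numpy as np
from scipy.linalg import solve, cho_factor, cho_solve

def reference_solver(A, b):
    # Baseline: general-purpose dense solve from SciPy.
    return solve(A, b)

def candidate_solver(A, b):
    # Hypothetical LM-proposed variant: exploit the (assumed) symmetric
    # positive-definite structure of A via a Cholesky factorization.
    return cho_solve(cho_factor(A), b)

def validate(candidate, reference, rtol=1e-6):
    # A solution only counts if it matches the reference numerically.
    return np.allclose(candidate, reference, rtol=rtol)

def time_solver(solver, A, b, repeats=10):
    # Best wall-clock time over several runs, to reduce timing noise.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        solver(A, b)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((500, 500))
    A = M @ M.T + 500 * np.eye(500)   # symmetric positive definite
    b = rng.standard_normal(500)

    ref_out = reference_solver(A, b)
    cand_out = candidate_solver(A, b)
    assert validate(cand_out, ref_out), "candidate output rejected"

    t_ref = time_solver(reference_solver, A, b)
    t_cand = time_solver(candidate_solver, A, b)
    print(f"speedup: {t_ref / t_cand:.2f}x")
```

Exploiting problem structure that the general-purpose reference ignores is exactly the kind of implementation-level optimization the paper reports models favoring over paradigm-level algorithmic invention.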
Problem

Research questions and friction points this paper is trying to address.

Evaluate LMs' ability to design efficient algorithms
Test LMs on open-ended computational problems
Assess whether LMs can outperform human-written reference solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

LM-based code synthesis for numerical programs
Benchmark with 155 expert-collected coding tasks
AlgoTuner agent achieves an average 1.72x speedup over reference solvers
👥 Authors

Ori Press
University of Tübingen & International Max Planck Research School for Intelligent Systems (IMPRS-IS)
Machine Learning

Brandon Amos
Meta
machine learning, optimization, deep learning

Haoyu Zhao
Princeton University

Yikai Wu
PhD student in Computer Science, Princeton University
Optimization, Differential Privacy

Samuel K. Ainsworth

Dominik Krupke
TU Braunschweig

Patrick Kidger
"ML Wizard" @ Cradle.bio
BioML, neural differential equations, open source, numerical methods

Touqir Sajed
LG Electronics Canada

Bartolomeo Stellato
Princeton University
Optimization, Control, Machine Learning

Jisun Park
Princeton University, Seoul National University

Nathanael Bosch
Tübingen AI Center, University of Tübingen

Eli Meril
Tel Aviv University

Albert Steppi
Quansight PBC

Arman Zharmagambetov
Research Scientist, FAIR, Meta
machine learning, optimization, AI safety

Fangzhao Zhang
Stanford University

David Perez-Pineiro
Norwegian University of Science and Technology

Alberto Mercurio
EPFL

Ni Zhan
Postdoc in CS, Princeton University

Talor Abramovich
Tel Aviv University

Kilian Lieret
Princeton University

Hanlin Zhang
Harvard University

Shirley Huang
Harvard University

Matthias Bethge
Tübingen University & Maddox Co-Founder
Computational Neuroscience, Machine Learning, Vision

Ofir Press
Princeton University
Deep Learning, Natural Language Processing