Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch

📅 2026-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether large language models (LLMs) can outperform classical hyperparameter optimization (HPO) algorithms, and how the LLMs' weaknesses in state tracking and reliability can be mitigated. Using the autoresearch platform as a testbed, the authors first compare classical HPO methods (CMA-ES, TPE) against LLM-based agents under fixed compute budgets: within a fixed search space, classical methods consistently win, while an LLM agent that performs unconstrained search by directly editing the training code substantially narrows the gap. The authors then propose Centaur, a hybrid approach that shares CMA-ES's internal state—mean vector, step size, and covariance matrix—with an LLM for informed decision-making. Centaur achieves the best results in their experiments; notably, its 0.8B-parameter variant outperforms its 27B-parameter variant, demonstrating that a lightweight model suffices when paired with a powerful classical optimizer.

📝 Abstract
The autoresearch repository enables an LLM agent to search for optimal hyperparameter configurations on an unconstrained search space by editing the training code directly. Given a fixed compute budget and constraints, we use autoresearch as a testbed to compare classical hyperparameter optimization (HPO) algorithms against LLM-based methods on tuning the hyperparameters of a small language model. Within a fixed hyperparameter search space, classical HPO methods such as CMA-ES and TPE consistently outperform LLM-based agents. However, an LLM agent that directly edits training source code in an unconstrained search space narrows the gap to classical methods substantially despite using only a self-hosted open-weight 27B model. Methods that avoid out-of-memory failures outperform those with higher search diversity, suggesting that reliability matters more than exploration breadth. While small and mid-sized LLMs struggle to track optimization state across trials, classical methods lack domain knowledge. To bridge this gap, we introduce Centaur, a hybrid that shares CMA-ES's internal state, including mean vector, step-size, and covariance matrix, with an LLM. Centaur achieves the best result in our experiments, with its 0.8B variant outperforming the 27B variant, suggesting that a cheap LLM suffices when paired with a strong classical optimizer. The 0.8B model is insufficient for unconstrained code editing but sufficient for hybrid optimization, while scaling to 27B provides no advantage for fixed search space methods with the open-weight models tested. Code is available at https://github.com/ferreirafabio/autoresearch-automl.
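The hybrid design above hinges on exposing the optimizer's internal state (mean vector, step size, covariance matrix) to an LLM. As a rough illustration of what such state sharing might look like—not the authors' implementation—the sketch below serializes a hypothetical CMA-ES state snapshot into a prompt for a lightweight LLM; the `CMAState` class, `build_prompt` helper, and parameter names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CMAState:
    """Hypothetical snapshot of CMA-ES internal state after one generation."""
    mean: List[float]        # current distribution mean (best-guess hyperparameters)
    sigma: float             # global step size
    cov: List[List[float]]   # covariance matrix shaping the search distribution

def build_prompt(state: CMAState, param_names: List[str]) -> str:
    """Serialize optimizer state into a text prompt so an LLM can propose
    the next configuration with awareness of the search's current state."""
    lines = [
        "You are assisting a CMA-ES hyperparameter search.",
        f"Step size sigma = {state.sigma:.4f}",
    ]
    for name, m in zip(param_names, state.mean):
        lines.append(f"Current mean for {name}: {m:.4f}")
    lines.append("Covariance matrix rows:")
    for row in state.cov:
        lines.append("  " + ", ".join(f"{v:.4f}" for v in row))
    lines.append("Propose the next configuration as a comma-separated list.")
    return "\n".join(lines)

# Example: a two-parameter search over learning rate and weight decay.
state = CMAState(mean=[0.0031, 0.05], sigma=0.25,
                 cov=[[1.0, 0.1], [0.1, 1.0]])
prompt = build_prompt(state, ["learning_rate", "weight_decay"])
print(prompt)
```

In practice the LLM's reply would be parsed back into a candidate configuration and blended with CMA-ES's own sampling; the design choice here is simply that the optimizer, not the LLM, carries the search state across trials.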
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Hyperparameter Optimization
Classical HPO Algorithms
Unconstrained Search Space
Hybrid Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid optimization
LLM-based hyperparameter tuning
CMA-ES integration
autoresearch
small language models