🤖 AI Summary
Existing Transformer-based hyperparameter optimization (HPO) methods rely heavily on large-scale historical trajectory datasets and lack effective reinforcement learning mechanisms, resulting in poor cold-start capability and unstable training. To address these limitations, the authors propose GRPOformer, a framework that introduces Group Relative Policy Optimization (GRPO) to HPO, enabling optimization-strategy learning from scratch. A Transformer models historical optimization trajectories and generates new hyperparameter configurations, while GRPO rapidly constructs trajectories and updates the policy; a Policy Churn Regularization (PCR) term further stabilizes GRPO training. Evaluated on OpenML multi-task benchmarks, GRPOformer consistently outperforms baseline methods across diverse tasks, demonstrating strong generalization and data efficiency, particularly in low-data and cold-start regimes.
📝 Abstract
Hyperparameter optimization (HPO) plays a critical role in improving model performance. Transformer-based HPO methods have shown great potential; however, existing approaches rely heavily on large-scale historical optimization trajectories and lack effective reinforcement learning (RL) techniques, limiting their efficiency and performance. Inspired by the success of Group Relative Policy Optimization (GRPO) in large language models (LLMs), we propose GRPOformer -- a novel hyperparameter optimization framework that integrates RL with Transformers. In GRPOformer, a Transformer generates new hyperparameter configurations from historical optimization trajectories, while GRPO enables rapid trajectory construction and optimization-strategy learning from scratch. Moreover, we introduce Policy Churn Regularization (PCR) to enhance the stability of GRPO training. Experimental results on OpenML demonstrate that GRPOformer consistently outperforms baseline methods across diverse tasks, offering new insights into the application of RL to HPO.
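Neither the summary nor the abstract spells out the GRPO update, but its core idea, scoring each sampled candidate against its own sampling group rather than against a learned critic, can be sketched as follows. The function name, the group size, and the use of a population standard deviation are illustrative assumptions, not details from the paper.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Sketch of GRPO's group-relative advantage (an assumption, not the
    paper's exact implementation): each sampled hyperparameter
    configuration's reward is normalized by the mean and standard
    deviation of its sampling group, so no value-function critic is
    needed to form the policy-gradient signal."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: validation scores of four configurations sampled in one group.
# Advantages are zero-mean, so above-average configs get positive credit.
advs = group_relative_advantages([0.71, 0.74, 0.69, 0.78])
```

In an HPO setting like the one described here, each reward would be the validation performance of a configuration proposed by the Transformer policy, and these normalized advantages would weight the policy update.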