GRPOformer: Advancing Hyperparameter Optimization via Group Relative Policy Optimization

📅 2025-09-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing Transformer-based hyperparameter optimization (HPO) methods rely heavily on large-scale historical trajectory datasets and lack efficient reinforcement learning mechanisms, resulting in poor cold-start capability and training instability. To address these limitations, we propose GRPOformer, a novel framework that introduces Group Relative Policy Optimization (GRPO) to HPO for the first time, enabling end-to-end policy learning from scratch. We further design Policy Churn Regularization (PCR) to enhance training stability and employ a Transformer architecture to model historical trajectories and generate high-quality hyperparameter configurations. Evaluated on the OpenML multi-task benchmark, GRPOformer significantly outperforms state-of-the-art methods across diverse tasks, demonstrating superior generalization, robustness, and data efficiency, particularly under low-data and cold-start regimes.

📝 Abstract
Hyperparameter optimization (HPO) plays a critical role in improving model performance. Transformer-based HPO methods have shown great potential; however, existing approaches rely heavily on large-scale historical optimization trajectories and lack effective reinforcement learning (RL) techniques, thereby limiting their efficiency and performance improvements. Inspired by the success of Group Relative Policy Optimization (GRPO) in large language models (LLMs), we propose GRPOformer -- a novel hyperparameter optimization framework that integrates RL with Transformers. In GRPOformer, Transformers are employed to generate new hyperparameter configurations from historical optimization trajectories, while GRPO enables rapid trajectory construction and optimization strategy learning from scratch. Moreover, we introduce Policy Churn Regularization (PCR) to enhance the stability of GRPO training. Experimental results on OpenML demonstrate that GRPOformer consistently outperforms baseline methods across diverse tasks, offering new insights into the application of RL for HPO.
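
For readers who want the core mechanic: the group-relative update that GRPOformer borrows from GRPO fits in a few lines. The sketch below is our illustration, not code from the paper; it assumes a PyTorch policy that emits log-probabilities for each of G hyperparameter configurations sampled from the same trajectory context, and it omits the KL-to-reference penalty that full GRPO implementations typically add.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Normalize each configuration's reward (e.g. validation score)
    # against its own sampling group; this group-relative signal is
    # what lets GRPO dispense with a learned value function.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def grpo_loss(logp_new: torch.Tensor,
              logp_old: torch.Tensor,
              rewards: torch.Tensor,
              clip_eps: float = 0.2) -> torch.Tensor:
    # PPO-style clipped surrogate, but with group-normalized advantages.
    # logp_new / logp_old: (G,) log-probs of the sampled configurations
    # under the current and behavior policies.
    adv = grpo_advantages(rewards)
    ratio = (logp_new - logp_old).exp()
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return -torch.min(ratio * adv, clipped * adv).mean()
```

Because rewards are normalized within each group rather than against a critic's baseline, no value network has to be trained, which is what makes learning an optimization strategy from scratch tractable in the low-data and cold-start regimes the paper targets.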
Problem

Research questions and friction points this paper is trying to address.

Improving hyperparameter optimization efficiency without large historical data
Integrating reinforcement learning with Transformers for HPO
Enhancing training stability in RL-based hyperparameter optimization methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates reinforcement learning with Transformer architecture
Uses Group Relative Policy Optimization for trajectory construction
Introduces Policy Churn Regularization to stabilize training (a hedged sketch follows this list)
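
The page does not spell out PCR's exact formula, so the following is one plausible reading under a stated assumption: that PCR penalizes divergence between consecutive policy iterates, a standard remedy for policy churn. The names `logits_new` / `logits_prev`, the churn coefficient, and the discretized hyperparameter vocabulary are our assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def policy_churn_penalty(logits_new: torch.Tensor,
                         logits_prev: torch.Tensor) -> torch.Tensor:
    # Hypothetical churn penalty: KL between the policy from the previous
    # optimization step and the updated policy, evaluated on the same batch
    # of hyperparameter-generation states.
    # logits_*: (B, V) unnormalized scores over a discretized
    # hyperparameter vocabulary (an assumption about the action space).
    logp_new = F.log_softmax(logits_new, dim=-1)
    p_prev = F.softmax(logits_prev, dim=-1)
    # F.kl_div(input=log-probs, target=probs) computes KL(prev || new),
    # penalizing the new policy for drifting from its predecessor.
    return F.kl_div(logp_new, p_prev, reduction="batchmean")

# Assumed usage: add the penalty to the GRPO objective with a small weight,
# e.g. total_loss = grpo_loss(...) + churn_coef * policy_churn_penalty(...)
```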
Haoxin Guo
College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
Jiawen Pan
College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China; Key Laboratory of Agricultural Machinery Monitoring and Big Data Application, Ministry of Agriculture and Rural Affairs, Beijing 100083, China
Weixin Zhai
China Agricultural University
spatial big data · geographic information science · agricultural machinery