READY: Reward Discovery for Meta-Black-Box Optimization

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of handcrafted reward functions in meta-black-box optimization, which often introduce subjective design biases and invite reward hacking, thereby constraining algorithmic performance. To overcome these issues, the study proposes a framework that uses large language models (LLMs) to discover reward functions automatically, combining an evolution-of-heuristics paradigm with a multi-task evolutionary architecture. The method employs iterative program search to continuously refine reward functions and leverages cross-task knowledge sharing to improve generalization and accelerate convergence. By substantially reducing human intervention, the discovered reward functions improve existing meta-black-box optimization algorithms across multiple benchmark tasks, demonstrating the effectiveness and potential of automated reward design in this domain.

📝 Abstract
Meta-Black-Box Optimization (MetaBBO) is an emerging avenue within the optimization community, where algorithm design policies can be meta-learned via reinforcement learning to enhance optimization performance. So far, the reward functions in existing MetaBBO works have been designed by human experts, introducing design bias and risks of reward hacking. In this paper, we use a Large Language Model (LLM) as an automated reward discovery tool for MetaBBO. Specifically, we consider both effectiveness and efficiency. On the effectiveness side, we borrow the idea of evolution of heuristics, introducing a tailored evolution paradigm into the iterative LLM-based program search process, which ensures continuous improvement. On the efficiency side, we additionally introduce a multi-task evolution architecture to support parallel reward discovery for diverse MetaBBO approaches. This parallel process also benefits from knowledge sharing across tasks, which accelerates convergence. Empirical results demonstrate that the reward functions discovered by our approach can boost existing MetaBBO works, underscoring the importance of reward design in MetaBBO. We provide READY's project at https://anonymous.4open.science/r/ICML_READY-747F.
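The abstract's core loop — evolving a population of candidate reward functions, scoring each by how well it steers a low-level optimizer, and asking an LLM to propose variants of the elites — can be sketched in miniature. The sketch below is a toy illustration, not the paper's implementation: the LLM proposer is replaced by a random perturbation of a weight vector (`propose_variant` is hypothetical), candidate rewards are fixed-form weighted combinations of common MetaBBO reward terms (per-step improvement and relative progress), and the "task" is a tiny (1+1)-style random search on the sphere function whose step size is adapted by the reward signal.

```python
import random

def propose_variant(weights, rng):
    """Stand-in for the paper's LLM proposer (hypothetical):
    perturb one coordinate of a parent's weight vector."""
    child = list(weights)
    i = rng.randrange(len(child))
    child[i] += rng.uniform(-0.5, 0.5)
    return child

def make_reward(w):
    """Candidate reward: weighted sum of absolute improvement
    and relative progress, two common MetaBBO reward terms."""
    def reward(prev_f, cur_f):
        improve = max(prev_f - cur_f, 0.0)
        rel = (prev_f - cur_f) / (abs(prev_f) + 1e-12)
        return w[0] * improve + w[1] * rel
    return reward

def evaluate(reward, seed=1, steps=200):
    """Fitness of a reward function: final objective reached by a
    (1+1)-style random search whose step size is tuned by the reward."""
    rng = random.Random(seed)
    x = [rng.uniform(-5, 5) for _ in range(5)]
    f = sum(v * v for v in x)          # sphere objective
    sigma = 1.0
    for _ in range(steps):
        cand = [v + rng.gauss(0, sigma) for v in x]
        fc = sum(v * v for v in cand)
        # Reward-driven step-size control: grow on progress, shrink otherwise.
        sigma *= 1.1 if reward(f, fc) > 0 else 0.9
        if fc < f:
            x, f = cand, fc
    return f

def discover(generations=10, pop=6, seed=0):
    """Evolution-of-heuristics loop: keep elite reward functions,
    refill the population with proposed variants of the elites."""
    rng = random.Random(seed)
    population = [[rng.uniform(0, 1) for _ in range(2)] for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=lambda w: evaluate(make_reward(w)))
        elite = population[: pop // 2]
        population = elite + [
            propose_variant(rng.choice(elite), rng)
            for _ in range(pop - len(elite))
        ]
    best = min(population, key=lambda w: evaluate(make_reward(w)))
    return best, evaluate(make_reward(best))
```

In the paper's setting, the proposer emits whole reward *programs* rather than weight vectors, and multiple MetaBBO tasks evolve in parallel with shared elites; this sketch only shows the single-task skeleton.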
Problem

Research questions and friction points this paper is trying to address.

Meta-Black-Box Optimization
reward design
reward hacking
design bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reward Discovery
Meta-Black-Box Optimization
Large Language Model
Evolutionary Heuristics
Multi-task Evolution
Zechuan Huang
School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Zhiguang Cao
Singapore Management University
Learning to Optimize, Neural Combinatorial Optimization, Computational Intelligence
Hongshu Guo
School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Yue-Jiao Gong
School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Zeyuan Ma
South China University of Technology
Meta-Black-Box Optimization, Reinforcement Learning, Learning to Optimize