READY: Reward Discovery for Meta-Black-Box Optimization

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of handcrafted reward functions in meta-black-box optimization, which often introduce subjective design biases and invite reward hacking, thereby constraining algorithmic performance. To overcome these issues, the study proposes a framework that uses large language models (LLMs) to discover reward functions automatically, combining an evolution-of-heuristics paradigm with a multi-task evolutionary architecture. The method employs iterative program search to continuously refine reward functions and leverages cross-task knowledge sharing to improve generalization and accelerate convergence. By substantially reducing human intervention, the discovered reward functions improve existing meta-black-box optimization algorithms across multiple benchmark tasks, demonstrating the effectiveness and potential of automated reward design in this domain.

📝 Abstract
Meta-Black-Box Optimization (MetaBBO) is an emerging avenue within the optimization community, where algorithm design policies can be meta-learned via reinforcement learning to enhance optimization performance. So far, the reward functions in existing MetaBBO works have been designed by human experts, introducing design bias and risks of reward hacking. In this paper, we use a Large Language Model (LLM) as an automated reward discovery tool for MetaBBO. Specifically, we consider both effectiveness and efficiency. On the effectiveness side, we borrow the idea of evolution of heuristics, introducing a tailored evolution paradigm into the iterative LLM-based program search process, which ensures continuous improvement. On the efficiency side, we additionally introduce a multi-task evolution architecture to support parallel reward discovery for diverse MetaBBO approaches. This parallel process also benefits from knowledge sharing across tasks, which accelerates convergence. Empirical results demonstrate that the reward functions discovered by our approach can boost existing MetaBBO works, underscoring the importance of reward design in MetaBBO. We provide READY's project at https://anonymous.4open.science/r/ICML_READY-747F.
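The abstract's core loop — evolving a population of candidate reward functions, scoring each by how well it steers a low-level optimizer, and asking an LLM to propose variants of the elites — can be sketched in miniature. The sketch below is a toy illustration, not the paper's implementation: the LLM proposer is replaced by a random perturbation of a weight vector (`propose_variant` is hypothetical), candidate rewards are fixed-form weighted combinations of common MetaBBO reward terms (per-step improvement and relative progress), and the "task" is a tiny (1+1)-style random search on the sphere function whose step size is adapted by the reward signal.

```python
import random

def propose_variant(weights, rng):
    """Stand-in for the paper's LLM proposer (hypothetical):
    perturb one coordinate of a parent's weight vector."""
    child = list(weights)
    i = rng.randrange(len(child))
    child[i] += rng.uniform(-0.5, 0.5)
    return child

def make_reward(w):
    """Candidate reward: weighted sum of absolute improvement
    and relative progress, two common MetaBBO reward terms."""
    def reward(prev_f, cur_f):
        improve = max(prev_f - cur_f, 0.0)
        rel = (prev_f - cur_f) / (abs(prev_f) + 1e-12)
        return w[0] * improve + w[1] * rel
    return reward

def evaluate(reward, seed=1, steps=200):
    """Fitness of a reward function: final objective reached by a
    (1+1)-style random search whose step size is tuned by the reward."""
    rng = random.Random(seed)
    x = [rng.uniform(-5, 5) for _ in range(5)]
    f = sum(v * v for v in x)          # sphere objective
    sigma = 1.0
    for _ in range(steps):
        cand = [v + rng.gauss(0, sigma) for v in x]
        fc = sum(v * v for v in cand)
        # Reward-driven step-size control: grow on progress, shrink otherwise.
        sigma *= 1.1 if reward(f, fc) > 0 else 0.9
        if fc < f:
            x, f = cand, fc
    return f

def discover(generations=10, pop=6, seed=0):
    """Evolution-of-heuristics loop: keep elite reward functions,
    refill the population with proposed variants of the elites."""
    rng = random.Random(seed)
    population = [[rng.uniform(0, 1) for _ in range(2)] for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=lambda w: evaluate(make_reward(w)))
        elite = population[: pop // 2]
        population = elite + [
            propose_variant(rng.choice(elite), rng)
            for _ in range(pop - len(elite))
        ]
    best = min(population, key=lambda w: evaluate(make_reward(w)))
    return best, evaluate(make_reward(best))
```

In the paper's setting, the proposer emits whole reward *programs* rather than weight vectors, and multiple MetaBBO tasks evolve in parallel with shared elites; this sketch only shows the single-task skeleton.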
Problem

Research questions and friction points this paper is trying to address.

Meta-Black-Box Optimization
reward design
reward hacking
design bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reward Discovery
Meta-Black-Box Optimization
Large Language Model
Evolutionary Heuristics
Multi-task Evolution
Zechuan Huang
School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Zhiguang Cao
Singapore Management University
Learning to Optimize, Neural Combinatorial Optimization, Computational Intelligence
Hongshu Guo
School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Yue-Jiao Gong
School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Zeyuan Ma
South China University of Technology
Meta-Black-Box Optimization, Reinforcement Learning, Learning to Optimize