Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings

📅 2025-06-08

📈 Citations: 0

✨ Influential: 0

career value

209K/year

🤖 AI Summary

Offline black-box optimization (BBO) suffers from a lack of unified representation and poor cross-task generalization in heterogeneous numerical spaces. Method: This paper proposes the first language-model-embedding-based universal BBO framework. It stringifies multivariate heterogeneous optimization problems and maps them into the embedding space of a pretrained language model, enabling an end-to-end next-token prediction paradigm jointly optimized with offline reinforcement learning for robust latent-space representation learning. Contribution/Results: The approach eliminates reliance on task-specific designs and fixed-dimensional parameterizations inherent in conventional BBO methods. Evaluated on diverse open-source offline BBO benchmarks, the framework significantly improves cross-task and cross-dimensional generalization, achieving an average 23.6% performance gain in zero-shot transfer settings. This work establishes a novel paradigm and provides empirical foundations for developing universal black-box optimization algorithms.

Technology Category

Application Category

📝 Abstract

The pursuit of universal black-box optimization (BBO) algorithms is a longstanding goal. However, unlike domains such as language or vision, where scaling structured data has driven generalization, progress in offline BBO remains hindered by the lack of unified representations for heterogeneous numerical spaces. Thus, existing offline BBO approaches are constrained to single-task and fixed-dimensional settings, failing to achieve cross-domain universal optimization. Recent advances in language models (LMs) offer a promising path forward: their embeddings capture latent relationships in a unifying way, enabling universal optimization across different data types possible. In this paper, we discuss multiple potential approaches, including an end-to-end learning framework in the form of next-token prediction, as well as prioritizing the learning of latent spaces with strong representational capabilities. To validate the effectiveness of these methods, we collect offline BBO tasks and data from open-source academic works for training. Experiments demonstrate the universality and effectiveness of our proposed methods. Our findings suggest that unifying language model priors and learning string embedding space can overcome traditional barriers in universal BBO, paving the way for general-purpose BBO algorithms. The code is provided at https://github.com/lamda-bbo/universal-offline-bbo.

Problem

Research questions and friction points this paper is trying to address.

Lack of unified representations for heterogeneous numerical spaces in offline BBO.

Existing offline BBO methods fail to achieve cross-domain universal optimization.

Need to leverage language model embeddings for universal black-box optimization.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning language model embeddings for optimization

End-to-end next-token prediction framework

Unifying latent spaces with strong representations

🔎 Similar Papers

OmniPred: Language Models as Universal Regressors