🤖 AI Summary
This work proposes a large language model (LLM)-based stochastic generative optimization framework to address key challenges in optimizing complex systems—such as LLM prompting and multi-agent interactions—including high human iteration costs, highly stochastic feedback, and combinatorial explosion of the solution space. The approach integrates numerical rewards with textual feedback to guide the search process, employs a priority queue to balance exploration and exploitation, leverages an ε-Net to preserve parameter diversity, and introduces an LLM-based summarizer to enable cross-iteration meta-learning. Theoretical analysis guarantees convergence to a near-optimal solution. Extensive experiments on benchmarks including τ-bench, HotpotQA, VeriBench, and KernelBench demonstrate that the method significantly outperforms existing approaches, exhibiting superior efficiency, robustness, and sample efficiency.
📝 Abstract
Optimizing complex systems, ranging from LLM prompts to multi-turn agents, traditionally requires labor-intensive manual iteration. We formalize this challenge as a stochastic generative optimization problem in which a generative language model acts as the optimizer, guided by numerical rewards and textual feedback to discover the best system. We introduce Prioritized Optimization with Local Contextual Aggregation (POLCA), a scalable framework designed to handle stochasticity in optimization -- such as noisy feedback, minibatch sampling, and stochastic system behaviors -- while effectively managing the unconstrained expansion of the solution space. POLCA maintains a priority queue to manage the exploration-exploitation tradeoff, systematically tracking candidate solutions and their evaluation histories. To enhance efficiency, we integrate an $\varepsilon$-Net mechanism to maintain parameter diversity and an LLM Summarizer to perform meta-learning across historical trials. We theoretically prove that POLCA converges to near-optimal candidate solutions under stochasticity. We evaluate our framework on diverse benchmarks, including $\tau$-bench and HotpotQA (agent optimization), VeriBench (code translation), and KernelBench (CUDA kernel generation). Experimental results demonstrate that POLCA achieves robust, sample- and time-efficient performance, consistently outperforming state-of-the-art algorithms on both deterministic and stochastic problems. The codebase for this work is publicly available at https://github.com/rlx-lab/POLCA.
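The abstract's core loop (a priority queue over candidate solutions, an ε-Net filter to keep retained candidates diverse, and evaluation under noisy rewards) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names (`polca_sketch`, `propose`, `evaluate`), the Jaccard-based text distance standing in for the ε-Net metric, and the omission of the LLM Summarizer and textual feedback are all simplifying assumptions.

```python
import heapq

def text_distance(a, b):
    # Toy diversity metric (Jaccard distance over words), a stand-in
    # for whatever distance the actual epsilon-Net mechanism uses.
    wa, wb = set(a.split()), set(b.split())
    return 1.0 - len(wa & wb) / max(len(wa | wb), 1)

def polca_sketch(seed, propose, evaluate, eps=0.3, budget=20):
    """Hypothetical POLCA-style loop: priority queue + epsilon-Net filter.

    propose(candidate) -> new candidate (an LLM call in the real system,
    conditioned on rewards and text feedback); evaluate(candidate) -> reward
    (possibly noisy in the stochastic setting).
    """
    # heapq is a min-heap, so store negated rewards for max-priority pops.
    best = seed
    best_reward = evaluate(seed)
    queue = [(-best_reward, seed)]
    accepted = [seed]  # epsilon-Net: the retained, mutually diverse candidates

    for _ in range(budget):
        neg_r, parent = heapq.heappop(queue)  # exploit the best candidate so far
        child = propose(parent)
        # epsilon-Net check: keep the child only if it is at least eps away
        # from every previously accepted candidate (preserves diversity).
        if all(text_distance(child, c) >= eps for c in accepted):
            r = evaluate(child)
            accepted.append(child)
            heapq.heappush(queue, (-r, child))
            if r > best_reward:
                best_reward, best = r, child
        heapq.heappush(queue, (neg_r, parent))  # parent remains explorable
    return best, best_reward
```

In the real framework the proposer is a generative LLM and the evaluation history feeds an LLM Summarizer for cross-trial meta-learning; here both are abstracted into the `propose` and `evaluate` callables.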