GFlowPO: Generative Flow Network as a Language Model Prompt Optimizer

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently searching the combinatorially vast prompt space of language models, which is exacerbated by sparse reward signals. The authors frame prompt optimization as a posterior inference problem over latent prompts, guided by a meta-prompt prior. They propose an off-policy optimization framework based on Generative Flow Networks (GFlowNets), enhanced with a replay buffer and a priority queue to balance exploration and exploitation. A novel training-free dynamic memory update mechanism is introduced to focus sampling on high-reward regions without additional learning overhead. Empirical evaluations across few-shot classification, instruction induction, and question answering tasks demonstrate that the proposed method significantly outperforms existing discrete prompt optimization approaches.
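The replay-buffer / priority-queue memory described above can be sketched roughly as follows. This is an illustrative assumption of how a training-free Dynamic Memory Update might look, not the paper's implementation; the class name, buffer sizes, and meta-prompt template are all mine.

```python
import heapq
import random

class DynamicMemoryUpdate:
    """Sketch of a training-free meta-prompt update: mixes diverse
    prompts from a replay buffer with top-reward prompts from a small
    priority queue (sizes and template are illustrative)."""

    def __init__(self, queue_size=4, buffer_size=64, seed=0):
        self.priority_queue = []   # min-heap of (reward, prompt): keeps top-k
        self.replay_buffer = []    # bounded FIFO of all evaluated prompts
        self.queue_size = queue_size
        self.buffer_size = buffer_size
        self.rng = random.Random(seed)

    def add(self, prompt, reward):
        # Keep every evaluated prompt for diversity (bounded FIFO).
        self.replay_buffer.append(prompt)
        if len(self.replay_buffer) > self.buffer_size:
            self.replay_buffer.pop(0)
        # Keep only the highest-reward prompts in the priority queue.
        heapq.heappush(self.priority_queue, (reward, prompt))
        if len(self.priority_queue) > self.queue_size:
            heapq.heappop(self.priority_queue)

    def build_meta_prompt(self, n_diverse=2):
        # Inject (i) diverse replay prompts and (ii) top performers.
        diverse = self.rng.sample(self.replay_buffer,
                                  min(n_diverse, len(self.replay_buffer)))
        top = [p for _, p in sorted(self.priority_queue, reverse=True)]
        examples = "\n".join(f"- {p}" for p in top + diverse)
        return f"High-reward prompt examples:\n{examples}\nWrite a new prompt:"
```

Because the update only re-assembles text already paid for by past evaluations, it concentrates sampling on high-reward regions without any gradient steps.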

📝 Abstract
Finding effective prompts for language models (LMs) is critical yet notoriously difficult: the prompt space is combinatorially large, and rewards are sparse because target-LM evaluation is expensive. Moreover, existing RL-based prompt optimizers often rely on on-policy updates and a meta-prompt sampled from a fixed distribution, leading to poor sample efficiency. We propose GFlowPO, a probabilistic prompt optimization framework that casts prompt search as a posterior inference problem over latent prompts regularized by a meta-prompted reference-LM prior. In the first step, we fine-tune a lightweight prompt-LM with an off-policy Generative Flow Network (GFlowNet) objective, using a replay-based training policy that reuses past prompt evaluations to enable sample-efficient exploration. In the second step, we introduce Dynamic Memory Update (DMU), a training-free mechanism that updates the meta-prompt by injecting both (i) diverse prompts from a replay buffer and (ii) top-performing prompts from a small priority queue, thereby progressively concentrating the search process on high-reward regions. Across few-shot text classification, instruction induction benchmarks, and question answering tasks, GFlowPO consistently outperforms recent discrete prompt optimization baselines.
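Off-policy GFlowNet fine-tuning like that in the first step is commonly instantiated with a trajectory-balance objective. A minimal sketch with scalar log-quantities follows; the function name and the assumption of a deterministic backward policy (as in left-to-right token generation) are mine, not details from the abstract.

```python
def trajectory_balance_loss(log_z, log_pf_sum, log_reward):
    """Squared violation of the trajectory-balance condition
        log Z + sum_t log P_F(s_t -> s_{t+1}) = log R(x),
    where log_z is a learned scalar estimate of the log-partition
    function, log_pf_sum is the summed forward log-probability of the
    sampled prompt's tokens, and log_reward is the log-reward from
    target-LM evaluation. The backward-policy term vanishes because
    autoregressive generation admits only one parent per state."""
    return (log_z + log_pf_sum - log_reward) ** 2
```

Because the loss is defined per trajectory, it can be minimized on prompts drawn from a replay buffer rather than the current policy, which is what makes the off-policy, sample-efficient training described above possible.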
Problem

Research questions and friction points this paper is trying to address.

prompt optimization
language models
combinatorial search
sparse rewards
sample efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

GFlowNet
prompt optimization
off-policy learning
Dynamic Memory Update
posterior inference
Junmo Cho
Korea Advanced Institute of Science and Technology (KAIST)
Suhan Kim
Korea University
Sangjune An
Korea University
Minsu Kim
KAIST
Machine Learning, Signal Processing, Efficient AI
Dong Bok Lee
Korea Advanced Institute of Science and Technology (KAIST)
Heejun Lee
Korea Advanced Institute of Science and Technology
Transformers, Efficient Neural Networks
Sung Ju Hwang
KAIST, DeepAuto
Machine Learning
Haebeom Lee
Korea University