FedPOB: Sample-Efficient Federated Prompt Optimization via Bandits

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of black-box access, low sample efficiency, and multi-user privacy preservation in large language model (LLM) prompt optimization, this paper proposes FedPOB, the first prompt optimization framework integrating federated learning with multi-armed bandits (MAB), and its preference-feedback extension, FedPOB-Pref. FedPOB enables distributed prompt exploration via a federated variant of Linear UCB, requiring only black-box LLM queries; FedPOB-Pref needs only lightweight pairwise preference feedback. Agents share bandit model parameters rather than raw data, preserving privacy. Theoretically, the framework guarantees a collaborative gain and achieves a sublinear regret bound. Empirically, FedPOB significantly outperforms existing baselines, and its performance improves as more users participate, demonstrating the feasibility of efficient, privacy-preserving collaborative prompt optimization.
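To make the Linear UCB component concrete, here is a minimal sketch of how a single agent could score candidate prompts and update its statistics. This is an illustrative LinUCB implementation, not the paper's code; the assumption that each prompt is represented by a fixed feature vector (e.g., an embedding) is ours.

```python
import numpy as np

def linucb_select(prompt_feats, A, b, alpha=1.0):
    """Pick the prompt with the highest upper confidence bound.

    prompt_feats: (n_prompts, d) feature vectors, e.g., prompt embeddings.
    A: (d, d) design matrix; b: (d,) reward-weighted feature sum.
    alpha: exploration strength.
    """
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b  # ridge-regression estimate of the reward weights
    # mean reward estimate plus confidence width for each prompt
    scores = prompt_feats @ theta + alpha * np.sqrt(
        np.einsum("ij,jk,ik->i", prompt_feats, A_inv, prompt_feats)
    )
    return int(np.argmax(scores))

def linucb_update(A, b, x, reward):
    """Rank-one update after observing the reward for feature vector x."""
    return A + np.outer(x, x), b + reward * x
```

Each black-box LLM query yields one (prompt features, reward) pair, so only the low-dimensional statistics `A` and `b` grow, which is what makes the method sample-efficient and cheap to federate.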

📝 Abstract
The performance of large language models (LLMs) is highly sensitive to the input prompt, making prompt optimization a critical task. However, real-world application is hindered by three major challenges: (1) the black-box nature of powerful proprietary LLMs, (2) the need for high sample efficiency due to query costs, and (3) the desire for privacy-preserving collaboration among multiple users. To address these challenges simultaneously, we introduce a novel framework for sample-efficient federated prompt optimization based on multi-armed bandits (MABs). The MAB framework is uniquely suited for this problem as it is (1) inherently a black-box optimization method, (2) practically sample-efficient, and (3) enables collaborative learning with theoretically guaranteed benefit from more participating agents. We first propose the Federated Prompt Optimization via Bandits (FedPOB) algorithm, a federated variant of the Linear UCB algorithm, where agents collaborate by sharing model parameters instead of raw data. We then extend our approach to the practical setting of comparative user feedback by introducing FedPOB with Preference Feedback (FedPOB-Pref), an efficient algorithm based on federated dueling bandits. Extensive experiments demonstrate that both FedPOB and FedPOB-Pref significantly outperform existing baselines and that their performance consistently improves as more agents participate in the collaboration, validating the effectiveness of our federated approach.
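The preference-feedback setting in FedPOB-Pref replaces numeric rewards with pairwise comparisons ("prompt A's output is better than prompt B's"). As a rough illustration of the dueling-bandit idea, the sketch below tracks pairwise win counts and selects duel candidates by a UCB on empirical Borda-style win rates; it is a simplified stand-in, not the paper's federated dueling-bandit algorithm.

```python
import math

class PrefBandit:
    """Minimal dueling-bandit sketch: maintain pairwise win counts and
    pick the two prompts with the highest optimistic win-rate scores."""

    def __init__(self, n_arms):
        # wins[i][j]: number of duels in which prompt i beat prompt j
        self.wins = [[0] * n_arms for _ in range(n_arms)]
        self.n = n_arms
        self.t = 0  # total duels so far

    def _ucb(self, i):
        plays = sum(self.wins[i]) + sum(row[i] for row in self.wins)
        if plays == 0:
            return float("inf")  # force initial exploration
        win_rate = sum(self.wins[i]) / plays
        return win_rate + math.sqrt(2 * math.log(self.t + 1) / plays)

    def select_pair(self):
        ranked = sorted(range(self.n), key=self._ucb, reverse=True)
        return ranked[0], ranked[1]

    def update(self, winner, loser):
        self.wins[winner][loser] += 1
        self.t += 1
```

The appeal of this feedback model is practical: asking a user which of two LLM outputs they prefer is far cheaper and more reliable than asking for a calibrated numeric score.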
Problem

Research questions and friction points this paper is trying to address.

Optimizing prompts for black-box large language models
Achieving high sample efficiency with limited query budgets
Enabling privacy-preserving collaborative prompt optimization across users
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated bandit algorithm optimizes prompts collaboratively
Uses multi-armed bandits for black-box prompt optimization
Shares model parameters instead of raw data
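The "share parameters, not data" point can be sketched as a server-side merge of each agent's LinUCB sufficient statistics. Agents upload only their local design matrix and reward-weighted feature sum; raw prompts, LLM outputs, and rewards never leave the agent. The function names and the ridge prior below are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def aggregate_stats(agent_stats, d, lam=1.0):
    """Server-side merge of per-agent LinUCB statistics.

    agent_stats: list of (A_i, b_i) pairs, where agent i accumulated
    A_i = sum of x x^T and b_i = sum of reward * x from its own queries.
    Returns merged (A, b) that behave as if one agent saw all the data.
    """
    A = lam * np.eye(d)  # shared ridge-regularization prior
    b = np.zeros(d)
    for A_i, b_i in agent_stats:
        A += A_i
        b += b_i
    return A, b
```

Because the statistics are additive, the merged estimate `theta = A^{-1} b` tightens as more agents contribute, which is the mechanism behind the claimed collaborative gain from additional participants.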