Multi-Armed Bandits Meet Large Language Models

📅 2025-05-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper presents the first systematic investigation of bidirectional synergy between multi-armed bandits (MAB) and large language models (LLMs). It addresses two key challenges: (1) MAB’s weak policy interpretability and poor contextual adaptability in open-domain settings; and (2) LLMs’ low online optimization efficiency and lack of dynamic control over prompts and responses. To bridge these gaps, we propose a “MAB↔LLM” bidirectional empowerment framework: MAB provides efficient online learning to support prompt optimization and adaptive response generation, while LLMs enhance MAB’s contextual awareness and strategic reasoning via natural-language-driven semantic modeling. We instantiate the framework using UCB and Thompson sampling, and establish a unified evaluation benchmark. Experiments demonstrate a 37% improvement in LLM inference efficiency and a 2.1× increase in cumulative reward for MAB in open-domain tasks. The work further identifies six cross-cutting research directions for future exploration.

Technology Category

Application Category

📝 Abstract
Bandit algorithms and Large Language Models (LLMs) have emerged as powerful tools in artificial intelligence, each addressing distinct yet complementary challenges in decision-making and natural language processing. This survey explores the synergistic potential between these two fields, highlighting how bandit algorithms can enhance the performance of LLMs and how LLMs, in turn, can provide novel insights for improving bandit-based decision-making. We first examine the role of bandit algorithms in optimizing LLM fine-tuning, prompt engineering, and adaptive response generation, focusing on their ability to balance exploration and exploitation in large-scale learning tasks. Subsequently, we explore how LLMs can augment bandit algorithms through advanced contextual understanding, dynamic adaptation, and improved policy selection using natural language reasoning. By providing a comprehensive review of existing research and identifying key challenges and opportunities, this survey aims to bridge the gap between bandit algorithms and LLMs, paving the way for innovative applications and interdisciplinary research in AI.
Problem

Research questions and friction points this paper is trying to address.

Synergizing bandit algorithms and LLMs for AI enhancement
Optimizing LLM fine-tuning via bandit exploration-exploitation balance
Augmenting bandit decision-making with LLM contextual reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bandit algorithms optimize LLM fine-tuning and prompts
LLMs enhance bandit algorithms with contextual understanding
Survey bridges bandit algorithms and LLMs for AI
🔎 Similar Papers
No similar papers found.