Analysis of Search Heuristics in the Multi-Armed Bandit Setting

📅 2026-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the ability of evolutionary algorithms to identify the Condorcet winner within the Dueling Bandits framework. Addressing the exploration–exploitation trade-off inherent in multi-armed bandit settings, the authors use steady-state analysis of Markov chains to show, for the first time, that even when the Condorcet winner beats every other arm with probability \( 1 - p \), the (1+1) Evolutionary Algorithm (EA) selects it in its stationary distribution only with constant probability whenever \( p = \Omega(1/n) \). By contrast, an Estimation-of-Distribution Algorithm (EDA) based on the Max-Min Ant System (MMAS) achieves a significantly higher success probability of \( 1 - \Theta(p) \). The work further introduces a repeated-duel mechanism that substantially boosts the probability with which the (1+1) EA identifies the Condorcet winner.
📝 Abstract
We consider the classic Multi-Armed Bandit setting to understand the exploration/exploitation tradeoffs made by different search heuristics. Since many search heuristics work by comparing different options (in evolutionary algorithms called "individuals"; in the Bandit literature called "arms"), we work with the "Dueling Bandits" setting. In each iteration, a comparison between different arms can be made; in the binary stochastic setting, each arm has a fixed winning probability against any other arm. A Condorcet winner is any arm that beats every other arm with a probability strictly higher than $1/2$. We show that evolutionary algorithms are rather bad at identifying the Condorcet winner: Even if the Condorcet winner beats every other arm with a probability $1-p$, the (1+1) EA, in its stationary distribution, chooses the Condorcet winner only with constant probability if $p = \Omega(1/n)$. By contrast, we show that a simple EDA (based on the Max-Min Ant System with iteration-best update) will choose the Condorcet winner in its maintained distribution with probability $1 - \Theta(p)$. As a remedy for the (1+1) EA, we show how repeated duels can significantly boost the probability of the Condorcet winner in the stationary distribution.
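The (1+1) EA dynamics described in the abstract can be simulated with a few lines of code. The sketch below is an illustration under stated assumptions, not the paper's exact algorithm or analysis: arm 0 plays the role of the Condorcet winner (beating every other arm with probability $1-p$), all other duels are fair coin flips, and the EA keeps a current arm, draws a uniform random challenger, and adopts it if it wins a single duel. The fraction of time spent on arm 0 is an empirical proxy for the stationary probability discussed in the abstract.

```python
import random

def duel(i, j, p):
    """Return True if arm i beats arm j in one stochastic duel.
    Assumed model: arm 0 is the Condorcet winner and beats any other
    arm with probability 1 - p; all other pairs are fair coin flips.
    """
    if i == j:
        raise ValueError("arms must differ")
    if i == 0:
        return random.random() < 1 - p
    if j == 0:
        return random.random() >= 1 - p  # arm 0 wins with prob. 1 - p
    return random.random() < 0.5

def one_plus_one_ea(n, p, steps):
    """(1+1) EA-style dynamics on n arms: keep a current arm, draw a
    uniform random challenger each step, and switch to it if it wins
    the duel. Returns the fraction of steps spent on the Condorcet
    winner (arm 0), a proxy for its stationary probability.
    """
    current = random.randrange(n)
    time_on_winner = 0
    for _ in range(steps):
        challenger = random.randrange(n - 1)
        if challenger >= current:
            challenger += 1  # uniform over arms != current
        if duel(challenger, current, p):
            current = challenger
        time_on_winner += (current == 0)
    return time_on_winner / steps
```

Running this with $p$ of order $1/n$ versus $p = 0$ illustrates the abstract's negative result: a single duel is too noisy for the incumbent to defend its position, so the chain spends only a bounded constant fraction of its time on the Condorcet winner.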
Problem

Research questions and friction points this paper is trying to address.

Multi-Armed Bandit
Exploration/Exploitation Tradeoff
Condorcet Winner
Search Heuristics
Dueling Bandits
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dueling Bandits
Condorcet winner
Evolutionary Algorithms
Estimation of Distribution Algorithms
Exploration-Exploitation Tradeoff
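The contribution list above names Estimation of Distribution Algorithms; the paper's positive result concerns an MMAS-based EDA with iteration-best update. The sketch below is a hedged illustration of that style of algorithm, not the paper's exact method: the pheromone bounds, the two-arm sampling scheme, and the duel model (arm 0 beats every other arm with probability $1-p$, other duels are fair) are assumptions made here for the example.

```python
import random

def mmas_dueling(n, p, rho=0.1, steps=3000):
    """MMAS-style pheromone dynamics in a dueling-bandit setting
    (illustrative sketch). Each iteration samples two distinct arms
    from the normalized pheromone vector, duels them, and reinforces
    the winner (iteration-best update, clamped to [tau_min, tau_max]).
    Returns the final probability mass on arm 0, the Condorcet winner.
    """
    tau_min, tau_max = 1.0 / n**2, 1.0 - 1.0 / n  # assumed bounds
    tau = [1.0 / n] * n

    def sample():
        r = random.random() * sum(tau)
        for i, t in enumerate(tau):
            r -= t
            if r <= 0:
                return i
        return n - 1

    for _ in range(steps):
        i = sample()
        j = sample()
        while j == i:
            j = sample()
        # Assumed duel model: arm 0 wins against any arm w.p. 1 - p.
        if i == 0 or j == 0:
            winner = 0 if random.random() < 1 - p else (i if i != 0 else j)
        else:
            winner = i if random.random() < 0.5 else j
        for k in range(n):
            target = 1.0 if k == winner else 0.0
            tau[k] = (1 - rho) * tau[k] + rho * target
            tau[k] = min(max(tau[k], tau_min), tau_max)
    return tau[0] / sum(tau)
```

Unlike the (1+1) EA, the pheromone vector aggregates many duel outcomes, so a single noisy loss barely moves the maintained distribution; this is the intuitive reason a distribution-based heuristic can concentrate on the Condorcet winner with probability $1 - \Theta(p)$.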
Jasmin Brandt
University of Bielefeld
Barbara Hammer
Professor, Bielefeld University
machine learning, data mining, neural networks, bioinformatics, theoretical computer science
Timo Kötzing
Hasso Plattner Institute, University of Potsdam
Jurek Sander
Hasso Plattner Institute, University of Potsdam