🤖 AI Summary
This study investigates the ability of evolutionary algorithms to identify the Condorcet winner in the Dueling Bandits framework. Addressing the exploration–exploitation trade-off inherent in multi-armed bandit settings, the authors analyze the stationary distributions of the induced Markov chains and show, for the first time, that even when the Condorcet winner beats every other arm with probability \( 1-p \), the (1+1) Evolutionary Algorithm (EA) selects it only with constant probability whenever \( p = \Omega(1/n) \). In contrast, a simple Estimation-of-Distribution Algorithm (EDA), based on the Max-Min Ant System (MMAS) with iteration-best update, identifies the Condorcet winner in its maintained distribution with the significantly higher probability \( 1 - \Theta(p) \). The work further introduces a repeated-duel mechanism that substantially boosts the probability the (1+1) EA assigns to the Condorcet winner.
📝 Abstract
We consider the classic Multi-Armed Bandit setting to understand the exploration/exploitation tradeoffs made by different search heuristics. Since many search heuristics work by comparing different options (in evolutionary algorithms called "individuals"; in the Bandit literature called "arms"), we work with the "Dueling Bandits" setting. In each iteration, a comparison between different arms can be made; in the binary stochastic setting, each arm has a fixed winning probability against any other arm. A Condorcet winner is any arm that beats every other arm with a probability strictly higher than $1/2$.
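The Condorcet-winner condition above can be checked directly from the matrix of pairwise winning probabilities. The following sketch (not from the paper; the matrix `P` is a hypothetical example) illustrates the definition, assuming `P[i][j]` is the probability that arm `i` beats arm `j` in a single duel:

```python
def condorcet_winner(P):
    """Return the index of the arm that beats every other arm with
    probability strictly above 1/2, or None if no such arm exists."""
    n = len(P)
    for i in range(n):
        if all(P[i][j] > 0.5 for j in range(n) if j != i):
            return i
    return None

# Hypothetical example with p = 0.1: arm 0 beats every other arm
# with probability 1 - p = 0.9, so arm 0 is the Condorcet winner.
# Note the consistency constraint P[j][i] = 1 - P[i][j].
P = [
    [0.5, 0.9, 0.9],
    [0.1, 0.5, 0.6],
    [0.1, 0.4, 0.5],
]
```

Note that a Condorcet winner need not exist (pairwise preferences can be cyclic), which is why the function may return `None`.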
We show that evolutionary algorithms are rather bad at identifying the Condorcet winner: Even if the Condorcet winner beats every other arm with a probability $1-p$, the (1+1) EA, in its stationary distribution, chooses the Condorcet winner only with constant probability if $p=\Omega(1/n)$. By contrast, we show that a simple EDA (based on the Max-Min Ant System with iteration-best update) will choose the Condorcet winner in its maintained distribution with probability $1-\Theta(p)$. As a remedy for the (1+1) EA, we show how repeated duels can significantly boost the probability of the Condorcet winner in the stationary distribution.
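The amplification effect of repeated duels can be illustrated with a simple majority-vote calculation. The sketch below (an illustration under standard assumptions, not the paper's exact mechanism) computes the probability that an arm winning each independent duel with probability $1-p$ also wins a majority of $k$ duels, for odd $k$ so that ties cannot occur:

```python
from math import comb

def majority_win_prob(p, k):
    """Probability that an arm winning each duel independently with
    probability 1 - p wins a strict majority of k duels (k odd)."""
    q = 1 - p
    return sum(comb(k, m) * q**m * p**(k - m)
               for m in range(k // 2 + 1, k + 1))
```

For example, with $p = 0.1$ a single duel is won with probability $0.9$, while a majority vote over $5$ repeated duels succeeds with probability above $0.99$; this is the kind of boost that repeated duels can provide.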