🤖 AI Summary
This study investigates the ability of evolutionary algorithms to identify the Condorcet winner in the Dueling Bandits framework. Addressing the exploration–exploitation trade-off inherent in multi-armed bandit settings, the authors analyze the stationary distributions of the induced Markov chains and show, for the first time, that even when the Condorcet winner beats every other arm with probability \( 1-p \), the (1+1) Evolutionary Algorithm (EA) selects it only with constant probability whenever \( p = \Omega(1/n) \). In contrast, a simple Estimation-of-Distribution Algorithm (EDA), based on the Max-Min Ant System (MMAS) with iteration-best update, identifies the Condorcet winner in its maintained distribution with the significantly higher probability \( 1 - \Theta(p) \). The work further introduces a repeated-duel mechanism that substantially boosts the probability the (1+1) EA assigns to the Condorcet winner.
📝 Abstract
We consider the classic Multi-Armed Bandit setting to understand the exploration/exploitation tradeoffs made by different search heuristics. Since many search heuristics work by comparing different options (in evolutionary algorithms called "individuals"; in the Bandit literature called "arms"), we work with the "Dueling Bandits" setting. In each iteration, a comparison between different arms can be made; in the binary stochastic setting, each arm has a fixed winning probability against any other arm. A Condorcet winner is any arm that beats every other arm with a probability strictly higher than $1/2$.
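The Condorcet-winner condition above can be checked directly from the matrix of pairwise winning probabilities. The following sketch (not from the paper; the matrix `P` is a hypothetical example) illustrates the definition, assuming `P[i][j]` is the probability that arm `i` beats arm `j` in a single duel:

```python
def condorcet_winner(P):
    """Return the index of the arm that beats every other arm with
    probability strictly above 1/2, or None if no such arm exists."""
    n = len(P)
    for i in range(n):
        if all(P[i][j] > 0.5 for j in range(n) if j != i):
            return i
    return None

# Hypothetical example with p = 0.1: arm 0 beats every other arm
# with probability 1 - p = 0.9, so arm 0 is the Condorcet winner.
# Note the consistency constraint P[j][i] = 1 - P[i][j].
P = [
    [0.5, 0.9, 0.9],
    [0.1, 0.5, 0.6],
    [0.1, 0.4, 0.5],
]
```

Note that a Condorcet winner need not exist (pairwise preferences can be cyclic), which is why the function may return `None`.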
We show that evolutionary algorithms are rather bad at identifying the Condorcet winner: Even if the Condorcet winner beats every other arm with a probability $1-p$, the (1+1) EA, in its stationary distribution, chooses the Condorcet winner only with constant probability if $p=\Omega(1/n)$. By contrast, we show that a simple EDA (based on the Max-Min Ant System with iteration-best update) will choose the Condorcet winner in its maintained distribution with probability $1-\Theta(p)$. As a remedy for the (1+1) EA, we show how repeated duels can significantly boost the probability of the Condorcet winner in the stationary distribution.
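The amplification effect of repeated duels can be illustrated with a simple majority-vote calculation. The sketch below (an illustration under standard assumptions, not the paper's exact mechanism) computes the probability that an arm winning each independent duel with probability $1-p$ also wins a majority of $k$ duels, for odd $k$ so that ties cannot occur:

```python
from math import comb

def majority_win_prob(p, k):
    """Probability that an arm winning each duel independently with
    probability 1 - p wins a strict majority of k duels (k odd)."""
    q = 1 - p
    return sum(comb(k, m) * q**m * p**(k - m)
               for m in range(k // 2 + 1, k + 1))
```

For example, with $p = 0.1$ a single duel is won with probability $0.9$, while a majority vote over $5$ repeated duels succeeds with probability above $0.99$; this is the kind of boost that repeated duels can provide.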