Overcoming Deceptiveness in Fitness Optimization with Unsupervised Quality-Diversity

📅 2025-04-02
🤖 AI Summary
Traditional optimization methods such as reinforcement learning and evolutionary algorithms often stall at local optima on deceptive fitness landscapes. This paper applies unsupervised Quality-Diversity (QD) optimization, which needs no hand-crafted features, to these problems. It builds on AURORA, a framework that learns features from sensory data, and adds contrastive learning and periodic extinction events to form AURORA-XCon. The resulting method maintains diversity and escapes local optima without domain-specific prior knowledge. AURORA-XCon outperforms all traditional optimization baselines and matches the best QD baseline that uses hand-crafted features, improving on it by up to 34% in some cases. The results extend QD algorithms to domains where defining meaningful features is difficult.

📝 Abstract
Policy optimization seeks the best solution to a control problem according to an objective or fitness function, and it is a fundamental field of engineering and research with applications in robotics. Traditional optimization methods like reinforcement learning and evolutionary algorithms struggle with deceptive fitness landscapes, where following immediate improvements leads to suboptimal solutions. Quality-diversity (QD) algorithms offer a promising approach by maintaining diverse intermediate solutions as stepping stones for escaping local optima. However, QD algorithms require domain expertise to define hand-crafted features, limiting their applicability where it is unclear how to characterize solution diversity. In this paper, we show that unsupervised QD algorithms - specifically the AURORA framework, which learns features from sensory data - efficiently solve deceptive optimization problems without domain expertise. By enhancing AURORA with contrastive learning and periodic extinction events, we propose AURORA-XCon, which outperforms all traditional optimization baselines and matches, and in some cases improves on by up to 34%, the best QD baseline with domain-specific hand-crafted features. This work establishes a novel application of unsupervised QD algorithms, shifting their focus from discovering novel solutions toward traditional optimization and expanding their potential to domains where defining feature spaces poses challenges.
Problem

Research questions and friction points this paper is trying to address.

Overcoming deceptive fitness landscapes in optimization
Eliminating need for domain expertise in QD algorithms
Enhancing optimization performance with unsupervised feature learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised QD algorithms learn features automatically
AURORA-XCon uses contrastive learning and extinction events
Outperforms traditional methods without domain expertise
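
To make the extinction idea above concrete, here is a minimal Python sketch of a quality-diversity archive with periodic extinction events. It is an illustration under stated assumptions, not the paper's implementation: the toy objective, the binning function `cell_of` (a hand-written stand-in for the descriptor that AURORA would learn with an encoder), the uniform-random choice of survivors, and all parameter names and values (`extinction_period`, `keep_fraction`) are hypothetical. Contrastive learning is not modeled here.

```python
import math
import random


def try_insert(archive, cell, solution, fitness):
    """Elitist insertion: keep the fitter solution in each descriptor cell."""
    incumbent = archive.get(cell)
    if incumbent is None or fitness > incumbent[1]:
        archive[cell] = (solution, fitness)
        return True
    return False


def extinction(archive, keep_fraction, rng):
    """Return a new archive holding a uniformly random subset of the cells.

    Assumption (not from the paper): at least one solution always survives.
    """
    if not archive:
        return {}
    n_keep = min(len(archive), max(1, int(len(archive) * keep_fraction)))
    survivors = rng.sample(sorted(archive), n_keep)
    return {cell: archive[cell] for cell in survivors}


def run(iterations=2000, extinction_period=500, keep_fraction=0.1, seed=0):
    """Toy QD loop on a 1-D problem with periodic extinction events."""
    rng = random.Random(seed)

    def fitness_fn(x):
        # Hypothetical toy objective with its optimum at x = 3.
        return -abs(x - 3.0)

    def cell_of(x):
        # Hand-written binning; stands in for a learned descriptor.
        return math.floor(x * 2)

    archive = {}
    try_insert(archive, cell_of(0.0), 0.0, fitness_fn(0.0))
    for step in range(1, iterations + 1):
        # Select a parent uniformly from the archive and mutate it.
        parent, _ = rng.choice(list(archive.values()))
        child = min(5.0, max(-5.0, parent + rng.gauss(0.0, 0.3)))
        try_insert(archive, cell_of(child), child, fitness_fn(child))
        # Periodically wipe out most of the archive.
        if step % extinction_period == 0:
            archive = extinction(archive, keep_fraction, rng)
    return archive
```

Calling `run()` returns the archive that survives the last extinction event as a dictionary mapping each cell to a `(solution, fitness)` pair. Setting `keep_fraction=1.0` disables the extinction effect, which gives a simple baseline for comparison in this toy setting.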
Lisa Coiffard
Imperial College London, London, United Kingdom
Paul Templier
Imperial College London, London, United Kingdom
Antoine Cully
Professor of Machine Learning and Robotics at Imperial College London
Robot Learning, Machine Learning, Quality-Diversity, Reinforcement Learning, Neuroevolution