The Pursuit of Diversity: Multi-Objective Testing of Deep Reinforcement Learning Agents

📅 2025-10-16
🤖 AI Summary
Existing deep reinforcement learning (DRL) testing tools such as INDAGO optimize solely for the number of failing scenarios, which limits coverage of diverse scene configurations and failure modes. Method: This paper proposes INDAGO-Nexus, a multi-objective evolutionary framework for failure-scenario generation that jointly optimizes failure probability and scenario diversity. It incorporates behavioral feature distance and trajectory divergence as diversity metrics and uses Pareto front selection to overcome the limitations of single-objective optimization. Contribution/Results: Evaluated on three DRL agents (a humanoid walker, a self-driving car, and a parking agent), INDAGO-Nexus discovers up to 83% and 40% more unique failures than INDAGO in the self-driving car and parking scenarios, respectively, and reduces time-to-failure by up to 67% across all agents. The approach makes testing of DRL agents in safety-critical scenarios more effective and more comprehensive.

📝 Abstract
Testing deep reinforcement learning (DRL) agents in safety-critical domains requires discovering diverse failure scenarios. Existing tools such as INDAGO rely on single-objective optimization focused solely on maximizing failure counts, but this does not ensure discovered scenarios are diverse or reveal distinct error types. We introduce INDAGO-Nexus, a multi-objective search approach that jointly optimizes for failure likelihood and test scenario diversity using multi-objective evolutionary algorithms with multiple diversity metrics and Pareto front selection strategies. We evaluated INDAGO-Nexus on three DRL agents: humanoid walker, self-driving car, and parking agent. On average, INDAGO-Nexus discovers up to 83% and 40% more unique failures (test effectiveness) than INDAGO in the SDC and Parking scenarios, respectively, while reducing time-to-failure by up to 67% across all agents.
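As a rough illustration of what "jointly optimizing failure likelihood and diversity" can mean, the sketch below scores a candidate scenario with a two-element objective vector. Everything in it is an assumption made for illustration: the flat parameter-vector encoding, the `failure_likelihood` callable (standing in for whatever failure predictor the tool uses), and Euclidean distance as the diversity measure. It is not the paper's implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical encoding: a scenario is a flat vector of environment parameters.
Scenario = Sequence[float]


def diversity(candidate: Scenario, archive: List[Scenario]) -> float:
    """Mean Euclidean distance from the candidate to previously kept scenarios."""
    if not archive:
        return 0.0  # nothing to compare against yet
    return sum(math.dist(candidate, other) for other in archive) / len(archive)


def objectives(
    candidate: Scenario,
    archive: List[Scenario],
    failure_likelihood: Callable[[Scenario], float],
) -> Tuple[float, float]:
    """Return (failure likelihood, diversity); both are to be maximized."""
    return failure_likelihood(candidate), diversity(candidate, archive)


if __name__ == "__main__":
    archive = [(0.0, 0.0), (3.0, 4.0)]
    # A constant predictor stands in for a real failure-likelihood model.
    print(objectives((0.0, 0.0), archive, lambda s: 0.9))
```

A single-objective search would keep only the first element of this vector; keeping both lets the search prefer a likely failure that is also far from the failures already found.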
Problem

Research questions and friction points this paper is trying to address.

Discover diverse failure scenarios for DRL agents
Overcome single-objective limitations in testing approaches
Jointly optimize failure likelihood and test diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-objective evolutionary algorithms jointly optimize failure likelihood and scenario diversity
Multiple diversity metrics ensure varied test scenarios
Pareto front selection balances failure diversity and probability
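The Pareto front idea in the list above can be sketched in a few lines. This is a generic non-dominated filter over `(failure_likelihood, diversity)` pairs, with both objectives maximized; the function names and the tuple layout are illustrative assumptions, and the paper's specific front selection strategies are not reproduced here.

```python
from typing import List, Tuple

# Assumed layout: (failure_likelihood, diversity), both to be maximized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep every point that no other point dominates (simple O(n^2) filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    candidates = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4), (0.9, 0.05)]
    print(pareto_front(candidates))
```

Here `(0.4, 0.4)` is dropped because `(0.5, 0.5)` beats it on both objectives, and `(0.9, 0.05)` is dropped because `(0.9, 0.1)` matches its failure likelihood with higher diversity. The three remaining points represent different trade-offs that a single-objective ranking would collapse into one.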
Antony Bartlett
Delft University of Technology, Delft, The Netherlands
Cynthia Liem
Delft University of Technology, Delft, The Netherlands
Annibale Panichella
Associate Professor, Delft University of Technology
Software Testing, SE4AI, Test Generation, SBSE, Fuzzing