Diversity from Human Feedback

📅 2023-10-10
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Existing behavior space representations often yield misaligned diversity metrics because they lack human semantic grounding. Method: The paper proposes an approach that learns interpretable behavior descriptors from sparse human feedback, explicitly modeling human preferences as the objective for behavior space learning and thereby removing the reliance on expert-defined priors. The framework integrates active human-feedback queries into a quality-diversity (QD) optimization pipeline, built on the MAP-Elites algorithm and evaluated on the QDax benchmark suite. Contribution/Results: Experiments on QDax tasks show that the method significantly improves both the consistency of solution sets with human preferences and their behavioral diversity, outperforming purely data-driven behavior space construction. It enables human-intent-driven, interpretable diversity measurement and is presented as the first work to explicitly use human preference signals as the learning objective in QD behavior space discovery.
📝 Abstract
Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define the diversity measure is a longstanding problem. Many methods rely on expert experience to define a proper behavior space and then obtain the diversity measure, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that DivHF learns a behavior space that aligns better with human requirements compared to direct data-driven approaches and leads to more diverse solutions under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.
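The abstract describes learning a behavior descriptor that is consistent with human preference from queried feedback. The paper does not specify its exact loss or architecture here, so the following is a minimal sketch under stated assumptions: human answers arrive as triplets (a, b, c) meaning "b behaves more like a than c does", and a linear descriptor f(x) = W @ x is fit with a hinge loss on squared descriptor distances. The function name `train_descriptor` and all hyperparameters are illustrative, not from the paper.

```python
import numpy as np

def train_descriptor(triplets, in_dim, out_dim, margin=0.05, lr=0.05,
                     epochs=300, seed=0):
    """Fit a linear behavior descriptor f(x) = W @ x from triplet feedback.

    Each triplet (a, b, c) encodes one human answer: "b behaves more like a
    than c does". A hinge loss pushes the squared descriptor distance
    d(a, b) below d(a, c) by at least `margin`.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(out_dim, in_dim))
    for _ in range(epochs):
        for a, b, c in triplets:
            ab, ac = W @ (a - b), W @ (a - c)
            if margin + ab @ ab - ac @ ac > 0:  # human preference violated
                # Hinge gradient: pull a and b together, push a and c apart.
                W -= lr * 2.0 * (np.outer(ab, a - b) - np.outer(ac, a - c))
    return W
```

As the abstract notes, the learned descriptor can then be paired with any distance measure (Euclidean distance on f(x) in this sketch) to define the diversity measure.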
Problem

Research questions and friction points this paper is trying to address.

Defining diversity measures without expert input
Learning behavior spaces from human feedback
Enhancing diversity in solutions via human preferences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning behavior space from human feedback
Combining descriptor with distance for diversity
Integrating DivHF with MAP-Elites algorithm
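As a sketch of the last point above: once a behavior descriptor is learned, it can serve as the behavior characterization inside a standard MAP-Elites loop, which bins solutions into a grid over descriptor space and keeps the best solution per cell. The grid size, Gaussian mutation, and toy fitness below are illustrative choices, not the paper's QDax setup.

```python
import numpy as np

def map_elites(fitness, descriptor, dim, cells=10, iters=2000, sigma=0.1,
               seed=0):
    """Minimal MAP-Elites: bin each solution's 2-D descriptor (assumed to
    lie in [0, 1]^2) into a `cells` x `cells` grid; keep the best elite
    per cell."""
    rng = np.random.default_rng(seed)
    archive = {}  # grid cell (i, j) -> (fitness, solution)
    for _ in range(iters):
        if archive:
            # Select a random elite and mutate it with Gaussian noise.
            keys = list(archive)
            parent = archive[keys[rng.integers(len(keys))]][1]
            x = np.clip(parent + sigma * rng.normal(size=dim), 0.0, 1.0)
        else:
            x = rng.uniform(size=dim)
        b = np.clip(descriptor(x), 0.0, 1.0 - 1e-9)
        cell = tuple((b * cells).astype(int))
        f = fitness(x)
        if cell not in archive or f > archive[cell][0]:
            archive[cell] = (f, x)
    return archive

# Toy usage: descriptor = first two coordinates, fitness = closeness to 0.5.
archive = map_elites(lambda x: -np.sum((x - 0.5) ** 2),
                     lambda x: x[:2], dim=5)
```

In a DivHF-style pipeline, `descriptor` would be the model trained from human feedback rather than a hand-picked projection, so the archive's axes of variation follow human-perceived behavior differences.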
Ren-Jian Wang
Nanjing University
Quality-Diversity, Evolutionary Algorithms, Reinforcement Learning, Machine Learning
Ke Xue
Nanjing University
Black-Box Optimization, Machine Learning
Yutong Wang
National Key Laboratory for Novel Software Technology, School of Artificial Intelligence, Nanjing University, Nanjing, China
Peng Yang
Southern University of Science and Technology, Shenzhen, China
Haobo Fu
Tencent AI Lab, University of Birmingham
Reinforcement Learning, Evolutionary Computation
Qiang Fu
Tencent AI Lab, Shenzhen, China
Chao Qian
Nanjing University
Artificial Intelligence, Evolutionary Algorithms, Machine Learning