Offline Learning of Controllable Diverse Behaviors

๐Ÿ“… 2025-04-25
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address insufficient behavioral diversity and poor controllability in offline imitation learning, this paper proposes a controllable diversity policy learning framework for multi-expert demonstrations. Methodologically, we construct an interpretable and disentangled behavioral latent space, integrating variational autoencoding with trajectory-level temporal consistency regularization to enable user-driven behavior selection and fine-grained interpolation. We further incorporate offline reinforcement learning to enhance policy quality. Evaluated on multi-task, multi-environment benchmarks, our approach significantly outperforms state-of-the-art methods: it achieves a 92.7% fidelity rate in reproducing expert demonstration diversity, supports high-fidelity conditional generation, and enables precise, interpretable behavioral modulation. This work establishes a novel paradigm for explainable and editable imitation learning in complex real-world scenarios.

Technology Category

Application Category

๐Ÿ“ Abstract
Imitation Learning (IL) techniques aim to replicate human behaviors in specific tasks. While IL has gained prominence due to its effectiveness and efficiency, traditional methods often focus on datasets collected from experts to produce a single efficient policy. Recently, extensions have been proposed to handle datasets of diverse behaviors by mainly focusing on learning transition-level diverse policies or on performing entropy maximization at the trajectory level. While these methods may lead to diverse behaviors, they may not be sufficient to reproduce the actual diversity of demonstrations or to allow controlled trajectory generation. To overcome these drawbacks, we propose a different method based on two key features: a) Temporal Consistency that ensures consistent behaviors across entire episodes and not just at the transition level as well as b) Controllability obtained by constructing a latent space of behaviors that allows users to selectively activate specific behaviors based on their requirements. We compare our approach to state-of-the-art methods over a diverse set of tasks and environments. Project page: https://mathieu-petitbois.github.io/projects/swr/
Problem

Research questions and friction points this paper is trying to address.

Traditional IL methods produce single policies, lacking behavioral diversity
Existing diversity methods fail to ensure controllable trajectory generation
Proposed method ensures temporal consistency and user-controllable behavior selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensures temporal consistency across entire episodes
Constructs latent space for controllable behaviors
Allows selective activation of specific behaviors
๐Ÿ”Ž Similar Papers
No similar papers found.
M
Mathieu Petitbois
Ubisoft La Forge
R
R'emy Portelas
Ubisoft La Forge
S
Sylvain Lamprier
University of Angers
Ludovic Denoyer
Ludovic Denoyer
Lead Agent research at H -- Full Professor at Sorbonne Universitรฉs on Sabatical
Machine LearningReinforcement LearningDeep learning