CoMI-IRL: Contrastive Multi-Intention Inverse Reinforcement Learning

📅 2026-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of unsupervised reward function inference from multi-intention expert demonstrations without prior knowledge of the number of underlying behavioral modes. To this end, the authors propose a Transformer-based disentangled framework that decouples behavior representation learning, unsupervised clustering, and reward learning—enabling, for the first time, simultaneous discovery of behavioral modes and learning of reward models without any prior labels or assumptions. By integrating contrastive learning with clustering techniques, the method not only outperforms existing approaches when the number of behavioral modes is unknown but also facilitates interpretable analysis, visualization of behavioral relationships, and rapid incremental adaptation to previously unseen behaviors.
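The paper's code is not reproduced here, but the contrastive representation-learning stage it describes typically builds on an InfoNCE-style objective: embeddings of two views of the same trajectory are pulled together while other trajectories in the batch act as negatives. Below is a minimal NumPy sketch of that standard loss; the function name and shapes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Standard InfoNCE loss: row i of `positives` is the positive pair for
    row i of `anchors`; every other row serves as an in-batch negative.
    (Illustrative sketch, not the paper's implementation.)"""
    # L2-normalize embeddings so the dot product is cosine similarity
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # positives sit on the diagonal
```

With well-aligned positive pairs the loss approaches zero; with mismatched pairs it approaches log N, which is why embeddings trained this way tend to separate into clusterable behavior groups.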

📝 Abstract
Inverse Reinforcement Learning (IRL) seeks to infer reward functions from expert demonstrations. When demonstrations originate from multiple experts with different intentions, the problem is known as Multi-Intention IRL (MI-IRL). Recent deep generative MI-IRL approaches couple behavior clustering and reward learning, but typically require prior knowledge of the number of true behavioral modes $K^*$. This reliance on expert knowledge limits their adaptability to new behaviors and restricts analysis to the learned rewards rather than the behavior modes used to train them. We propose Contrastive Multi-Intention IRL (CoMI-IRL), a transformer-based unsupervised framework that decouples behavior representation and clustering from downstream reward learning. Our experiments show that CoMI-IRL outperforms existing approaches without a priori knowledge of $K^*$ or labels, while allowing for visual interpretation of behavior relationships and adaptation to unseen behavior without full retraining.
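The abstract's key claim is operating without a priori knowledge of $K^*$. The paper's exact clustering procedure is not detailed here, but a generic way to select a mode count from learned behavior embeddings is to run k-means over candidate values of K and keep the one with the best silhouette score. The following self-contained NumPy sketch illustrates that idea; all function names are hypothetical and this is not CoMI-IRL's actual algorithm.

```python
import numpy as np

def farthest_first_init(X, k):
    """Deterministic farthest-point initialization (avoids degenerate seeds)."""
    centers = [X[0]]
    for _ in range(k - 1):
        dists = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dists)])
    return np.array(centers)

def kmeans(X, k, iters=50):
    """Plain Lloyd's algorithm; returns a cluster label per row of X."""
    centers = farthest_first_init(X, k)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def mean_silhouette(X, labels):
    """Average silhouette coefficient; simple O(N^2) reference version."""
    D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
    n, scores = len(X), []
    for i in range(n):
        same = (labels == labels[i]) & (np.arange(n) != i)
        a = D[i, same].mean() if same.any() else 0.0
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def choose_num_modes(X, k_range=range(2, 6)):
    """Pick the mode count whose clustering scores best — no K* needed."""
    return max(k_range, key=lambda k: mean_silhouette(X, kmeans(X, k)))
```

Once mode labels are fixed this way, a separate reward model can be trained per cluster, which mirrors the decoupling of clustering from downstream reward learning described in the abstract.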
Problem

Research questions and friction points this paper is trying to address.

Inverse Reinforcement Learning
Multi-Intention IRL
Behavior Clustering
Unsupervised Learning
Reward Function
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive Learning
Multi-Intention IRL
Transformer-based Framework
Unsupervised Clustering
Reward Decoupling