Offline Discovery of Interpretable Skills from Multi-Task Trajectories

📅 2026-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes LOKI, a framework for discovering reusable, interpretable skills and performing hierarchical imitation from long-horizon, multi-task offline trajectories that lack explicit rewards and subtask annotations. LOKI uses a three-stage end-to-end learning pipeline. It first segments trajectories at a macro level with an alignment-enforced vector-quantized VAE guided by weak task labels. It then refines these segments at a micro level with a self-supervised sequential model and consolidates skill boundaries through iterative clustering. Finally, it constructs an option-based hierarchical policy with a learned termination condition. LOKI is presented as achieving semantically interpretable and composable skill discovery in the reward-free, multi-task offline setting. On the D4RL Kitchen benchmark, it outperforms standard hierarchical imitation learning baselines, yielding skills that align with human intuition and can be sequenced to solve a novel, unseen task.
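To make the macro-segmentation idea concrete, here is a minimal sketch of the generic vector-quantization step: each timestep embedding is mapped to its nearest codebook vector, and a trajectory is split wherever the assigned code changes. This is not LOKI's implementation. The function names `nearest_code` and `segment_by_code`, the toy codebook, and the toy embeddings are all hypothetical, and the alignment term and weak task labels described above are left out.

```python
from typing import List, Sequence, Tuple


def nearest_code(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook vector closest to z (squared Euclidean)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))


def segment_by_code(
    embeddings: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> List[Tuple[int, int, int]]:
    """Split a trajectory into (start, end_exclusive, code) runs of constant code."""
    codes = [nearest_code(z, codebook) for z in embeddings]
    segments: List[Tuple[int, int, int]] = []
    start = 0
    for t in range(1, len(codes) + 1):
        # Close the current run at the end of the trajectory or when the code changes.
        if t == len(codes) or codes[t] != codes[start]:
            segments.append((start, t, codes[start]))
            start = t
    return segments


# Hypothetical 2-D embeddings for a five-step trajectory and a two-entry codebook.
toy_codebook = [[0.0, 0.0], [1.0, 1.0]]
toy_embeddings = [[0.1, 0.0], [0.2, 0.1], [0.9, 1.0], [1.1, 0.8], [0.0, 0.2]]
print(segment_by_code(toy_embeddings, toy_codebook))
```

In this toy case the boundaries fall at timesteps 2 and 4, where the nearest code switches. A learned encoder and codebook would take the place of the hand-written vectors.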

📝 Abstract
Hierarchical Imitation Learning (HIL) is a powerful paradigm for acquiring complex robot behaviors from demonstrations. A central challenge, however, lies in discovering reusable skills from long-horizon, multi-task offline data, especially when the data lacks explicit rewards or subtask annotations. In this work, we introduce LOKI, a three-stage end-to-end learning framework designed for offline skill discovery and hierarchical imitation. The framework begins with a two-stage, weakly supervised skill discovery process. Stage one performs coarse, task-aware macro-segmentation by employing an alignment-enforced Vector Quantized VAE guided by weak task labels. Stage two then refines these segments at a micro level using a self-supervised sequential model, followed by an iterative clustering process to consolidate skill boundaries. The third stage leverages these precise boundaries to construct a hierarchical policy within an option-based framework, complete with a learned termination condition β for explicit skill switching. LOKI achieves high success rates on the challenging D4RL Kitchen benchmark and outperforms standard HIL baselines. Furthermore, we demonstrate that the discovered skills are semantically meaningful, aligning with human intuition, and compositional: sequencing them solves a novel, unseen task.
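The option-based control loop with a termination condition can be sketched as follows. This is the generic option-execution pattern under stated assumptions, not the paper's architecture: `run_options`, the one-dimensional toy state, and the hand-written policies and termination function are all hypothetical stand-ins for learned components.

```python
import random
from typing import Callable, Dict, List, Tuple

State = float
Action = float


def run_options(
    state: State,
    high_policy: Callable[[State], int],
    low_policies: Dict[int, Callable[[State], Action]],
    beta: Callable[[State, int], float],
    step: Callable[[State, Action], State],
    horizon: int,
    rng: random.Random,
) -> Tuple[State, List[Tuple[int, Action]]]:
    """Execute options; after each step, switch with probability beta(state, option)."""
    trace: List[Tuple[int, Action]] = []
    option = high_policy(state)
    for _ in range(horizon):
        action = low_policies[option](state)
        trace.append((option, action))
        state = step(state, action)
        # Termination check: when it fires, the high-level policy picks the next skill.
        if rng.random() < beta(state, option):
            option = high_policy(state)
    return state, trace


# Toy setup (all hypothetical): option 0 moves right until x >= 3, option 1 holds still.
def toy_high(x: State) -> int:
    return 0 if x < 3.0 else 1


toy_low = {0: lambda x: 1.0, 1: lambda x: 0.0}


def toy_beta(x: State, option: int) -> float:
    return 1.0 if (option == 0 and x >= 3.0) else 0.0


def toy_step(x: State, a: Action) -> State:
    return x + a


final_state, trace = run_options(
    0.0, toy_high, toy_low, toy_beta, toy_step, horizon=5, rng=random.Random(0)
)
print(final_state, [option for option, _ in trace])
```

Because the toy termination function returns only 0.0 or 1.0, the rollout is deterministic: option 0 runs for three steps and then hands over to option 1. A learned β would output intermediate probabilities, making switches stochastic.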
Problem

Research questions and friction points this paper is trying to address.

offline skill discovery
hierarchical imitation learning
multi-task trajectories
interpretable skills
long-horizon tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline Skill Discovery
Hierarchical Imitation Learning
Vector Quantized VAE
Option Framework
Self-supervised Segmentation
Chongyu Zhu
Department of Mechanical and Industrial Engineering, and the Operation Research and Reinforcement Learning (DORL) Lab, University of Toronto, Toronto, ON, Canada
Mithun Vanniasinghe
University of Toronto Institute for Aerospace Studies (UTIAS), Toronto, ON, Canada
Jiayu Chen
Agentic Intelligence Lab, The University of Hong Kong, Hong Kong SAR, China
Chi-Guhn Lee
University of Toronto
Operations Research, Markov Decision Processes, Reinforcement Learning