LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning

📅 2024-07-22
📈 Citations: 3
Influential: 0
🤖 AI Summary
Poor interpretability of neural policies in reinforcement learning (RL) remains a critical challenge, and existing concept-bottleneck methods require prohibitively expensive human annotation of semantic concepts. This paper proposes a low-annotation-cost, concept-based RL training scheme with three components: (i) alternating between concept learning and policy optimization, (ii) active annotation selection guided by ensemble-based uncertainty estimation, and (iii) decorrelation of the concept data. The approach reduces the required human annotations to roughly 500–5,000 instances without sacrificing performance. Combining active learning, concept-bottleneck modeling, and ensemble uncertainty quantification, it yields high-performing, interpretable policies across five diverse environments, cutting annotation costs by one to two orders of magnitude. Empirical results further suggest that vision-language models (VLMs) can serve as low-cost annotation surrogates, approaching human-quality labels on some tasks while remaining imperfect on others.

📝 Abstract
Recent advances in reinforcement learning (RL) have predominantly leveraged neural network policies for decision-making, yet these models often lack interpretability, posing challenges for stakeholder comprehension and trust. Concept bottleneck models offer an interpretable alternative by integrating human-understandable concepts into policies. However, prior work assumes that concept annotations are readily available during training. For RL, this requirement poses a significant limitation: it necessitates continuous real-time concept annotation, which either places an impractical burden on human annotators or incurs substantial costs in API queries and inference time when employing automated labeling methods. To overcome this limitation, we introduce a novel training scheme that enables RL agents to efficiently learn a concept-based policy by only querying annotators to label a small set of data. Our algorithm, LICORICE, involves three main contributions: interleaving concept learning and RL training, using an ensemble to actively select informative data points for labeling, and decorrelating the concept data. We show how LICORICE reduces human labeling efforts to 500 or fewer concept labels in three environments, and 5000 or fewer in two more complex environments, all at no cost to performance. We also explore the use of VLMs as automated concept annotators, finding them effective in some cases but imperfect in others. Our work significantly reduces the annotation burden for interpretable RL, making it more practical for real-world applications that necessitate transparency.
Problem

Research questions and friction points this paper is trying to address.

Reduces need for continuous real-time concept annotation in RL.
Enables efficient learning with minimal human-labeled concept data.
Explores automated concept annotation using VLMs for interpretable RL.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interleaves concept learning and RL training
Uses ensemble for active data selection
Decorrelates concept data efficiently
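The second innovation, ensemble-based active data selection, can be illustrated with a minimal sketch: train several concept predictors, average their predicted concept probabilities, and query annotators only for the points where the averaged prediction is most uncertain. The function name, array shapes, and entropy criterion below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def select_informative(ensemble_probs: np.ndarray, budget: int) -> np.ndarray:
    """Pick the `budget` unlabeled points whose mean ensemble prediction
    is most uncertain (highest predictive entropy).

    ensemble_probs: shape (n_models, n_points, n_classes), each slice a
    model's predicted concept-class probabilities for every unlabeled point.
    Returns indices of the selected points, most uncertain first.
    """
    mean_p = ensemble_probs.mean(axis=0)                      # (n_points, n_classes)
    entropy = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=1)  # (n_points,)
    return np.argsort(entropy)[-budget:][::-1]
```

Points where the ensemble members disagree produce a flatter averaged distribution and hence higher entropy, so the annotation budget is concentrated on exactly the examples the current concept model finds hardest.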
Zhuorui Ye
Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
Stephanie Milani
NYU
Reinforcement Learning · Machine Learning · Artificial Intelligence · Human-AI Interaction
Geoffrey J. Gordon
Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213
Fei Fang
Software and Societal Systems Department, Carnegie Mellon University, Pittsburgh, PA 15213