🤖 AI Summary
Addressing the challenge of balancing decision-making performance and interpretability in resource allocation, this paper proposes a collaborative framework integrating deep reinforcement learning (DRL) and large language models (LLMs). Methodologically, it employs an attention-based DRL policy, chain-of-thought reasoning, rule generation and filtering, and multi-objective reward modeling. Key contributions include: (1) the novel "rule bottleneck" mechanism, which constrains the DRL action space to interpretable, LLM-generated candidate rules; and (2) an LLM-driven interpretability reward that enables end-to-end joint optimization of actions and natural-language explanations. Evaluated on real-world resource allocation tasks, the approach matches the performance of state-of-the-art DRL baselines while substantially outperforming fine-tuned LLM methods. User studies confirm significant improvements in explanation quality, and inference efficiency increases by 3.2× compared to baseline approaches.
📄 Abstract
Deep Reinforcement Learning (RL) is remarkably effective in addressing sequential resource allocation problems in domains such as healthcare, public policy, and resource management. However, deep RL policies often lack transparency and adaptability, challenging their deployment alongside human decision-makers. In contrast, Language Agents, powered by large language models (LLMs), provide human-understandable reasoning but may struggle with effective decision making. To bridge this gap, we propose Rule-Bottleneck Reinforcement Learning (RBRL), a novel framework that jointly optimizes decisions and explanations. At each step, RBRL generates candidate rules with an LLM, selects among them using an attention-based RL policy, and determines the environment action with an explanation via chain-of-thought reasoning. The RL rule selection is optimized using the environment rewards and an explainability metric judged by the LLM. Evaluations in real-world scenarios highlight RBRL's competitive performance with deep RL and efficiency gains over LLM fine-tuning. A survey further confirms the enhanced quality of its explanations.
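The decision loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the LLM calls are stubbed out, the embedding function and the weight `lam` on the explainability reward are assumptions, and the rule strings are hypothetical placeholders.

```python
import numpy as np

def generate_candidate_rules(state_description):
    # Stub for the LLM rule-generation step: a real system would prompt
    # an LLM with the state and parse candidate natural-language rules
    # from its chain-of-thought output. These rules are placeholders.
    return [f"if utilization > {t}: allocate next unit to queue {i}"
            for i, t in enumerate([0.3, 0.5, 0.7])]

def embed(text, dim=8):
    # Toy deterministic text embedding standing in for a learned encoder.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)

def select_rule(state_vec, rules, W):
    # Attention-style RL policy: score each candidate rule embedding
    # against a projection of the state, then pick the best-scoring rule.
    keys = np.stack([embed(r) for r in rules])   # (n_rules, dim)
    scores = keys @ (W @ state_vec)              # attention logits
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                         # softmax over rules
    return int(np.argmax(probs)), probs

def combined_reward(env_reward, explainability_score, lam=0.5):
    # Joint objective (assumed additive form): environment reward plus
    # an LLM-judged explainability score weighted by lam.
    return env_reward + lam * explainability_score

# One step of the loop with a hypothetical state description.
state_vec = embed("two units idle, high incoming demand")
rules = generate_candidate_rules("two units idle, high incoming demand")
idx, probs = select_rule(state_vec, rules, W=np.eye(8))
reward = combined_reward(env_reward=1.0, explainability_score=0.8)
```

In a full training loop, the environment reward and the LLM's explainability judgment would together provide the gradient signal for the selection policy `W`; here they are fixed numbers for illustration.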