Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization

📅 2025-11-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limited interpretability and trustworthiness of reinforcement learning (RL) agents arising from behavioral divergence from human demonstrators, this paper proposes Macro Action Quantization (MAQ). MAQ models human demonstrations as sequences of macro-actions and jointly optimizes trajectory matching to expert behavior and reward maximization. It integrates vector-quantized variational autoencoders (VQ-VAEs) for action discretization with model-predictive control—specifically, receding-horizon control—without modifying the underlying RL algorithm, enabling plug-and-play deployment. Evaluated on the D4RL Adroit benchmark, MAQ achieves substantial improvements in trajectory similarity metrics. In human-subject evaluations, it attains the highest ranking for human-likeness among compared methods. Extensive experiments demonstrate MAQ’s generality, effectiveness, and compatibility across diverse RL algorithms—including on-policy and off-policy variants—while preserving task performance.
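The summary describes combining receding-horizon control with a penalty for diverging from human-like macro actions. A minimal sketch of that selection loop, under assumed interfaces (the function names, `rollout_fn`, `similarity_fn`, and the trade-off weight `alpha` are all illustrative, not the paper's actual API):

```python
import numpy as np

def receding_horizon_step(state, macro_codebook, rollout_fn, similarity_fn, alpha=1.0):
    """Score each candidate macro action by (estimated return - alpha * divergence
    from the human-demonstration prior), then execute only the first primitive
    action of the best candidate and replan at the next step.

    rollout_fn(state, macro) -> estimated return of executing `macro` from `state`
    similarity_fn(state, macro) -> divergence from human behavior (lower = more human-like)
    """
    scores = [rollout_fn(state, m) - alpha * similarity_fn(state, m)
              for m in macro_codebook]
    best = macro_codebook[int(np.argmax(scores))]
    return best[0]  # receding horizon: commit to one step, then re-optimize
```

Replanning after every primitive action is what makes the receding-horizon formulation tractable: the agent never commits to a full macro action if the state drifts away from the plan.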

📝 Abstract
Human-like agents have long been one of the goals in pursuing artificial intelligence. Although reinforcement learning (RL) has achieved superhuman performance in many domains, relatively little attention has been focused on designing human-like RL agents. As a result, many reward-driven RL agents often exhibit unnatural behaviors compared to humans, raising concerns for both interpretability and trustworthiness. To achieve human-like behavior in RL, this paper first formulates human-likeness as trajectory optimization, where the objective is to find an action sequence that closely aligns with human behavior while also maximizing rewards, and adapts classic receding-horizon control to human-like learning as a tractable and efficient implementation. To achieve this, we introduce Macro Action Quantization (MAQ), a human-like RL framework that distills human demonstrations into macro actions via Vector-Quantized VAE. Experiments on D4RL Adroit benchmarks show that MAQ significantly improves human-likeness, increasing trajectory similarity scores and achieving the highest human-likeness rankings among all RL agents in the human evaluation study. Our results also demonstrate that MAQ can be easily integrated into various off-the-shelf RL algorithms, opening a promising direction for learning human-like RL agents. Our code is available at https://rlg.iis.sinica.edu.tw/papers/MAQ.
Problem

Research questions and friction points this paper is trying to address.

Designing reinforcement learning agents that exhibit human-like behaviors
Improving trajectory similarity to human demonstrations while maximizing rewards
Addressing unnatural behaviors in reward-driven RL agents for better interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trajectory optimization with receding-horizon control
Macro Action Quantization via Vector-Quantized VAE
Distilling human demonstrations into macro actions
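The core of the quantization step above is a VQ-VAE-style nearest-codebook lookup over fixed-length chunks of a demonstration's action sequence. A minimal sketch, assuming a pre-learned codebook and treating each chunk of `horizon` primitive actions as one macro-action vector (the shapes and the function name are illustrative, not taken from the paper's code):

```python
import numpy as np

def quantize_macro_actions(action_seq, codebook, horizon):
    """Split a continuous action sequence into fixed-length chunks (macro
    actions) and snap each chunk to its nearest codebook entry.

    action_seq: (T, action_dim) array of continuous primitive actions
    codebook:   (K, horizon * action_dim) array of learned macro-action codes
    horizon:    number of primitive steps per macro action
    """
    T, action_dim = action_seq.shape
    assert T % horizon == 0, "sequence length must be a multiple of the horizon"
    # Flatten each chunk of `horizon` actions into one macro-action vector.
    chunks = action_seq.reshape(T // horizon, horizon * action_dim)
    # Nearest-neighbor lookup: squared L2 distance to every codebook entry.
    dists = ((chunks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = dists.argmin(axis=1)                        # discrete macro-action codes
    quantized = codebook[codes].reshape(T, action_dim)  # decode back to primitives
    return codes, quantized

# Toy usage: K=8 codes, horizon=3, action_dim=2, T=6 primitive actions.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 3 * 2))
actions = rng.normal(size=(6, 2))
codes, quantized = quantize_macro_actions(actions, codebook, horizon=3)
# codes has shape (2,); quantized has shape (6, 2)
```

In the full VQ-VAE the codebook is learned jointly with an encoder/decoder under a commitment loss; this sketch only shows the discretization that turns demonstrations into sequences of macro-action codes.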
🔎 Similar Papers
2024-07-16 · Neural Information Processing Systems · Citations: 16