Can you see how I learn? Human observers' inferences about Reinforcement Learning agents' learning processes

📅 2025-06-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing explainable reinforcement learning (XRL) research lacks empirical human-centered foundations for understanding how humans observe and infer RL agents' learning processes—a capability critical for human-AI collaborative teaching. Method: We conducted a two-stage behavioral experiment involving navigation and manipulation tasks, contrasting tabular and function-approximation RL algorithms. Integrating thematic analysis, qualitative interviews, and quantitative response coding, we established the first observation-driven human reasoning assessment paradigm. Contribution/Results: Validated on 816 human responses with high inter-rater reliability, our paradigm yields a four-dimensional dynamic explanation framework—encompassing goals, knowledge, decision-making, and learning mechanisms—that captures these dimensions' temporal evolution and interdependencies. This work fills a critical empirical gap in human factors within XRL, providing both theoretical grounding and design principles for interpretable RL systems and human-AI collaborative teaching.

📝 Abstract
Reinforcement Learning (RL) agents often exhibit learning behaviors that are not intuitively interpretable by human observers, which can result in suboptimal feedback in collaborative teaching settings. Yet, how humans perceive and interpret RL agents' learning behavior is largely unknown. In a bottom-up approach with two experiments, this work provides a data-driven understanding of the factors underlying human observers' understanding of the agent's learning process. A novel, observation-based paradigm to directly assess human inferences about agent learning was developed. In an exploratory interview study (*N* = 9), we identify four core themes in human interpretations: Agent Goals, Knowledge, Decision Making, and Learning Mechanisms. A second confirmatory study (*N* = 34) applied an expanded version of the paradigm across two tasks (navigation/manipulation) and two RL algorithms (tabular/function approximation). Analyses of 816 responses confirmed the reliability of the paradigm and refined the thematic framework, revealing how these themes evolve over time and interrelate. Our findings provide a human-centered understanding of how people make sense of agent learning, offering actionable insights for designing interpretable RL systems and improving transparency in Human-Robot Interaction.
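The abstract contrasts tabular and function-approximation RL algorithms as the two agent types observers watched. A minimal sketch of the two Q-learning update rules clarifies the distinction; the helper names and the toy dimensions here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def tabular_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: updates exactly one table cell."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

def linear_q_update(w, phi, a, r, phi_next, alpha=0.1, gamma=0.99):
    """One semi-gradient Q-learning step with linear function approximation.
    w has shape (n_actions, n_features); phi is the current state's features.
    The update shifts weights, so it generalizes to states with similar features."""
    td_error = r + gamma * np.max(w @ phi_next) - w[a] @ phi
    w[a] += alpha * td_error * phi
    return w

# Toy illustration: 3 states, 2 actions
Q = np.zeros((3, 2))
Q = tabular_q_update(Q, s=0, a=1, r=1.0, s_next=2)   # only Q[0, 1] changes

w = np.zeros((2, 3))
phi = np.array([1.0, 0.0, 0.0])                      # one-hot features recover the tabular case
w = linear_q_update(w, phi, a=1, r=1.0, phi_next=np.array([0.0, 0.0, 1.0]))
```

With one-hot features the two rules produce identical value changes; with overlapping features the approximate agent updates many states at once, which is one plausible source of the behavioral differences observers were asked to interpret.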
Problem

Research questions and friction points this paper is trying to address.

How humans perceive RL agents' learning behaviors
Factors influencing human understanding of agent learning
Designing interpretable RL systems for human interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Observation-based paradigm for human inferences
Two-study design combining exploratory and confirmatory experiments
Human-centered interpretable RL systems