AI Summary
This work investigates how to strategically guide and enhance human prosocial behavior in repeated human-agent interactions. Modeling prosociality as a latent state that evolves over time, the study treats it for the first time as a dynamic variable amenable to active shaping. A belief-driven decision framework based on a partially observable Markov decision process (POMDP) is developed to jointly optimize task efficiency and social objectives. The system learns the latent-state transition and observation models via an expectation-maximization algorithm and generates optimal interaction policies accordingly. User studies demonstrate that the proposed strategy significantly outperforms baseline approaches, effectively fostering human cooperation while simultaneously improving team performance.
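To make the learning step concrete, here is a minimal sketch, assuming a binary prosociality state observed through binary cooperative/uncooperative behavioral signals, of Baum-Welch EM for estimating the latent-state transition and observation models. For brevity it learns action-independent dynamics as a plain HMM, whereas the paper's POMDP conditions transitions on the robot's actions; all names, dimensions, and the toy observation stream below are illustrative assumptions, not the authors' code.

```python
# Hedged sketch: Baum-Welch EM for a binary latent "prosociality" state
# observed through noisy behavioral signals. Illustrative only.
import numpy as np

def baum_welch(obs, n_states=2, n_obs=2, n_iters=50, seed=0):
    """Learn transition matrix A and observation matrix B of an HMM via EM."""
    rng = np.random.default_rng(seed)
    A = rng.dirichlet(np.ones(n_states), size=n_states)   # A[i, j] = P(s'=j | s=i)
    B = rng.dirichlet(np.ones(n_obs), size=n_states)      # B[i, o] = P(o | s=i)
    pi = np.full(n_states, 1.0 / n_states)                # initial state distribution
    T = len(obs)
    for _ in range(n_iters):
        # E-step: forward-backward pass, rescaled at each step for stability.
        alpha = np.zeros((T, n_states)); beta = np.zeros((T, n_states))
        alpha[0] = pi * B[:, obs[0]]; alpha[0] /= alpha[0].sum()
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
            alpha[t] /= alpha[t].sum()
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
            beta[t] /= beta[t].sum()
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)          # P(s_t | obs)
        xi = np.zeros((T - 1, n_states, n_states))         # P(s_t, s_{t+1} | obs)
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
            xi[t] /= xi[t].sum()
        # M-step: re-estimate parameters from expected counts.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for o in range(n_obs):
            B[:, o] = gamma[obs == o].sum(axis=0) / gamma.sum(axis=0)
    return A, B, pi

# Toy usage: 0 = uncooperative act observed, 1 = cooperative act observed.
obs = np.array([0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1])
A, B, pi = baum_welch(obs)
print("learned transitions:\n", A, "\nlearned observation model:\n", B)
```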
Abstract
We propose a decision-theoretic framework in which a robot can strategically shape a human's inferred prosocial state during repeated interactions. Modeling the human's prosociality as a latent state that evolves over time, the robot learns to infer and influence this state through its own actions, including helping and signaling. We formalize this as a latent-state POMDP with limited observations and learn the transition and observation dynamics using expectation maximization. The resulting belief-based policy balances task and social objectives, selecting actions that maximize long-term cooperative outcomes. We evaluate the model using data from user studies and show that the learned policy outperforms baseline strategies both in team performance and in increasing observed human cooperative behavior.
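The belief-based action selection described above can be sketched as a Bayes filter over the latent state plus a one-step lookahead that trades immediate task payoff against the expected belief mass on the high-prosociality state. The action set, reward values, and action-conditioned transition matrices below are invented for illustration; only the overall belief-update-then-act structure follows the abstract, and a full POMDP solver would plan over a longer horizon.

```python
# Hedged sketch of one belief-based decision step; all numbers are assumptions.
import numpy as np

ACTIONS = ["help", "signal", "do_task"]
# A[a][i, j] = P(s'=j | s=i, action a); state 0 = low, 1 = high prosociality.
A = {
    "help":    np.array([[0.6, 0.4], [0.1, 0.9]]),
    "signal":  np.array([[0.7, 0.3], [0.2, 0.8]]),
    "do_task": np.array([[0.9, 0.1], [0.4, 0.6]]),
}
B = np.array([[0.8, 0.2],   # P(obs | s=low):  mostly uncooperative acts seen
              [0.3, 0.7]])  # P(obs | s=high): mostly cooperative acts seen
R_task = {"help": 0.2, "signal": 0.1, "do_task": 1.0}   # immediate task payoff
w_social = 2.0  # weight on the social objective (belief in the high state)

def belief_update(b, action, obs):
    """Bayes filter: predict through action-conditioned dynamics, correct on obs."""
    predicted = b @ A[action]
    posterior = predicted * B[:, obs]
    return posterior / posterior.sum()

def choose_action(b):
    """One-step lookahead on task reward plus expected prosociality gain."""
    def value(a):
        return R_task[a] + w_social * (b @ A[a])[1]
    return max(ACTIONS, key=value)

b = np.array([0.5, 0.5])                 # uniform prior over the latent state
for obs in [0, 1, 1, 0, 1]:              # toy stream: 1 = cooperative act observed
    a = choose_action(b)
    b = belief_update(b, a, obs)
    print(f"action={a:8s} obs={obs} belief(high)={b[1]:.2f}")
```

With these toy numbers the policy initially favors prosociality-building actions like "help" and drifts toward "do_task" as the belief in the high state grows, which is the qualitative task-versus-social trade-off the abstract describes.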