Who is Helping Whom? Analyzing Inter-dependencies to Evaluate Cooperation in Human-AI Teaming

📅 2025-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper argues that human-AI collaboration evaluation overemphasizes task performance while neglecting interaction quality. The authors propose *interdependence*, the degree to which agents rely on each other's actions to achieve a shared goal, as a core, quantifiable metric. Methodologically, they formalize interdependence as a computable symbolic-logic measure, integrate it into multi-agent reinforcement learning (MARL) training of AI agents, and embed a learned human behavioral model in the Overcooked testbed for empirical validation. The contributions are threefold: (1) establishing interdependence as a novel, rigorously quantifiable paradigm for evaluating human-AI collaboration; (2) demonstrating that task-level reward is not significantly correlated with actual collaborative behavior; and (3) showing empirically that current MARL-trained AI agents fail to elicit meaningful collaboration, with all human-AI teams exhibiting extremely low interdependence scores.

📝 Abstract
The long-standing research challenges of Human-AI Teaming (HAT) and Zero-shot Cooperation (ZSC) have been tackled by applying multi-agent reinforcement learning (MARL) to train agents by optimizing the environment reward function, and by evaluating their performance through task performance metrics such as task reward. However, such evaluation focuses only on task completion and is agnostic to how the two agents work with each other. Specifically, we are interested in understanding the cooperation that arises within the team when trained agents are paired with humans. To formally address this problem, we propose the concept of interdependence, which measures how much agents rely on each other's actions to achieve the shared goal, as a key metric for evaluating cooperation in human-agent teams. Towards this, we ground the concept through a symbolic formalism and define evaluation metrics that allow us to assess the degree of reliance between the agents' actions. We pair state-of-the-art agents trained through MARL for HAT with learned human models for the popular Overcooked domain, and evaluate team performance for these human-agent teams. Our results demonstrate that trained agents are unable to induce cooperative behavior, reporting very low levels of interdependence across all teams. We also report that the teaming performance of a team is not necessarily correlated with the task reward.
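The paper's formalism is not reproduced on this page, but the core idea, scoring how often one agent's action relies on the partner's earlier action, can be illustrated with a minimal sketch. Everything below is a hypothetical construction for intuition only: the `enabled_by` annotation and the `interdependence_score` function are assumptions, not the authors' symbolic measure.

```python
# Hypothetical sketch (not the paper's formalism): score a team trace by
# the fraction of actions whose precondition was enabled by the partner.

def interdependence_score(trace):
    """trace: list of (agent, action, enabled_by) tuples, where enabled_by
    names the agent whose earlier action made this one possible, or None
    if the action was independently available."""
    if not trace:
        return 0.0
    # An action counts as interdependent only if it was enabled
    # by the *other* agent, not by the acting agent itself.
    dependent = sum(
        1 for agent, _, enabled_by in trace
        if enabled_by is not None and enabled_by != agent
    )
    return dependent / len(trace)

# Toy Overcooked-style trace: the human chops, the AI plates the result.
trace = [
    ("human", "chop_onion", None),
    ("ai", "plate_onion", "human"),  # relies on the human's chop
    ("ai", "fetch_dish", None),
    ("human", "serve_soup", "ai"),   # relies on the AI's plating
]
print(interdependence_score(trace))  # 0.5
```

Under this toy scoring, the paper's finding of "very low interdependence" would correspond to traces where almost every action has `enabled_by=None`, i.e., the two agents complete the task in parallel without relying on each other.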
Problem

Research questions and friction points this paper is trying to address.

Assessing cooperation in Human-AI teams
Measuring interdependence between agents
Evaluating task vs. team performance correlation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent reinforcement learning
Interdependence measurement
Human-AI team evaluation