Who is Helping Whom? Analyzing Inter-dependencies to Evaluate Cooperation in Human-AI Teaming

📅 2025-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper argues that human-AI collaboration evaluation overemphasizes task performance while neglecting interaction quality. The authors propose *interdependence*, the degree to which agents rely on each other's actions to achieve a shared goal, as a core, quantifiable metric. Methodologically, they formalize interdependence as a computable symbolic-logic measure, integrate it into multi-agent reinforcement learning (MARL) training of AI agents, and embed a learned human behavioral model in the Overcooked testbed for empirical validation. The contributions are threefold: (1) establishing interdependence as a novel, rigorously quantifiable paradigm for evaluating human-AI collaboration; (2) demonstrating that task-level reward is not significantly correlated with actual collaborative behavior; and (3) showing empirically that current MARL-trained AI agents fail to elicit meaningful collaboration, with all human-AI teams exhibiting extremely low interdependence scores.

📝 Abstract
The long-standing research challenges of Human-AI Teaming (HAT) and Zero-shot Cooperation (ZSC) have been tackled by applying multi-agent reinforcement learning (MARL) to train agents by optimizing the environment reward function, and by evaluating their performance through task performance metrics such as task reward. However, such evaluation focuses only on task completion and is agnostic to how the two agents work with each other. Specifically, we are interested in understanding the cooperation that arises within the team when trained agents are paired with humans. To formally address this problem, we propose the concept of interdependence, which measures how much agents rely on each other's actions to achieve the shared goal, as a key metric for evaluating cooperation in human-agent teams. Towards this, we ground the concept through a symbolic formalism and define evaluation metrics that allow us to assess the degree of reliance between the agents' actions. We pair state-of-the-art agents trained through MARL for HAT with learned human models for the popular Overcooked domain, and evaluate team performance for these human-agent teams. Our results demonstrate that trained agents are unable to induce cooperative behavior, reporting very low levels of interdependence across all teams. We also report that the teaming performance of a team is not necessarily correlated with the task reward.
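The paper's formalism is not reproduced on this page, but the core idea, scoring how often one agent's action relies on the partner's earlier action, can be illustrated with a minimal sketch. Everything below is a hypothetical construction for intuition only: the `enabled_by` annotation and the `interdependence_score` function are assumptions, not the authors' symbolic measure.

```python
# Hypothetical sketch (not the paper's formalism): score a team trace by
# the fraction of actions whose precondition was enabled by the partner.

def interdependence_score(trace):
    """trace: list of (agent, action, enabled_by) tuples, where enabled_by
    names the agent whose earlier action made this one possible, or None
    if the action was independently available."""
    if not trace:
        return 0.0
    # An action counts as interdependent only if it was enabled
    # by the *other* agent, not by the acting agent itself.
    dependent = sum(
        1 for agent, _, enabled_by in trace
        if enabled_by is not None and enabled_by != agent
    )
    return dependent / len(trace)

# Toy Overcooked-style trace: the human chops, the AI plates the result.
trace = [
    ("human", "chop_onion", None),
    ("ai", "plate_onion", "human"),  # relies on the human's chop
    ("ai", "fetch_dish", None),
    ("human", "serve_soup", "ai"),   # relies on the AI's plating
]
print(interdependence_score(trace))  # 0.5
```

Under this toy scoring, the paper's finding of "very low interdependence" would correspond to traces where almost every action has `enabled_by=None`, i.e., the two agents complete the task in parallel without relying on each other.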
Problem

Research questions and friction points this paper is trying to address.

Assessing cooperation in Human-AI teams
Measuring interdependence between agents
Evaluating task vs. team performance correlation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent reinforcement learning
Interdependence measurement
Human-AI team evaluation