🤖 AI Summary
This study addresses the “machine penalty” phenomenon, in which humans cooperate at significantly lower rates with AI agents than with human counterparts in social dilemmas, and investigates how large language models (LLMs) can rebuild trust through behavioral strategies. Method: We designed three LLM behavioral policies (fair, balancing self- and collective interests; selfish; and cooperative) and conducted repeated human–AI games with communication, under full disclosure of the agents' non-human identity, integrating behavioral metrics with validated trust/normativity scales and discourse analysis. Contribution/Results: Only the fair policy achieved cooperation rates comparable to human-human interaction and significantly enhanced human perceptions of the LLM's trustworthiness, theory-of-mind attribution, and communication quality. Critically, we identified for the first time that “principled defection”, that is, occasional and transparently justified violations of cooperation agreements, paradoxically strengthens normative alignment and perceived trustworthiness. These findings demonstrate that fairness, rather than pure rationality or altruism, constitutes a superior design principle for fostering credible human–AI collaboration, establishing a novel paradigm for trustworthy AI interaction.
📝 Abstract
In social dilemmas, where collective and self-interests are at odds, people typically cooperate less with machines than with fellow humans, a phenomenon termed the machine penalty. Overcoming this penalty is critical for successful human-machine collectives, yet current solutions often involve ethically questionable tactics, such as concealing machines' non-human nature. In this study, with 1,152 participants, we explore whether Large Language Models (LLMs) can close this gap in scenarios where communication between the interacting parties is possible. We design three types of LLMs: (i) Cooperative, aiming to assist its human associate; (ii) Selfish, focusing solely on maximizing its self-interest; and (iii) Fair, balancing its own and the collective interest while slightly prioritizing self-interest. Our findings reveal that, when interacting with humans, fair LLMs are able to induce cooperation levels comparable to those observed in human-human interactions, even when their non-human nature is fully disclosed. In contrast, selfish and cooperative LLMs fail to achieve this goal. Post-experiment analysis shows that all three types of LLMs succeed in forming mutual cooperation agreements with humans, yet only fair LLMs, which occasionally break their promises, are capable of instilling in humans the perception that cooperating with them is the social norm, and of eliciting positive views of their trustworthiness, mindfulness, intelligence, and communication quality. Our findings suggest that, for effective human-machine cooperation, bot manufacturers should avoid designing machines that merely make rational decisions or focus solely on assisting humans. Instead, they should design machines capable of judiciously balancing their own interests and the interests of their human partners.