Large Language Models Overcome the Machine Penalty When Acting Fairly but Not When Acting Selfishly or Altruistically

📅 2024-09-29
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
This study addresses the “machine penalty” phenomenon (humans cooperate significantly less with AI agents than with human counterparts in social dilemmas) and investigates how large language models (LLMs) can restore cooperation through behavioral strategy design. Method: We designed three LLM behavioral policies: fair (balancing self- and collective interests while slightly prioritizing self-interest), selfish, and cooperative, and conducted multi-round human–AI repeated games under full disclosure of the agents' non-human identity, integrating behavioral metrics with validated trust/normativity scales and discourse analysis. Contribution/Results: Only the fair policy achieved human–human-level cooperation rates and significantly enhanced human perceptions of the LLM's trustworthiness, theory-of-mind attribution, and communication quality. Critically, we identified for the first time that “principled defection” (occasional, transparently justified breaches of cooperation agreements) paradoxically strengthens normative alignment and perceived trustworthiness. These findings demonstrate that fairness, rather than pure rationality or altruism, constitutes a superior design principle for fostering credible human–AI collaboration, establishing a novel paradigm for trustworthy AI interaction.

📝 Abstract
In social dilemmas where the collective and self-interests are at odds, people typically cooperate less with machines than with fellow humans, a phenomenon termed the machine penalty. Overcoming this penalty is critical for successful human-machine collectives, yet current solutions often involve ethically questionable tactics, like concealing machines' non-human nature. In this study, with 1,152 participants, we explore the possibility of closing this gap by using Large Language Models (LLMs), in scenarios where communication is possible between interacting parties. We design three types of LLMs: (i) Cooperative, aiming to assist its human associate; (ii) Selfish, focusing solely on maximizing its self-interest; and (iii) Fair, balancing its own and collective interest, while slightly prioritizing self-interest. Our findings reveal that, when interacting with humans, fair LLMs are able to induce cooperation levels comparable to those observed in human-human interactions, even when their non-human nature is fully disclosed. In contrast, selfish and cooperative LLMs fail to achieve this goal. Post-experiment analysis shows that all three types of LLMs succeed in forming mutual cooperation agreements with humans, yet only fair LLMs, which occasionally break their promises, are capable of instilling a perception among humans that cooperating with them is the social norm, and eliciting positive views on their trustworthiness, mindfulness, intelligence, and communication quality. Our findings suggest that for effective human-machine cooperation, bot manufacturers should avoid designing machines with mere rational decision-making or a sole focus on assisting humans. Instead, they should design machines capable of judiciously balancing their own interest and the interest of humans.
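The three LLM types described above can be sketched as simple decision rules in a repeated social dilemma. Note that the hand-coded rules and the 10% defection rate below are illustrative assumptions for exposition only; in the study, these policies are realized through LLM behavior, not explicit rules.

```python
import random

def act(policy: str, partner_last: str, rng: random.Random) -> str:
    """Return 'C' (cooperate) or 'D' (defect) for one round of a
    repeated social dilemma, given the partner's previous move.

    Illustrative sketch only: the defection probability and the
    reciprocity rule are assumptions, not the paper's parameters.
    """
    if policy == "cooperative":
        # Aims solely to assist its human associate.
        return "C"
    if policy == "selfish":
        # Focuses solely on maximizing its own payoff.
        return "D"
    if policy == "fair":
        # Mostly reciprocal cooperation, but occasionally breaks
        # its promises even after an agreement to cooperate.
        if partner_last == "D":
            return "D"
        return "D" if rng.random() < 0.10 else "C"
    raise ValueError(f"unknown policy: {policy}")
```

In the paper's framing, it is precisely the fair policy's imperfect, occasionally broken cooperation that makes cooperating with it feel like the social norm, which neither unconditional cooperation nor pure self-interest achieves.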
Problem

Research questions and friction points this paper is trying to address.

Overcoming the human-machine cooperation deficit in social dilemmas
Addressing the machine penalty with fair AI agent behavior
Exploring AI's role in mimicking human social norms
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI agents use large language models
Fair AI agents mimic human behavior
Imperfect fairness boosts human cooperation
Zhen Wang
School of Cybersecurity, and School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, China
Ruiqi Song
School of Cybersecurity, and School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, China
Chen Shen
Faculty of Engineering Sciences, Kyushu University, Japan
Shiya Yin
School of Cybersecurity, and School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University, China
Zhao Song
School of Computing, Engineering and Digital Technologies, Teesside University, United Kingdom
B. Battu
Computer Science, Science Division, New York University Abu Dhabi, UAE
Lei Shi
School of Statistics and Mathematics, Yunnan University of Finance and Economics, China
Danyang Jia
Northwestern Polytechnical University
Evolutionary Game Theory · Behavioral Science · Reinforcement Learning
Talal Rahwan
Associate Professor of Computer Science, New York University Abu Dhabi
Artificial Intelligence · Computational Social Science · Game Theory
Shuyue Hu
Shanghai Artificial Intelligence Lab
multiagent system · large language model · game theory