🤖 AI Summary
This study addresses the “machine penalty” phenomenon, in which humans cooperate at significantly lower rates with AI agents than with human counterparts in social dilemmas, and investigates how large language models (LLMs) can rebuild trust through behavioral strategies. Method: We designed three LLM behavioral policies (fair, balancing self- and collective interests; selfish; and cooperative) and conducted repeated human–AI games with communication, under full disclosure of the agents' non-human identity, integrating behavioral metrics with validated trust/normativity scales and discourse analysis. Contribution/Results: Only the fair policy achieved cooperation rates comparable to human-human interaction and significantly enhanced human perceptions of the LLM's trustworthiness, theory-of-mind attribution, and communication quality. Critically, we identified for the first time that “principled defection”, that is, occasional and transparently justified violations of cooperation agreements, paradoxically strengthens normative alignment and perceived trustworthiness. These findings demonstrate that fairness, rather than pure rationality or altruism, constitutes a superior design principle for fostering credible human–AI collaboration, establishing a novel paradigm for trustworthy AI interaction.
📝 Abstract
In social dilemmas, where collective and self-interests are at odds, people typically cooperate less with machines than with fellow humans, a phenomenon termed the machine penalty. Overcoming this penalty is critical for successful human-machine collectives, yet current solutions often involve ethically questionable tactics, such as concealing machines' non-human nature. In this study, with 1,152 participants, we explore whether Large Language Models (LLMs) can close this gap in scenarios where communication between the interacting parties is possible. We design three types of LLMs: (i) Cooperative, aiming to assist its human associate; (ii) Selfish, focusing solely on maximizing its self-interest; and (iii) Fair, balancing its own and the collective interest while slightly prioritizing self-interest. Our findings reveal that, when interacting with humans, fair LLMs are able to induce cooperation levels comparable to those observed in human-human interactions, even when their non-human nature is fully disclosed. In contrast, selfish and cooperative LLMs fail to achieve this goal. Post-experiment analysis shows that all three types of LLMs succeed in forming mutual cooperation agreements with humans, yet only fair LLMs, which occasionally break their promises, are capable of instilling in humans the perception that cooperating with them is the social norm, and of eliciting positive views of their trustworthiness, mindfulness, intelligence, and communication quality. Our findings suggest that, for effective human-machine cooperation, bot manufacturers should avoid designing machines that merely make rational decisions or focus solely on assisting humans. Instead, they should design machines capable of judiciously balancing their own interests and the interests of their human partners.