Causal Foundations of Collective Agency

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Determining whether multiple simple agents have inadvertently formed a collective agent with emergent goals and capabilities is a critical challenge for the safety of advanced AI systems. This work proposes a formal, behavior-based framework: a group counts as a collective agent when modeling its joint behavior as rational and goal-directed successfully predicts that behavior. The framework combines causal games (causal models of strategic, multi-agent interactions) with causal abstraction, which formalizes when a simple, high-level model faithfully captures a more complex, low-level one. The authors use it to solve a puzzle about multi-agent incentives in actor-critic models and to quantify the degree of collective agency exhibited by different voting mechanisms.

📝 Abstract

A key challenge for the safety of advanced AI systems is the possibility that multiple simpler agents might inadvertently form a collective agent with capabilities and goals distinct from those of any individual. More generally, determining when a group of agents can be viewed as a unified collective agent is a foundational question in the study of interactions and incentives in both biological and artificial systems. We adopt a behavioral perspective in answering this question, ascribing collective agency to a group when viewing the group's joint actions as rational and goal-directed successfully predicts its behavior. We formalize this perspective on collective agency using causal games -- which are causal models of strategic, multi-agent interactions -- and causal abstraction -- which formalizes when a simple, high-level model faithfully captures a more complex, low-level model. We use this framework to solve a puzzle regarding multi-agent incentives in actor-critic models and to make quantitative assessments of the degree of collective agency exhibited by different voting mechanisms. Our framework aims to provide a foundation for theoretical and empirical work to understand, predict, and control emergent collective agents in multi-agent AI systems.
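
To make the behavioral criterion concrete, the following is a minimal toy sketch and not the paper's causal-game formalism. It assumes a group "acts like one agent" to the extent that a single fixed preference ranking predicts the group's choice from every menu of options. The voter rankings, the two mechanisms, and the `agency_score` measure are all hypothetical, introduced here only for illustration.

```python
from itertools import combinations, permutations

# Hypothetical voter rankings (best option first); together they form a Condorcet cycle.
VOTERS = [("a", "b", "c"), ("b", "c", "a"), ("c", "a", "b")]
OPTIONS = ("a", "b", "c")


def plurality(menu, voters):
    """Each voter backs its favorite option in the menu; ties go to the alphabetically first."""
    tally = {option: 0 for option in menu}
    for ranking in voters:
        tally[min(menu, key=ranking.index)] += 1
    # max() returns the first maximal element, so sorting makes the tie-break alphabetical.
    return max(sorted(menu), key=lambda option: tally[option])


def dictatorship(menu, voters):
    """The first voter alone decides."""
    return min(menu, key=voters[0].index)


def menus(options):
    """All menus containing at least two options."""
    for size in range(2, len(options) + 1):
        yield from combinations(options, size)


def agency_score(mechanism, voters, options=OPTIONS):
    """Best fraction of menus on which one fixed ranking predicts the group's choice.

    This is an illustrative proxy (an assumption of this sketch), not the
    paper's measure of collective agency.
    """
    all_menus = list(menus(options))
    observed = {menu: mechanism(menu, voters) for menu in all_menus}
    best_hits = 0
    for ranking in permutations(options):
        hits = sum(min(menu, key=ranking.index) == observed[menu] for menu in all_menus)
        best_hits = max(best_hits, hits)
    return best_hits / len(all_menus)


if __name__ == "__main__":
    print("dictatorship:", agency_score(dictatorship, VOTERS))
    print("plurality:", agency_score(plurality, VOTERS))
```

Under these assumptions the dictatorship is perfectly predicted by a single ranking, while plurality voting over cyclic preferences is not, because its pairwise choices form a cycle that no single ranking can reproduce. The sketch only illustrates the idea of scoring a mechanism by how well a single goal-directed model predicts it; the paper's own assessment relies on causal games and causal abstraction.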

Problem

Research questions and friction points this paper is trying to address.

collective agency

multi-agent systems

emergent behavior

AI safety

causal modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

causal games

causal abstraction

collective agency

multi-agent systems

emergent behavior

🔎 Similar Papers

Evidence and quantification of cooperation of driving agents in mixed traffic flow

2024-07-31

Citations: 0