🤖 AI Summary
This paper addresses the challenge of achieving efficient human-AI alignment in environments with multiple objectives and pluralistic values. We propose Resource-Rational Contractualism (RRC), a framework that integrates contractualist ethics with resource-rational cognitive modeling and uses cognitively inspired heuristics to rapidly approximate intersubjectively acceptable normative agreements under bounded computational resources. Unlike conventional approaches, RRC substantially reduces the computational and coordination overhead of large-scale social consensus formation and lets AI systems adapt dynamically to heterogeneous value landscapes. Technically, RRC unifies normative modeling, computational game theory, bounded-rational decision-making, and social preference inference, jointly optimizing normative robustness and computational feasibility. Empirical evaluation demonstrates significant improvements in interpretability, cross-group adaptability, and collaborative reliability.
📝 Abstract
AI systems will soon have to navigate human environments and make decisions that affect people and other AI agents whose goals and values diverge. Contractualist alignment proposes grounding those decisions in agreements that diverse stakeholders would endorse under the right conditions, yet securing such agreement at scale remains costly and slow, even for advanced AI. We therefore propose Resource-Rational Contractualism (RRC): a framework in which AI systems approximate the agreements rational parties would form by drawing on a toolbox of normatively grounded, cognitively inspired heuristics that trade effort for accuracy. An RRC-aligned agent would not only operate efficiently but also be equipped to dynamically adapt to and interpret the ever-changing human social world.