Possible principles for aligned structure learning agents

📅 2024-09-30
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Problem: How can AI autonomously construct a world model that embeds human preferences via structure learning (that is, causal representation learning and model discovery)? Method: The paper proposes principles for structure learning grounded in core knowledge, information geometry, and model reduction. As an illustrative example, it mathematically sketches Asimov's Laws of Robotics as an alignment scheme in which agents act cautiously to minimize the ill-being of other agents, and it supplements this example with refined approaches to alignment. Drawing on mathematics, statistics, and cognitive science, it outlines a path toward aligned agents through structure learning and theory of mind. Contribution/Results: The paper offers a roadmap for scalable aligned AI in which agents learn a good model of the world that includes a good model of human preferences, moving alignment toward agents that represent both the world and other agents' world models.

📝 Abstract
This paper offers a roadmap for the development of scalable aligned artificial intelligence (AI) from first-principles descriptions of natural intelligence. In brief, a possible path toward scalable aligned AI rests upon enabling artificial agents to learn a good model of the world that includes a good model of our preferences. For this, the main objective is creating agents that learn to represent the world and other agents' world models, a problem that falls under structure learning (a.k.a. causal representation learning). We expose the structure learning and alignment problems with this goal in mind, as well as principles to guide us forward, synthesizing various ideas across mathematics, statistics, and cognitive science. 1) We discuss the essential role of core knowledge, information geometry, and model reduction in structure learning, and suggest core structural modules to learn a wide range of naturalistic worlds. 2) We outline a way toward aligned agents through structure learning and theory of mind. As an illustrative example, we mathematically sketch Asimov's Laws of Robotics, which prescribe that agents act cautiously to minimize the ill-being of other agents. We supplement this example by proposing refined approaches to alignment. These observations may guide the development of artificial intelligence in helping to scale existing, or design new, aligned structure learning systems.
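
The abstract's idea of acting cautiously to minimize the ill-being of other agents can be illustrated with a toy sketch. This is not the paper's formalization, which is not reproduced on this page. It assumes a discrete belief over world states and a hypothetical `ill_being` cost function that an agent would, in the paper's setting, have to learn through structure learning and theory of mind. All names and numbers below are made up for illustration.

```python
from typing import Callable, Dict, Sequence

# Hypothetical types: actions and states are plain strings in this toy example.
Belief = Dict[str, float]                 # state -> probability (assumed to sum to 1)
IllBeing = Callable[[str, str], float]    # (action, state) -> harm to the other agent


def expected_ill_being(action: str, belief: Belief, ill_being: IllBeing) -> float:
    """Expected harm to the other agent if `action` is taken under `belief`."""
    return sum(p * ill_being(action, state) for state, p in belief.items())


def most_cautious_action(
    actions: Sequence[str], belief: Belief, ill_being: IllBeing
) -> str:
    """Pick the action with the lowest expected ill-being of the other agent."""
    return min(actions, key=lambda a: expected_ill_being(a, belief, ill_being))


# Made-up example: a robot unsure whether a human is nearby.
belief = {"human_near": 0.3, "human_far": 0.7}
harm_table = {
    ("move_fast", "human_near"): 1.0,
    ("move_fast", "human_far"): 0.0,
    ("move_slow", "human_near"): 0.2,
    ("move_slow", "human_far"): 0.0,
}


def toy_ill_being(action: str, state: str) -> float:
    """Look up the assumed harm for an (action, state) pair."""
    return harm_table[(action, state)]


choice = most_cautious_action(["move_fast", "move_slow"], belief, toy_ill_being)
print(choice)
```

In this sketch the harm function is given by hand. The roadmap described above instead asks the agent to infer such preferences by modeling other agents' world models.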

Problem

Research questions and friction points this paper is trying to address.

Develop scalable aligned AI via natural intelligence principles
Enable agents to learn world and preference models
Guide alignment through structure learning and theory of mind

Innovation

Methods, ideas, or system contributions that make the work stand out.

Structure learning for scalable aligned AI
Core knowledge and information geometry
Alignment via theory of mind