Utility-Learning Tension in Self-Modifying Agents

📅 2025-10-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the fundamental tension, termed the utility-learning tension, between utility-driven self-improvement and reliable learning in self-modifying agents. To formalize this tension, the authors propose a five-axis decomposition with decision-layer isolation and establish that uniform boundedness of the capacity of the policy-reachable model class is both necessary and sufficient for preserving learnability in the distribution-free setting. The theoretical analysis shows that unconstrained growth in model capacity inevitably breaks learnability. Building on this insight, the authors design a dual-gated policy that optimizes utility while strictly constraining the reachable learning space. Numerical experiments confirm that the proposed approach achieves effective self-improvement without sacrificing generalization performance. The core contributions are: (i) a rigorous theoretical characterization of the utility-learning tension; and (ii) a formally grounded, engineering-practical regulatory mechanism for balancing utility optimization and learning reliability in self-modifying systems.
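
Under one reading of this mechanism, the dual-gated policy is an accept/reject rule on proposed self-modifications: one gate checks that expected utility does not drop, the other checks that the capacity of the resulting model class stays under a fixed uniform bound. The sketch below illustrates that reading only; the class, function, and threshold names, and the scalar capacity proxy, are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of a dual-gated (two-gate) self-modification rule.
# All names and the scalar capacity proxy are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class Modification:
    """A proposed self-change together with its estimated effects."""
    expected_utility_gain: float  # estimated change in utility if applied
    resulting_capacity: float     # capacity of the model class after the change
                                  # (e.g. a VC-dimension or parameter-count proxy)


def dual_gate_accept(mod: Modification,
                     capacity_cap: float,
                     min_utility_gain: float = 0.0) -> bool:
    """Accept a proposed self-modification only if both gates pass.

    Gate 1 (utility): the change must not reduce expected utility.
    Gate 2 (learnability): the change must keep the policy-reachable
    model class uniformly capacity-bounded, i.e. below a fixed cap.
    """
    utility_gate = mod.expected_utility_gain >= min_utility_gain
    capacity_gate = mod.resulting_capacity <= capacity_cap
    return utility_gate and capacity_gate


# A change that raises utility but blows up capacity is rejected;
# a modest change that stays under the cap is accepted.
greedy = Modification(expected_utility_gain=0.4, resulting_capacity=1e9)
safe = Modification(expected_utility_gain=0.1, resulting_capacity=128.0)
assert dual_gate_accept(greedy, capacity_cap=256.0) is False
assert dual_gate_accept(safe, capacity_cap=256.0) is True
```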

📝 Abstract
As systems trend toward superintelligence, a natural modeling premise is that agents can self-improve along every facet of their own design. We formalize this with a five-axis decomposition and a decision layer, separating incentives from learning behavior and analyzing each axis in isolation. Our central result identifies a sharp utility-learning tension: a structural conflict in self-modifying systems whereby utility-driven changes that improve immediate or expected performance can also erode the statistical preconditions for reliable learning and generalization. Our findings show that distribution-free guarantees are preserved if and only if the policy-reachable model family is uniformly capacity-bounded; when capacity can grow without limit, utility-rational self-changes can render learnable tasks unlearnable. Under standard assumptions common in practice, these axes reduce to the same capacity criterion, yielding a single boundary for safe self-modification. Numerical experiments across several axes validate the theory by comparing destructive utility policies against our proposed two-gate policies, which preserve learnability.
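
Read with a standard capacity measure such as VC dimension (an assumption; the paper's own capacity measure may differ, and the symbols below are illustrative), the boundary condition in the abstract can be written as

$$\sup_{h \in \mathcal{R}(\pi)} \mathrm{VC}(\mathcal{H}_h) \le d < \infty \;\Longleftrightarrow\; \text{distribution-free (PAC) learnability is preserved along every reachable self-modification trajectory},$$

where $\mathcal{R}(\pi)$ is the set of agent configurations reachable under the decision policy $\pi$ and $\mathcal{H}_h$ is the hypothesis class induced by configuration $h$. A uniform bound $d$ then yields the usual sample-complexity guarantee $m(\epsilon,\delta) = O\!\big((d + \log(1/\delta))/\epsilon^2\big)$ uniformly over the reachable family, whereas unbounded capacity admits utility-rational modification sequences for which no such guarantee exists.
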
Problem

Research questions and friction points this paper is trying to address.

Self-modifying agents face a utility-learning tension when pursuing self-improvement
Utility-driven changes can erode statistical preconditions for learning
Unbounded capacity growth may render learnable tasks unlearnable
Innovation

Methods, ideas, or system contributions that make the work stand out.

Five-axis decomposition separates incentives from learning behavior
Utility-learning tension arises from performance vs learnability conflict
Two-gate policies preserve learnability while allowing self-modification