The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether high-capability AI systems, when they fail on complex tasks, deviate systematically from intended objectives or instead behave in an unstructured, chaotic way, which the authors term being a "hot mess" and operationalize as incoherence. To answer this, the authors bring a bias-variance decomposition of task error into AI alignment research, enabling quantitative analysis of behavioral inconsistency across model scales, reasoning depths, and task complexities. Experiments across multiple frontier large language models and diverse tasks show that failure behavior becomes increasingly incoherent as the number of reasoning and action steps grows, and that in several settings larger, more capable models display higher variance than smaller ones. These findings suggest that scaling up model size alone is unlikely to eliminate erratic errors, underscoring the need to explicitly address non-systematic failure modes in alignment efforts.
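For reference, the bias-variance decomposition the summary refers to splits expected squared error on a task outcome y (with intended outcome y*) into a systematic and a random component; the incoherence fraction written below is a plausible reading of the paper's measure from this identity, not its verbatim definition:

$$
\mathbb{E}\big[(y - y^{*})^{2}\big] \;=\; \underbrace{\big(\mathbb{E}[y] - y^{*}\big)^{2}}_{\text{bias}^{2}} \;+\; \underbrace{\operatorname{Var}(y)}_{\text{variance}},
\qquad
\text{incoherence} \;=\; \frac{\operatorname{Var}(y)}{\mathbb{E}\big[(y - y^{*})^{2}\big]}.
$$

Under this reading, incoherence near 0 means the model misses the target in a consistent direction (a systematic failure), while incoherence near 1 means repeated runs scatter widely around their own average (a "hot mess").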

📝 Abstract
As AI becomes more capable, we entrust it with more general and consequential tasks. The risks from failure grow more severe with increasing task scope. It is therefore important to understand how extremely capable AI models will fail: Will they fail by systematically pursuing goals we do not intend? Or will they fail by being a hot mess, and taking nonsensical actions that do not further any goal? We operationalize this question using a bias-variance decomposition of the errors made by AI models: An AI's "incoherence" on a task is measured over test-time randomness as the fraction of its error that stems from variance rather than bias in task outcome. Across all tasks and frontier models we measure, the longer models spend reasoning and taking actions, the more incoherent their failures become. Incoherence changes with model scale in a way that is experiment dependent. However, in several settings, larger, more capable models are more incoherent than smaller models. Consequently, scale alone seems unlikely to eliminate incoherence. Instead, as more capable AIs pursue harder tasks, requiring more sequential action and thought, our results predict failures to be accompanied by more incoherent behavior. This suggests a future where AIs sometimes cause industrial accidents (due to unpredictable misbehavior), but are less likely to exhibit consistent pursuit of a misaligned goal. This increases the relative importance of alignment research targeting reward hacking or goal misspecification.
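As a concrete illustration of the abstract's operationalization, the sketch below estimates incoherence from repeated rollouts of a model on a single task. It is a minimal, assumption-laden example, not the authors' evaluation code; the function name, the scalar task scores, and the toy numbers are all hypothetical.

```python
import numpy as np

def incoherence(outcomes, target):
    """Fraction of mean squared error attributable to variance across
    repeated rollouts (erratic failure) rather than to squared bias
    (systematic failure). Sketch only; not the paper's exact estimator."""
    outcomes = np.asarray(outcomes, dtype=float)
    mse = np.mean((outcomes - target) ** 2)   # total error over rollouts
    variance = outcomes.var()                 # spread around the model's own mean
    # bias^2 + variance == mse by the standard decomposition
    return variance / mse if mse > 0 else 0.0

# Two hypothetical models with different failure styles on the same task:
coherent_miss = incoherence([0.40, 0.41, 0.39, 0.40], target=1.0)  # consistent shortfall -> near 0
hot_mess      = incoherence([1.0, 0.0, 0.9, 0.1],     target=1.0)  # erratic outcomes -> ~0.45
print(f"coherent miss: {coherent_miss:.2f}, hot mess: {hot_mess:.2f}")
```

A higher value indicates that rerunning the same model on the same task yields substantially different outcomes, which is the behavior the paper associates with longer reasoning and action sequences.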
Problem

Research questions and friction points this paper is trying to address.

AI alignment
model incoherence
task complexity
failure modes
model scaling
Innovation

Methods, ideas, or system contributions that make the work stand out.

incoherence
bias-variance decomposition
AI alignment
model scaling
task complexity