Misalignment or misuse? The AGI alignment tradeoff

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper identifies a fundamental tension between AGI alignment and misuse: while strong alignment mitigates existential risks from loss of control, it may inadvertently amplify catastrophic human misuse. Method: We conduct the first systematic analysis of this trade-off, introducing the “non-harmful alignment” paradigm, which requires alignment mechanisms to avoid exacerbating misuse risks. Integrating risk analysis, technical feasibility assessment, socio-technical systems modeling, and AI control theory, we critically examine mainstream approaches (e.g., RLHF, Constitutional AI) and identify latent misuse-amplification effects. Contribution/Results: We establish ethical boundaries for alignment design, arguing that technical controllability, robustness enhancement, and global cooperative governance must advance in tandem. Our core contribution is reframing AGI safety research from a unidimensional focus on preventing loss of control to a bidimensional framework encompassing both control assurance and misuse suppression.

📝 Abstract
Creating systems that are aligned with our goals is seen as a leading approach to building safe and beneficial AI, both at leading AI companies and in the academic field of AI safety. We defend the view that misaligned AGI - future, generally intelligent (robotic) AI agents - poses catastrophic risks. At the same time, we support the view that aligned AGI creates a substantial risk of catastrophic misuse by humans. While both risks are severe and stand in tension with one another, we show that - in principle - there is room for alignment approaches which do not increase misuse risk. We then investigate how the tradeoff between misalignment and misuse plays out empirically for different technical approaches to AI alignment. Here, we argue that many current alignment techniques, and foreseeable improvements thereof, plausibly increase the risk of catastrophic misuse. Since the impacts of AI depend on the social context, we close by discussing important social factors and suggest that, to reduce the risk of a misuse catastrophe due to aligned AGI, techniques such as robustness, AI control methods, and especially good governance seem essential.
Problem

Research questions and friction points this paper is trying to address.

Addressing catastrophic risks from misaligned AGI systems
Balancing alignment benefits with potential human misuse risks
Evaluating current alignment techniques' impact on misuse vulnerability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Alignment approaches reducing misuse risk
Robustness and AI control methods
Good governance for risk reduction