🤖 AI Summary
A critical gap exists between rapidly advancing AI capabilities and lagging safety mechanisms: prevailing "Make AI Safe" paradigms rely on reactive alignment and external safeguards, leaving them fragile and passive, while "Make Safe AI" approaches prioritize intrinsic safety but lack robustness against open-world, unknown threats. Method: We propose the novel "co-evolutionary safety" paradigm, the first to systematically integrate principles from biological immunity into AI safety, yielding the R²AI framework, designed for both resistance and resilience. It features a dual-speed (fast/slow) safety model, a "safety wind tunnel" for adversarial simulation and formal verification, and a dynamic testing mechanism that unifies adversarial simulation with continual learning. Contribution/Results: R²AI enables safety and capability to evolve concurrently and synergistically, offering a scalable way to address both near-term vulnerabilities and long-term existential risks in AGI development, and thereby delivering systematic, long-horizon safety assurance in dynamic environments.
📝 Abstract
In this position paper, we address the persistent gap between rapidly growing AI capabilities and lagging safety progress. Existing paradigms divide into "Make AI Safe", which applies post-hoc alignment and guardrails but remains brittle and reactive, and "Make Safe AI", which emphasizes intrinsic safety but struggles to address unforeseen risks in open-ended environments. We therefore propose "safe-by-coevolution" as a new formulation of the "Make Safe AI" paradigm, inspired by biological immunity, in which safety becomes a dynamic, adversarial, and ongoing learning process. To operationalize this vision, we introduce R²AI ("Resistant and Resilient AI") as a practical framework that unites resistance against known threats with resilience to unforeseen risks. R²AI integrates "fast and slow safe models", adversarial simulation and verification through a "safety wind tunnel", and continual feedback loops that guide safety and capability to coevolve. We argue that this framework offers a scalable and proactive path to maintain continual safety in dynamic environments, addressing both near-term vulnerabilities and long-term existential risks as AI advances toward AGI and ASI.
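To make the coevolution loop concrete, here is a minimal Python sketch of how the components described above could fit together. Everything in it is an illustrative assumption on my part, not an API or algorithm from the paper: the class names (FastSafeModel, SlowSafeModel, SafetyWindTunnel), the confidence-based escalation rule, and the stubbed audit logic are all hypothetical stand-ins for the conceptual roles the abstract assigns to each component.

```python
"""Illustrative sketch of an R²AI-style coevolution loop.

All names and heuristics here are hypothetical; the paper describes
these components conceptually and does not prescribe this design.
"""
import random


class FastSafeModel:
    """Cheap, always-on filter: provides *resistance* to known threats."""

    def __init__(self):
        self.known_threats = {"prompt_injection", "jailbreak"}

    def screen(self, request):
        # Return (verdict, confidence); low confidence triggers escalation.
        if request in self.known_threats:
            return "block", 1.0
        return "allow", random.uniform(0.5, 1.0)

    def learn(self, threat):
        # Continual feedback: newly confirmed threats become "known".
        self.known_threats.add(threat)


class SlowSafeModel:
    """Deliberative checker for uncertain cases: provides *resilience*
    to unforeseen risks (stubbed here as a probabilistic audit)."""

    def audit(self, request):
        return "block" if random.random() < 0.1 else "allow"


class SafetyWindTunnel:
    """Adversarial simulator: synthesizes novel threat scenarios offline."""

    def generate_threats(self, n=3):
        return [f"novel_threat_{random.randint(0, 99)}" for _ in range(n)]


def coevolution_step(fast, slow, tunnel, requests, escalation_threshold=0.7):
    """One round of serving requests plus offline adversarial training."""
    decisions = {}
    for req in requests:
        verdict, confidence = fast.screen(req)
        if confidence < escalation_threshold:  # fast model unsure -> slow path
            verdict = slow.audit(req)
        decisions[req] = verdict

    # Wind-tunnel loop: threats discovered in simulation are fed back,
    # so resistance grows alongside deployed capability.
    for threat in tunnel.generate_threats():
        if slow.audit(threat) == "block":
            fast.learn(threat)
    return decisions


if __name__ == "__main__":
    fast, slow, tunnel = FastSafeModel(), SlowSafeModel(), SafetyWindTunnel()
    for step in range(3):
        out = coevolution_step(fast, slow, tunnel, ["benign_query", "jailbreak"])
        print(f"step {step}: {out} | known threats: {sorted(fast.known_threats)}")
```

The split mirrors the abstract's fast/slow pairing: a lightweight model handles the common case at low latency, while a slower, more deliberative model is reserved for uncertain inputs and for vetting the wind tunnel's synthetic threats before they are added to the fast model's knowledge.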