Why Does Agentic Safety Fail to Generalize Across Tasks?

📅 2026-05-07

🤖 AI Summary
This work addresses why AI agents that generalize task execution in multi-task settings often fail to generalize safe behavior. Analyzing linear-quadratic control with H∞ robustness, it proves that the mapping from task specification to optimal controller has a higher Lipschitz constant with safety requirements than without, and derives a Lipschitz bound of independent interest. This indicates that the difficulty of safety generalization reflects a property of safety itself rather than only the limitations of training methods. Experiments in simulated quadcopter navigation with a neural network agent and in CRM with an LLM agent are consistent with the theory. The findings suggest that current efforts to enhance agentic safety may be insufficient and point to a need for fundamentally different approaches.
📝 Abstract
AI agents are increasingly deployed in multi-task settings, where the task to perform is specified at test time, and the agent must generalize to unseen tasks. A major concern in such settings is safety: often, an agent must not only execute unseen tasks, but do so while avoiding risks and handling ones that materialize. Empirical evidence suggests that even when the ability to execute generalizes to unseen tasks, the ability to do so safely frequently does not. This paper provides theory and experiments indicating that failures of agentic safety to generalize across tasks are not merely due to limitations of training methods, but reflect an inherent property of safety itself: the relationship between a task and its safe execution is more complex than the relationship between a task and its execution alone. Theoretically, we analyze linear-quadratic control with $H_{\infty}$-robustness, and prove that the mapping from task specification to an optimal controller has a higher Lipschitz constant with safety requirements than without, yielding a Lipschitz bound of independent interest. Empirically, we demonstrate our conclusions in simulated quadcopter navigation with a neural network agent and in CRM with an LLM agent. Our findings suggest that current efforts to enhance agentic safety may be insufficient, and point to a need for fundamentally different approaches.
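The Lipschitz claim in the abstract can be illustrated with a toy scalar example. The sketch below is not the paper's construction or proof. It assumes a scalar system dx/dt = a·x + b·u + w, treats the state cost weight q as the "task specification" (an assumption made here for illustration), and uses the standard closed-form solutions of the scalar LQR and state-feedback H∞ Riccati equations. The function names, the parameter values, and the attenuation level gamma are all hypothetical.

```python
import math


def lqr_gain(a, b, q, r):
    """Gain k (u = -k x) minimizing the integral of q*x^2 + r*u^2 for dx/dt = a*x + b*u.

    Solves the scalar Riccati equation 2*a*p - (b^2/r)*p^2 + q = 0 for p > 0.
    """
    s = b * b / r
    p = (a + math.sqrt(a * a + s * q)) / s
    return b * p / r


def hinf_gain(a, b, q, r, gamma):
    """State-feedback H-infinity gain for dx/dt = a*x + b*u + w.

    Solves 2*a*p - (b^2/r - 1/gamma^2)*p^2 + q = 0 for p > 0.
    Assumption: the disturbance w enters with unit gain.
    """
    s = b * b / r - 1.0 / (gamma * gamma)
    if s <= 0:
        raise ValueError("gamma is too small for this closed form to apply")
    p = (a + math.sqrt(a * a + s * q)) / s
    return b * p / r


def lipschitz_estimate(gain_fn, qs):
    """Largest finite-difference slope of q -> gain over a sorted grid of q values."""
    return max(
        abs(gain_fn(q2) - gain_fn(q1)) / (q2 - q1)
        for q1, q2 in zip(qs, qs[1:])
    )


# Hypothetical toy parameters (not taken from the paper).
a, b, r, gamma = 1.0, 1.0, 1.0, 1.5
qs = [0.5 + 0.01 * i for i in range(151)]  # task parameter q in [0.5, 2.0]

L_lqr = lipschitz_estimate(lambda q: lqr_gain(a, b, q, r), qs)
L_hinf = lipschitz_estimate(lambda q: hinf_gain(a, b, q, r, gamma), qs)

print(f"estimated Lipschitz constant, LQR only:        {L_lqr:.4f}")
print(f"estimated Lipschitz constant, with H-infinity: {L_hinf:.4f}")
```

In this toy setting, the robust gain changes faster with q than the plain LQR gain does, and letting gamma grow large recovers the LQR gain. This matches the direction of the abstract's claim, but only for this scalar case under the stated assumptions.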
Problem

Research questions and friction points this paper is trying to address.

agentic safety
task generalization
safety generalization
multi-task settings
unseen tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic safety
task generalization
Lipschitz continuity
H-infinity robustness
multi-task learning