🤖 AI Summary
This work addresses a gap in existing research, which typically analyzes either vulnerabilities in large language models or classical cyber-physical system flaws, and therefore fails to explain systemic failures in embodied intelligence. The authors argue that such failures stem from system-level mismatches arising from the tight coupling among perception, decision-making, and action. They identify four core mechanisms: semantic correctness does not guarantee physical safety; state-dependent dynamics mean identical actions can produce very different outcomes; errors are amplified within closed-loop interactions; and local safety assurances cannot ensure global safety. By integrating systems safety theory, control theory, and AI reliability analysis, the study constructs a cross-layer systemic risk model that exposes the shortcomings of current defense strategies and establishes a theoretical foundation for an embodied AI safety framework capable of addressing risk propagation and uncertainty in the physical world.
📝 Abstract
Embodied AI systems (e.g., autonomous vehicles, service robots, and LLM-driven interactive agents) are rapidly transitioning from controlled environments to safety-critical real-world deployments. Unlike in disembodied AI, failures in embodied intelligence lead to irreversible physical consequences, raising fundamental questions about security, safety, and reliability. While existing research predominantly analyzes embodied AI through the lenses of Large Language Model (LLM) vulnerabilities or classical Cyber-Physical System (CPS) failures, this survey argues that these perspectives are individually insufficient to explain many observed breakdowns in modern embodied systems. We posit that a significant class of failures arises from embodiment-induced system-level mismatches rather than from isolated model flaws or traditional CPS attacks. Specifically, we identify four core insights that explain why embodied AI is fundamentally harder to secure: (i) semantic correctness does not imply physical safety, as language-level reasoning abstracts away geometry, dynamics, and contact constraints; (ii) identical actions can lead to drastically different outcomes across physical states due to nonlinear dynamics and state uncertainty; (iii) small errors propagate and amplify across tightly coupled perception-decision-action loops; and (iv) safety is not compositional across time or system layers, allowing locally safe decisions to accumulate into globally unsafe behavior. These insights suggest that securing embodied AI requires moving beyond component-level defenses toward system-level reasoning about physical risk, uncertainty, and failure propagation.
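To make insights (ii) and (iii) concrete, here is a minimal Python sketch. It is our own illustration rather than anything from the survey: a clamped logistic map stands in for nonlinear plant dynamics, and every function name, gain, and horizon in it is an assumption chosen for readability.

```python
# Toy sketch (illustrative only; the plant, gains, and horizon
# below are assumptions, not anything taken from the survey).

def step(x, u):
    """One step of a toy nonlinear plant (a clamped logistic map).

    Because the dynamics are nonlinear in x, the marginal effect of
    the same action u depends strongly on the current state.
    """
    x_next = 3.9 * x * (1.0 - x) + u
    return min(max(x_next, 0.0), 1.0)  # keep the state in [0, 1]

def open_loop(x0, actions):
    """Apply a fixed action sequence and return the final state."""
    x = x0
    for u in actions:
        x = step(x, u)
    return x

def closed_loop(x0, sensor_bias, steps=20):
    """Perception-decision-action loop with a tiny constant sensing bias."""
    x = x0
    for _ in range(steps):
        x_est = x + sensor_bias        # perception: slightly wrong estimate
        u = 0.1 * (0.5 - x_est)        # decision: small nudge toward 0.5
        x = step(x, u)                 # action applied to the true state
    return x

# Insight (ii): identical actions, near-identical start states,
# yet the two trajectories typically end in very different places.
actions = [0.0] * 20
print(open_loop(0.400, actions), open_loop(0.401, actions))

# Insight (iii): a 1e-3 perception bias is re-injected at every step
# and stretched by the dynamics instead of averaging out.
print(closed_loop(0.400, 0.0), closed_loop(0.400, 0.001))
```

Note that no per-step check on the action u would flag either run: each individual decision is tiny and locally reasonable, and it is the accumulation over the loop that goes wrong, which is exactly the non-compositionality named in insight (iv).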