🤖 AI Summary
This work addresses three security challenges in enterprise agent deployment: insufficient observability, weak generalization of static defenses, and the high cost of large-model-based detection. It proposes the first end-to-end security framework that spans the entire agent reasoning chain for agents operating through the Model Context Protocol. The framework collects high-fidelity telemetry, uses systematic red teaming to generate hard adversarial examples, and introduces a two-stage online detection mechanism in which fast initial triage is followed by context-aware reasoning. Deployed in Uber's production environment for over ten months, the system has reached over 7,200 unique hosts and processes more than 10,000 agent sessions daily. It has uncovered hundreds of credential exposures across 26 categories and enabled a shift-left prevention layer (97.2% precision, 206 detected credentials). On ADR-Bench it produces zero false positives while detecting 67% of attacks, a 2-4x F1-score improvement over three baselines, and on AgentDojo it detects all attacks with three false alarms out of 93 tasks.
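The two-stage mechanism described above can be illustrated with a minimal Python sketch. This is not the ADR Detector's implementation: the `Event` type, the regex patterns, the tool names (`http_post`, `send_email`, `fetch_url`), and the rule used in `contextual_review` are all hypothetical stand-ins, chosen only to show how a cheap first tier can gate a more expensive, context-aware second tier.

```python
import re
from dataclasses import dataclass

# Hypothetical tier-1 patterns; a real system would use far richer signals.
TRIAGE_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"(?i)ignore (all )?previous instructions"),
]

# Hypothetical names for tools that send data out of the host.
OUTBOUND_TOOLS = {"http_post", "send_email"}


@dataclass
class Event:
    session_id: str
    tool: str
    payload: str


def triage(event):
    """Tier 1: cheap pattern screening applied to every event."""
    return any(p.search(event.payload) for p in TRIAGE_PATTERNS)


def contextual_review(event, history):
    """Tier 2 stand-in for expensive context-aware reasoning.

    Assumption: an event is confirmed if it uses an outbound tool, or if
    the session previously fetched untrusted web content.
    """
    touched_untrusted = any(e.tool == "fetch_url" for e in history)
    return event.tool in OUTBOUND_TOOLS or touched_untrusted


def detect(events):
    """Run tier 2 only on events that tier 1 flags; return confirmed alerts."""
    alerts = []
    history = {}  # session_id -> earlier events in that session
    for event in events:
        past = history.setdefault(event.session_id, [])
        if triage(event) and contextual_review(event, past):
            alerts.append(event)
        past.append(event)
    return alerts


if __name__ == "__main__":
    sample = [
        Event("s1", "read_file", "AKIAABCDEFGHIJKLMNOP"),
        Event("s1", "http_post", "key=AKIAABCDEFGHIJKLMNOP"),
        Event("s2", "http_post", "hello"),
    ]
    for alert in detect(sample):
        print(alert.session_id, alert.tool)
```

Under these assumptions, most events never reach the second tier, which is where a design of this kind would save cost when that tier is an LLM call.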
📝 Abstract
We present the Agentic AI Detection and Response (ADR) system, the first large-scale, production-proven enterprise framework for securing AI agents operating through the Model Context Protocol (MCP). We identify three persistent challenges in this domain: (1) limited observability -- existing Endpoint Detection and Response (EDR) tools see file writes but not the agent reasoning, prompts, or causal chains linking intent to execution; (2) insufficient robustness -- static defenses constrained by pre-defined rules fail to generalize across diverse attack techniques and enterprise contexts; and (3) high detection costs -- LLM-based inference is prohibitively expensive at scale. ADR addresses these challenges via three components: the ADR Sensor for high-fidelity agentic telemetry, the ADR Explorer for systematic pre-deployment red teaming and hard-example generation, and the ADR Detector for scalable, two-tier online detection combining fast triage with context-aware reasoning. Deployed at Uber for over ten months, ADR has sustained reliable detection in production with growing adoption reaching over 7,200 unique hosts and processing over 10,000 agent sessions daily, uncovering hundreds of credential exposures across 26 categories and enabling a shift-left prevention layer (97.2% precision, 206 detected credentials). To validate the approach and enable community adoption, we introduce ADR-Bench (302 tasks, 17 techniques, 133 MCP servers), where ADR achieves zero false positives while detecting 67% of attacks -- outperforming three state-of-the-art baselines (ALRPHFS, GuardAgent, LlamaFirewall) by 2--4x in F1-score. On AgentDojo (public prompt injection benchmark), ADR detects all attacks with only three false alarms out of 93 tasks.
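As a rough reading of the ADR-Bench figures quoted above, the F1-score they imply can be worked out directly. This assumes that "detecting 67% of attacks" is recall R over the attack tasks and that zero false positives corresponds to precision P = 1; the abstract itself does not state the F1 value, so the result below is derived, not reported.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 1.00 \times 0.67}{1.00 + 0.67}
    = \frac{1.34}{1.67}
    \approx 0.80
```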