🤖 AI Summary
This work addresses the deployment of embodied foundation models on edge devices, which is constrained by eight interrelated factors, including model size, power consumption, memory bandwidth, computational latency, timing jitter, and safety margins. To tackle this challenge, the paper introduces the “Deployment Gauntlet” framework, which systematically organizes these constraints as eight coupled system-level barriers. It further identifies a key distinction between autoregressive and diffusion models: the former are bottlenecked primarily by memory bandwidth, while the latter suffer from computational latency and sustained execution overhead. By co-designing model architecture decomposition, memory optimization, real-time scheduling, and communication mechanisms—and by separating fast control loops from slower semantic reasoning pathways—the proposed approach yields an efficient and reliable deployment path for edge-based embodied intelligence.
📝 Abstract
Deploying foundation models in embodied edge systems is fundamentally a systems problem, not just a problem of model compression. Real-time control must operate within strict size, weight, and power constraints, where memory traffic, compute latency, timing variability, and safety margins interact directly. The Deployment Gauntlet organizes these constraints into eight coupled barriers that determine whether embodied foundation models can run reliably in practice. Across representative edge workloads, autoregressive Vision-Language-Action policies are constrained primarily by memory bandwidth, whereas diffusion-based controllers are limited more by compute latency and sustained execution cost. Reliable deployment therefore depends on system-level co-design across memory, scheduling, communication, and model architecture, including decompositions that separate fast control from slower semantic reasoning.
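The bandwidth-bound vs compute-bound distinction can be made concrete with a back-of-the-envelope roofline calculation. The sketch below uses illustrative numbers that are *not* from the paper (a hypothetical 3B-parameter int8 policy on an edge accelerator with ~50 GB/s DRAM bandwidth and ~10 TOPS int8 throughput); the model size, hardware figures, and step count are all assumptions chosen only to show why batch-1 autoregressive decoding tends to hit the memory wall while multi-step diffusion sampling accumulates compute time.

```python
# Illustrative roofline sketch. All numbers are assumed, not from the paper.
PARAMS = 3e9          # assumed model size (parameters)
BYTES_PER_PARAM = 1   # int8 weights
BANDWIDTH = 50e9      # assumed edge DRAM bandwidth, bytes/s
COMPUTE = 10e12       # assumed peak int8 throughput, ops/s

# Autoregressive decode at batch size 1: each generated token streams
# every weight from DRAM once, doing ~2 ops (multiply + add) per weight.
bytes_per_token = PARAMS * BYTES_PER_PARAM
ops_per_token = 2 * PARAMS
t_mem = bytes_per_token / BANDWIDTH    # latency floor if bandwidth-bound
t_compute = ops_per_token / COMPUTE    # latency floor if compute-bound
print(f"AR decode per token: memory {t_mem*1e3:.1f} ms vs compute {t_compute*1e3:.1f} ms")
# The memory term dominates by ~100x, so AR decoding is bandwidth-bound.

# A diffusion controller instead runs K denoising passes; even when each
# pass is compute-bound, total latency scales with K times the compute time.
K = 20  # assumed number of denoising steps
print(f"diffusion latency floor: {K * t_compute * 1e3:.1f} ms per action chunk")
```

Under these assumed numbers, the AR per-token floor is set by weight traffic (~60 ms) rather than arithmetic (~0.6 ms), whereas the diffusion floor grows linearly with the number of denoising steps — matching the abstract's claim that the two families hit different barriers and therefore need different system-level optimizations.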