🤖 AI Summary
Embodied agents operating in real-world environments face severe image compression challenges due to communication constraints and strict real-time requirements, necessitating a paradigm shift from generic perceptual fidelity to task-driven visual representation.
Method: This work introduces “embodied image compression”, a paradigm that prioritizes the operational effectiveness of visual information at ultra-low bitrates for closed-loop manipulation. We propose EmbodiedComp, the first benchmark explicitly designed for embodied tasks, featuring a compression-execution co-evaluation framework and operation-oriented semantic fidelity metrics. Experiments span both simulation and real-world deployment.
Contribution/Results: We identify a critical embodied bitrate threshold below which state-of-the-art vision-language-action (VLA) models fail to reliably perform even basic manipulation tasks. Our findings indicate that conventional compression objectives are misaligned with embodied control requirements. This work establishes the problem formulation and an empirical benchmark for task-aware compression, a step toward scalable multi-agent embodied systems.
📝 Abstract
Image Compression for Machines (ICM) has emerged as a pivotal research direction in visual data compression. However, with the rapid evolution of machine intelligence, the target of compression has shifted from task-specific virtual models to Embodied agents operating in real-world environments. To address the communication constraints of Embodied AI in multi-agent systems and ensure real-time task execution, this paper introduces, for the first time, the scientific problem of Embodied Image Compression. We establish a standardized benchmark, EmbodiedComp, to facilitate systematic evaluation under ultra-low bitrate conditions in a closed-loop setting. Through extensive empirical studies in both simulated and real-world settings, we demonstrate that existing Vision-Language-Action models (VLAs) fail to reliably perform even simple manipulation tasks when their input images are compressed below the Embodied bitrate threshold. We anticipate that EmbodiedComp will catalyze the development of domain-specific compression tailored for Embodied agents, thereby accelerating the deployment of Embodied AI in the real world.
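To make the notions of bitrate and bitrate threshold concrete, the following is a minimal sketch, not the benchmark's actual procedure. It assumes bitrate is measured in bits per pixel (bpp) and reads the threshold as the lowest tested bitrate at which task success stays at or above a chosen target. The function names `bits_per_pixel` and `bitrate_threshold`, and the `min_success` parameter, are hypothetical and do not come from the paper.

```python
from typing import Dict, Optional


def bits_per_pixel(num_bytes: int, width: int, height: int) -> float:
    """Bitrate of one compressed frame in bits per pixel (bpp)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return 8 * num_bytes / (width * height)


def bitrate_threshold(
    success_by_bpp: Dict[float, float], min_success: float
) -> Optional[float]:
    """Lowest tested bpp such that it and every higher tested bpp
    reach at least `min_success` (an assumed reading of "threshold").

    Returns None if even the highest tested bitrate falls short.
    """
    threshold = None
    # Walk from the highest bitrate down; stop at the first failure.
    for bpp in sorted(success_by_bpp, reverse=True):
        if success_by_bpp[bpp] < min_success:
            break
        threshold = bpp
    return threshold


# A 320x240 frame compressed to 1200 bytes costs 0.125 bpp.
print(bits_per_pixel(1200, 320, 240))
```

In a closed-loop evaluation, `success_by_bpp` would be filled by running the agent on decoded frames at each bitrate and recording the fraction of completed tasks. That loop is omitted here because it depends on the simulator and the VLA model.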