🤖 AI Summary
Cooperative perception for autonomous driving faces a communication-performance trade-off: existing methods assume MB-scale transmission per collaboration, which practical network constraints may not support. To address this, the paper proposes InfoCom, a lightweight communication framework grounded in an extended Information Bottleneck theory, achieving KB-scale cross-vehicle perceptual information sharing for the first time. The method rests on an information purification paradigm that extracts minimal sufficient task-critical information, realized through three components: (1) information-aware encoding that condenses features into minimal messages while preserving perception-relevant information; (2) sparse mask generation that identifies spatial cues at negligible communication cost; and (3) mask-guided multi-scale decoding that progressively recovers perceptual information rather than simply reconstructing features. Evaluated on multiple benchmark datasets, the approach achieves near-lossless detection accuracy relative to full-feature transmission at merely 1.2–2.8 KB per frame, reducing communication volume by 440× and 90× relative to Where2comm and ERMVP, respectively.
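As a rough sanity check on the figures above, the sketch below converts a message size and a reduction factor into the baseline size that factor would imply, and estimates an idealized transmission time. The function names, the 10 Mbps link rate, and the pairing of the 2.8 KB upper bound with the 440× factor are illustrative assumptions, not values reported by the paper; 1 KB is taken as 1000 bytes.

```python
def implied_baseline_kb(message_kb: float, reduction_factor: float) -> float:
    """Baseline message size (KB) implied by a given reduction factor.

    Hypothetical helper: the summary does not state which message size
    each reduction factor was measured against.
    """
    return message_kb * reduction_factor


def transmit_ms(size_kb: float, link_mbps: float) -> float:
    """Idealized transmission time in milliseconds.

    Assumes 1 KB = 1000 bytes and ignores protocol overhead,
    retransmissions, and propagation latency.
    """
    bits = size_kb * 1000 * 8
    seconds = bits / (link_mbps * 1_000_000)
    return seconds * 1000


if __name__ == "__main__":
    message_kb = 2.8   # upper end of the range quoted in the summary
    link_mbps = 10.0   # assumed link rate, purely illustrative

    baseline_kb = implied_baseline_kb(message_kb, 440)
    print(f"implied baseline: {baseline_kb:.0f} KB")
    print(f"message time:  {transmit_ms(message_kb, link_mbps):.2f} ms")
    print(f"baseline time: {transmit_ms(baseline_kb, link_mbps):.2f} ms")
```

Under these assumptions, a 440× factor applied to a 2.8 KB message implies a baseline of about 1.2 MB, which is consistent with the MB-scale overhead the summary attributes to prior methods.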
📝 Abstract
Precise environmental perception is critical for the reliability of autonomous driving systems. While collaborative perception mitigates the limitations of single-agent perception through information sharing, it faces a fundamental communication-performance trade-off. Existing communication-efficient approaches typically assume MB-level data transmission per collaboration, which may fail under practical network constraints. To address this issue, we propose InfoCom, an information-aware framework that establishes a pioneering theoretical foundation for communication-efficient collaborative perception based on extended Information Bottleneck principles. Departing from mainstream feature manipulation, InfoCom introduces a novel information purification paradigm that optimizes the extraction of minimal sufficient task-critical information under Information Bottleneck constraints. Its core components are: i) Information-Aware Encoding, which condenses features into minimal messages while preserving perception-relevant information; ii) Sparse Mask Generation, which identifies spatial cues at negligible communication cost; and iii) Multi-Scale Decoding, which progressively recovers perceptual information through mask-guided mechanisms rather than simple feature reconstruction. Comprehensive experiments across multiple datasets demonstrate that InfoCom achieves near-lossless perception while reducing communication overhead from megabyte scale to kilobyte scale, corresponding to 440-fold and 90-fold reductions per agent compared to Where2comm and ERMVP, respectively.
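For orientation, the classical Information Bottleneck objective that "minimal sufficient" refers to can be written as below. This is the standard textbook form, not the extended objective used by InfoCom, which the abstract does not state. The symbol assignments are assumptions for illustration: $X$ is an agent's local feature, $Z$ the transmitted message, $Y$ the perception target, and $\beta > 0$ a trade-off weight.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

The first term penalizes how much of $X$ the message retains (compression), while the second rewards how much the message tells the receiver about $Y$ (task relevance). A larger $\beta$ favors relevance over compression.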