🤖 AI Summary
Existing learned data compression methods struggle to achieve accurate probabilistic modeling and system efficiency at the same time: single-stream architectures fail to capture both local syntax and global semantics, while serial processing across heterogeneous devices is throttled by their speed mismatch, yielding high latency and low throughput. This work proposes a dual-stream, multi-scale decoupled architecture that separates local and global contextual modeling, replacing deep sequential computation with shallow parallel streams. A hierarchical gated refinement module is introduced to enable adaptive feature optimization and precise probability estimation. Furthermore, a concurrent streaming pipeline is designed to realize end-to-end fully pipelined parallelism. The proposed approach significantly improves compression ratio and system throughput while maintaining minimal latency and memory footprint, achieving state-of-the-art performance.
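To make the dual-stream idea concrete, here is a minimal, hypothetical sketch: two shallow streams run in parallel over the same input — one capturing local (windowed) context, one a global summary — and a per-position gate fuses them, in place of one deep serial stack. All function names, shapes, and the gating form are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_stream(x, window=3):
    # Micro-syntactic context: a sliding-window mean over each position's neighbors.
    pad = np.pad(x, (window // 2, window // 2), mode="edge")
    return np.array([pad[i:i + window].mean() for i in range(len(x))])

def global_stream(x):
    # Macro-semantic context: one global summary broadcast back to every position.
    return np.full_like(x, x.mean())

def gated_fuse(local, glob, w_gate=2.0):
    # A sigmoid gate adaptively weighs the two streams per position
    # (a stand-in for the hierarchical gated refinement module).
    gate = 1.0 / (1.0 + np.exp(-w_gate * (local - glob)))
    return gate * local + (1.0 - gate) * glob

x = rng.standard_normal(8)
context = gated_fuse(local_stream(x), global_stream(x))
print(context.shape)  # (8,)
```

Because the two streams are independent given the input, they can execute concurrently, which is what lets shallow parallel branches replace a deep serial stack.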
📝 Abstract
While Learned Data Compression (LDC) has achieved superior compression ratios, balancing precise probability modeling with system efficiency remains challenging. Crucially, uniform single-stream architectures struggle to simultaneously capture micro-syntactic and macro-semantic features, necessitating deep serial stacking that exacerbates latency. Compounding this, heterogeneous systems are constrained by device speed mismatches, where throughput is capped by Amdahl's Law due to serial processing. To this end, we propose a Dual-Stream Multi-Scale Decoupler that disentangles local and global contexts to replace deep serial processing with shallow parallel streams, and incorporate a Hierarchical Gated Refiner for adaptive feature refinement and precise probability modeling. Furthermore, we design a Concurrent Stream-Parallel Pipeline, which overcomes systemic bottlenecks to achieve full-pipeline parallelism. Extensive experiments demonstrate that our method achieves state-of-the-art performance in both compression ratio and throughput, while maintaining the lowest latency and memory usage. The code is available at https://github.com/huidong-ma/FADE.
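The stream-parallel pipeline can be sketched with two concurrent stages connected by a bounded queue, so the entropy coder starts consuming chunks as soon as the probability model emits them rather than waiting for the whole batch. The stage bodies below are placeholders (a mean as the "probability", the chunk length as the "encoded size"), assumed only for illustration.

```python
import queue
import threading

def model_stage(chunks, out_q):
    # Stage 1: emit (chunk, probability) pairs as soon as each is ready.
    for chunk in chunks:
        out_q.put((chunk, sum(chunk) / len(chunk)))  # stand-in "probability"
    out_q.put(None)  # sentinel: no more work

def code_stage(in_q, results):
    # Stage 2: entropy-code each chunk while stage 1 keeps producing.
    while (item := in_q.get()) is not None:
        chunk, prob = item
        results.append(len(chunk))  # stand-in for the encoded size

chunks = [[1, 2, 3], [4, 5], [6]]
q, results = queue.Queue(maxsize=2), []
t1 = threading.Thread(target=model_stage, args=(chunks, q))
t2 = threading.Thread(target=code_stage, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [3, 2, 1]
```

The bounded queue (`maxsize=2`) keeps the faster device from running arbitrarily ahead of the slower one, which is the usual remedy for the heterogeneous speed mismatch the abstract describes.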