🤖 AI Summary
Existing learned data compression methods struggle to achieve accurate probabilistic modeling and system efficiency at the same time: single-stream architectures fail to capture both local syntax and global semantics, while serial processing across heterogeneous devices is throttled by their speed mismatch, yielding high latency and low throughput. This work proposes a dual-stream, multi-scale decoupled architecture that separates local and global contextual modeling, replacing deep sequential computation with shallow parallel streams. A hierarchical gated refinement module is introduced to enable adaptive feature optimization and precise probability estimation. Furthermore, a concurrent streaming pipeline is designed to realize end-to-end fully pipelined parallelism. The proposed approach significantly improves compression ratio and system throughput while maintaining minimal latency and memory footprint, achieving state-of-the-art performance.
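To make the dual-stream idea concrete, here is a minimal, hypothetical sketch: two shallow streams run in parallel over the same input — one capturing local (windowed) context, one a global summary — and a per-position gate fuses them, in place of one deep serial stack. All function names, shapes, and the gating form are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_stream(x, window=3):
    # Micro-syntactic context: a sliding-window mean over each position's neighbors.
    pad = np.pad(x, (window // 2, window // 2), mode="edge")
    return np.array([pad[i:i + window].mean() for i in range(len(x))])

def global_stream(x):
    # Macro-semantic context: one global summary broadcast back to every position.
    return np.full_like(x, x.mean())

def gated_fuse(local, glob, w_gate=2.0):
    # A sigmoid gate adaptively weighs the two streams per position
    # (a stand-in for the hierarchical gated refinement module).
    gate = 1.0 / (1.0 + np.exp(-w_gate * (local - glob)))
    return gate * local + (1.0 - gate) * glob

x = rng.standard_normal(8)
context = gated_fuse(local_stream(x), global_stream(x))
print(context.shape)  # (8,)
```

Because the two streams are independent given the input, they can execute concurrently, which is what lets shallow parallel branches replace a deep serial stack.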
📝 Abstract
While Learned Data Compression (LDC) has achieved superior compression ratios, balancing precise probability modeling with system efficiency remains challenging. Crucially, uniform single-stream architectures struggle to simultaneously capture micro-syntactic and macro-semantic features, necessitating deep serial stacking that exacerbates latency. Compounding this, heterogeneous systems are constrained by device speed mismatches, where throughput is capped by Amdahl's Law due to serial processing. To this end, we propose a Dual-Stream Multi-Scale Decoupler that disentangles local and global contexts to replace deep serial processing with shallow parallel streams, and incorporate a Hierarchical Gated Refiner for adaptive feature refinement and precise probability modeling. Furthermore, we design a Concurrent Stream-Parallel Pipeline, which overcomes systemic bottlenecks to achieve full-pipeline parallelism. Extensive experiments demonstrate that our method achieves state-of-the-art performance in both compression ratio and throughput, while maintaining the lowest latency and memory usage. The code is available at https://github.com/huidong-ma/FADE.
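The stream-parallel pipeline can be sketched with two concurrent stages connected by a bounded queue, so the entropy coder starts consuming chunks as soon as the probability model emits them rather than waiting for the whole batch. The stage bodies below are placeholders (a mean as the "probability", the chunk length as the "encoded size"), assumed only for illustration.

```python
import queue
import threading

def model_stage(chunks, out_q):
    # Stage 1: emit (chunk, probability) pairs as soon as each is ready.
    for chunk in chunks:
        out_q.put((chunk, sum(chunk) / len(chunk)))  # stand-in "probability"
    out_q.put(None)  # sentinel: no more work

def code_stage(in_q, results):
    # Stage 2: entropy-code each chunk while stage 1 keeps producing.
    while (item := in_q.get()) is not None:
        chunk, prob = item
        results.append(len(chunk))  # stand-in for the encoded size

chunks = [[1, 2, 3], [4, 5], [6]]
q, results = queue.Queue(maxsize=2), []
t1 = threading.Thread(target=model_stage, args=(chunks, q))
t2 = threading.Thread(target=code_stage, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)  # [3, 2, 1]
```

The bounded queue (`maxsize=2`) keeps the faster device from running arbitrarily ahead of the slower one, which is the usual remedy for the heterogeneous speed mismatch the abstract describes.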