🤖 AI Summary
In LiDAR point cloud compression, low-frequency structural components and high-frequency textural components contribute unequally under a uniform-resolution representation, which forces a trade-off between high compression ratios and high-fidelity reconstruction. To address this, the paper proposes a frequency-domain decoupled implicit triplane compression framework. Its key contributions are: (1) a mechanism that maps voxel embeddings to implicit triplanes; (2) frequency-decoupled encoding with binary component storage, enabling compact, separate representation of low-frequency geometry and high-frequency texture; and (3) a frequency-domain attention module coupled with variable-resolution modulation decoding, facilitating adaptive multi-scale feature fusion and full-spectrum progressive reconstruction. Experiments on the SemanticKITTI and Ford datasets show BD-rate improvements of 78% and 94%, respectively, over standard codecs, achieving state-of-the-art rate-distortion performance.
📝 Abstract
Point cloud compression methods jointly optimize bitrate and reconstruction distortion. However, balancing compression ratio and reconstruction quality is difficult because low-frequency and high-frequency components contribute differently at the same resolution. To address this, we propose FLaTEC, a frequency-aware compression model that compresses a full scan at high compression ratios. Our approach introduces a frequency-aware mechanism that decouples low-frequency structures from high-frequency textures, while hybridizing latent triplanes as a compact proxy for the point cloud. Specifically, we convert voxelized embeddings into triplane representations to reduce sparsity, computational cost, and storage requirements. We then devise a frequency-disentangling technique that extracts compact low-frequency content while collecting high-frequency details across scales. The decoupled low-frequency and high-frequency components are stored in binary format. During decoding, full-spectrum signals are progressively recovered via a modulation block. Additionally, to compensate for the loss of 3D correlation, we introduce an efficient frequency-based attention mechanism that fosters local connectivity and outputs points at arbitrary resolution. Our method achieves state-of-the-art rate-distortion performance and outperforms standard codecs by 78% and 94% in BD-rate on the SemanticKITTI and Ford datasets, respectively.
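As a rough illustration of the two ideas the abstract combines (collapsing a voxel grid into three planes, then separating each plane into low- and high-frequency parts), the following minimal NumPy sketch may help. It is not FLaTEC's implementation: the abstract describes learned components, whereas this sketch substitutes mean pooling for the triplane mapping and a fixed centered FFT mask for the frequency decoupling. The function names `voxels_to_triplanes` and `split_frequencies` and the `keep` parameter are hypothetical and chosen only for this example.

```python
import numpy as np


def voxels_to_triplanes(vox):
    """Collapse a (D, H, W) voxel grid into three axis-aligned planes.

    Assumption: mean pooling along each axis stands in for the learned
    voxel-embedding-to-triplane mapping described in the abstract.
    """
    plane_hw = vox.mean(axis=0)  # shape (H, W)
    plane_dw = vox.mean(axis=1)  # shape (D, W)
    plane_dh = vox.mean(axis=2)  # shape (D, H)
    return plane_hw, plane_dw, plane_dh


def split_frequencies(plane, keep=0.25):
    """Split a 2D plane into low- and high-frequency components.

    Assumption: a fixed rectangular low-pass mask around the DC term,
    with `keep` in (0, 1] giving the retained fraction of each axis.
    """
    h, w = plane.shape
    # Move the zero-frequency (DC) term to the center of the spectrum.
    spec = np.fft.fftshift(np.fft.fft2(plane))
    rh = max(1, int(h * keep / 2))
    rw = max(1, int(w * keep / 2))
    mask = np.zeros((h, w), dtype=bool)
    mask[h // 2 - rh : h // 2 + rh + 1, w // 2 - rw : w // 2 + rw + 1] = True
    # Low-frequency part: inverse transform of the masked spectrum.
    low = np.fft.ifft2(np.fft.ifftshift(spec * mask)).real
    # High-frequency part: the residual, so that low + high == plane.
    high = plane - low
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vox = rng.random((8, 16, 16))  # toy stand-in for voxelized embeddings
    for plane in voxels_to_triplanes(vox):
        low, high = split_frequencies(plane)
        print(plane.shape, np.allclose(low + high, plane))
```

Defining the high-frequency part as the residual keeps the split lossless in this toy setting, which loosely mirrors the abstract's claim that full-spectrum signals are recovered at decoding time; in the actual method that recovery is performed by a modulation block.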