🤖 AI Summary
To address the limited receptive field of voxel-based point cloud geometry compression, which degrades performance significantly at high bit depths, this paper proposes a stage-wise Space-to-Channel (S2C) context modeling framework. The method makes three contributions: (1) a stage-wise channel-wise autoregressive model that integrates neighborhood information at a coarse resolution; (2) a level-wise S2C context model that combines a spherical coordinate representation with Geometry Residual Coding (GRC) to keep resolution consistent across cross-level predictions; and (3) a large-kernel Residual Probability Approximation (RPA) module that improves entropy estimation accuracy. Experimental results show that the approach achieves substantial bitrate reduction while maintaining or improving reconstruction quality. Its computational complexity is also lower than that of state-of-the-art voxel-based methods, with particularly pronounced advantages on dense point clouds encoded at high bit depths.
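The spherical coordinate representation mentioned above can be illustrated with a generic Cartesian-to-spherical conversion. This is a minimal sketch only: the angle convention (azimuth in the xy-plane, elevation measured from the xy-plane) and the function names are assumptions made for illustration, and the paper may use a different convention or quantization.

```python
import numpy as np

def cartesian_to_spherical(points: np.ndarray) -> np.ndarray:
    """Convert an (N, 3) array of x, y, z into (radius, azimuth, elevation).

    Assumed convention: azimuth is the angle in the xy-plane measured from
    the x-axis; elevation is the angle above the xy-plane.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    radius = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)
    elevation = np.arctan2(z, np.hypot(x, y))
    return np.stack([radius, azimuth, elevation], axis=1)

def spherical_to_cartesian(sph: np.ndarray) -> np.ndarray:
    """Inverse of cartesian_to_spherical under the same assumed convention."""
    radius, azimuth, elevation = sph[:, 0], sph[:, 1], sph[:, 2]
    x = radius * np.cos(elevation) * np.cos(azimuth)
    y = radius * np.cos(elevation) * np.sin(azimuth)
    z = radius * np.sin(elevation)
    return np.stack([x, y, z], axis=1)
```

The round trip is exact up to floating-point error. Any quantization of the radius and angles, which a real codec would need, is deliberately left out here.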
📝 Abstract
Voxel-based methods are among the most efficient for point cloud geometry compression, particularly for dense point clouds. However, they are limited by a restricted receptive field, especially when handling high-bit-depth point clouds. To overcome this issue, we introduce a stage-wise Space-to-Channel (S2C) context model for both dense point clouds and low-level sparse point clouds. This model uses a channel-wise autoregressive strategy to integrate neighborhood information at a coarse resolution. For high-level sparse point clouds, we further propose a level-wise S2C context model that addresses resolution limitations by incorporating Geometry Residual Coding (GRC) for consistent-resolution cross-level prediction. Additionally, we adopt the spherical coordinate system for its compact representation and enhance our GRC approach with a Residual Probability Approximation (RPA) module, which features a large kernel size. Experimental results show that our S2C context model not only achieves bit savings while maintaining or improving reconstruction quality but also reduces computational complexity compared to state-of-the-art voxel-based compression methods.
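The abstract does not spell out the S2C operation itself. As a rough illustration of the general idea behind a space-to-channel rearrangement, the sketch below folds each 2×2×2 block of a voxel occupancy grid into 8 channels at half the spatial resolution. The function names, the block size, and the channel ordering are assumptions chosen for illustration, not the authors' implementation.

```python
import numpy as np

def space_to_channel(occupancy: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange a (D, H, W) grid into (block**3, D//block, H//block, W//block).

    Channel index a*block**2 + b*block + c holds the voxel at offset (a, b, c)
    inside each block (an assumed ordering).
    """
    d, h, w = occupancy.shape
    if d % block or h % block or w % block:
        raise ValueError("each dimension must be divisible by block")
    x = occupancy.reshape(d // block, block, h // block, block, w // block, block)
    # Bring the three intra-block offsets to the front, then merge them
    # into a single channel axis.
    x = x.transpose(1, 3, 5, 0, 2, 4)
    return x.reshape(block**3, d // block, h // block, w // block)

def channel_to_space(channels: np.ndarray, block: int = 2) -> np.ndarray:
    """Inverse of space_to_channel: restore the full-resolution grid."""
    _, d, h, w = channels.shape
    x = channels.reshape(block, block, block, d, h, w)
    # Interleave each offset axis back with its coarse spatial axis.
    x = x.transpose(3, 0, 4, 1, 5, 2)
    return x.reshape(d * block, h * block, w * block)
```

In a channel-wise autoregressive scheme built on such a layout, the channels would presumably be coded one group at a time, each conditioned on the groups already decoded, so that context is gathered on the coarse grid. That conditioning network is not shown here.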