Geometric Transformation-Embedded Mamba for Learned Video Compression

πŸ“… 2026-03-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work proposes an efficient end-to-end video compression framework based on a direct transformation strategy, circumventing the explicit motion estimation and compensation that complicates traditional learning-based approaches. The method integrates geometric transformations into the Mamba architecture through a novel Cascaded Mamba Module (CMM) and a locality-refined feed-forward network (LRFFN) built on a hybrid block of difference convolutions. It further incorporates a conditional channel-wise entropy model that exploits temporal priors to improve coding efficiency. Experimental results demonstrate that the proposed approach significantly improves perceptual quality and temporal consistency at low bitrates, outperforming state-of-the-art methods across multiple evaluation metrics.
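The "difference convolutions" in the hybrid block likely follow the central-difference idea, where each kernel tap aggregates the difference between a neighbor and the center pixel rather than the raw value, emphasizing local gradients. A minimal sketch of that idea (the paper's actual block, its blend factor `theta`, and its channel layout are assumptions here, not taken from the source):

```python
import numpy as np

def central_difference_conv2d(x, w, theta=0.7):
    """Blend a vanilla 2-D convolution with a central-difference one.

    cdc = sum_i w_i * (x_i - x_center) = vanilla - x_center * sum(w),
    so the difference branch costs no extra multiply-adds.
    `theta` (hypothetical) weights gradient cues against intensity cues.
    """
    kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw]
            vanilla = np.sum(patch * w)                      # standard conv tap
            cdc = vanilla - patch[kh // 2, kw // 2] * w.sum()  # difference branch
            out[i, j] = (1 - theta) * vanilla + theta * cdc
    return out
```

On a constant input the difference branch vanishes, so with `theta=1.0` the output is all zeros; this is the property that makes such blocks sensitive to local structure rather than absolute intensity.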

πŸ“ Abstract
Although learned video compression methods have exhibited outstanding performance, most of them typically follow a hybrid coding paradigm that requires explicit motion estimation and compensation, resulting in a complex solution for video compression. In contrast, we introduce a streamlined yet effective video compression framework founded on a direct transform strategy, i.e., nonlinear transform, quantization, and entropy coding. We first develop a cascaded Mamba module (CMM) with different embedded geometric transformations to effectively explore both long-range spatial and temporal dependencies. To improve local spatial representation, we introduce a locality refinement feed-forward network (LRFFN) that incorporates a hybrid convolution block based on difference convolutions. We integrate the proposed CMM and LRFFN into the encoder and decoder of our compression framework. Moreover, we present a conditional channel-wise entropy model that effectively utilizes conditional temporal priors to accurately estimate the probability distributions of current latent features. Extensive experiments demonstrate that our method outperforms state-of-the-art video compression approaches in terms of perceptual quality and temporal consistency under low-bitrate constraints. Our source codes and models will be available at https://github.com/cshw2021/GTEM-LVC.
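Entropy models of this kind typically predict a mean and scale for each quantized latent symbol (here conditioned on channel-wise and temporal priors) and charge bits according to the probability mass the predicted Gaussian assigns to the symbol's quantization bin. A generic sketch of that rate estimate, not the paper's exact model:

```python
import math

def gaussian_bits(y, mu, sigma):
    """Estimated bits to code a quantized symbol y under N(mu, sigma^2).

    The probability of y is the Gaussian mass over its quantization
    bin [y - 0.5, y + 0.5]; the rate is its negative log2. The priors
    producing (mu, sigma) are assumed, not specified by the source.
    """
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
    p = max(cdf(y + 0.5) - cdf(y - 0.5), 1e-9)  # clamp to avoid log(0)
    return -math.log2(p)
```

The better the conditional priors predict `mu` and `sigma`, the more mass the bin around the true symbol receives and the fewer bits it costs, which is the mechanism behind the claimed coding-efficiency gains.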
Problem

Research questions and friction points this paper is trying to address.

learned video compression
motion estimation
motion compensation
hybrid coding paradigm
video compression complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba
geometric transformation
learned video compression
entropy modeling
temporal dependency
πŸ”Ž Similar Papers
No similar papers found.
Hao Wei
Xi'an Jiaotong University
Computer Vision
Yanhui Zhou
School of Information and Communications Engineering, Xi’an Jiaotong University, No.28, West Xianning Road, Xi’an, 710049, Shaanxi, China
Chenyang Ge
Professor, Xi'an Jiaotong University
computer vision