AI Summary
This work proposes an efficient end-to-end video compression framework built on a direct transform strategy, circumventing the explicit motion estimation and compensation that complicates traditional learning-based approaches. The method embeds geometric transformations into the Mamba architecture through a novel cascaded Mamba module (CMM), and strengthens local spatial representation with a locality refinement feed-forward network (LRFFN) that uses a hybrid convolution block based on difference convolutions. It further incorporates a conditional channel-wise entropy model that exploits temporal priors to improve coding efficiency. Experimental results show that the proposed approach improves perceptual quality and temporal consistency at low bitrates, outperforming state-of-the-art methods across multiple evaluation metrics.
Abstract
Although learned video compression methods have exhibited outstanding performance, most of them follow a hybrid coding paradigm that requires explicit motion estimation and compensation, resulting in a complex solution for video compression. In contrast, we introduce a streamlined yet effective video compression framework founded on a direct transform strategy, i.e., nonlinear transform, quantization, and entropy coding. We first develop a cascaded Mamba module (CMM) with embedded geometric transformations to effectively explore both long-range spatial and temporal dependencies. To improve local spatial representation, we introduce a locality refinement feed-forward network (LRFFN) that incorporates a hybrid convolution block based on difference convolutions. We integrate the proposed CMM and LRFFN into the encoder and decoder of our compression framework. Moreover, we present a conditional channel-wise entropy model that effectively utilizes conditional temporal priors to accurately estimate the probability distributions of the current latent features. Extensive experiments demonstrate that our method outperforms state-of-the-art video compression approaches in terms of perceptual quality and temporal consistency under low-bitrate constraints. Our source code and models will be available at https://github.com/cshw2021/GTEM-LVC.
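The abstract does not specify which difference-convolution variant the LRFFN's hybrid block uses. As a rough illustration only, the following is a minimal NumPy sketch of a central difference convolution (CDC), one common difference-convolution formulation; the function name, the `theta` blending parameter, and the naive loop implementation are assumptions for clarity, not the paper's code.

```python
import numpy as np

def central_difference_conv2d(x, w, theta=0.7):
    """Illustrative central difference convolution (CDC) on a 2D array.

    Aggregates intensity *differences* within each receptive field:
        y(p0) = sum_p w(p) * x(p0 + p) - theta * x(p0) * sum_p w(p)
    With theta = 0 this reduces to a vanilla (cross-correlation) conv;
    with theta = 1 it responds only to local gradients, not intensity.
    """
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero-pad to keep output size
    h, wd = x.shape
    out = np.zeros((h, wd), dtype=float)
    for i in range(h):
        for j in range(wd):
            patch = xp[i:i + kh, j:j + kw]
            # vanilla aggregation minus the center-anchored difference term
            out[i, j] = (w * patch).sum() - theta * w.sum() * x[i, j]
    return out
```

On a constant-intensity input with `theta = 1.0`, interior outputs vanish, which is the gradient-sensitivity property that motivates mixing difference convolutions with standard ones in a hybrid block.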