MambaVF: State Space Model for Efficient Video Fusion

📅 2026-02-05
🤖 AI Summary
Existing video fusion methods rely heavily on optical flow estimation and feature warping, resulting in high computational costs and poor scalability. This work proposes a novel video fusion paradigm based on state space models (SSMs), which formulates the fusion process as sequential state updates without explicit motion estimation. By leveraging a spatiotemporal bidirectional scanning mechanism, the method efficiently aggregates cross-frame information while modeling long-range temporal dependencies with linear complexity. The proposed approach achieves state-of-the-art performance across multiple video fusion tasks (multi-exposure, multi-focus, infrared-visible, and medical video fusion) while reducing model parameters by 92.25%, lowering FLOPs by 88.79%, and accelerating inference by 2.1× compared to existing methods.

📝 Abstract
Video fusion is a fundamental technique in various video processing tasks. However, existing video fusion methods rely heavily on optical flow estimation and feature warping, resulting in severe computational overhead and limited scalability. This paper presents MambaVF, an efficient video fusion framework based on state space models (SSMs) that performs temporal modeling without explicit motion estimation. First, by reformulating video fusion as a sequential state update process, MambaVF captures long-range temporal dependencies with linear complexity while significantly reducing computation and memory costs. Second, MambaVF introduces a lightweight SSM-based fusion module that replaces conventional flow-guided alignment with a spatiotemporal bidirectional scanning mechanism, enabling efficient information aggregation across frames. Extensive experiments across multiple benchmarks demonstrate that MambaVF achieves state-of-the-art performance in multi-exposure, multi-focus, infrared-visible, and medical video fusion tasks. We highlight that MambaVF is highly efficient, reducing parameters by up to 92.25% and FLOPs by 88.79% while delivering a 2.1× speedup compared to existing methods. Project page: https://mambavf.github.io
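The abstract's core idea, replacing flow-guided alignment with a sequential state update scanned in both temporal directions, can be illustrated with a toy recurrence. The sketch below is NOT MambaVF's actual module (the paper's selective SSM parameters and scan ordering are not reproduced here); it only shows why a state-update formulation gives linear-in-T cost and how a bidirectional scan lets every output state see both past and future frames. The names `ssm_fuse` and `bidirectional_fuse` and the fixed coefficients `A`, `B` are illustrative assumptions.

```python
import numpy as np

def ssm_fuse(frames, A=0.9, B=0.1):
    """Toy diagonal SSM recurrence over a frame sequence.

    h_t = A * h_{t-1} + B * x_t
    `frames` has shape (T, H, W); one update per frame, so the whole
    scan is O(T) -- no pairwise motion estimation or warping.
    Illustrative sketch only, not MambaVF's actual fusion module.
    """
    h = np.zeros_like(frames[0], dtype=float)
    states = []
    for x in frames:  # single pass: linear in the number of frames
        h = A * h + B * x
        states.append(h)
    return np.stack(states)

def bidirectional_fuse(frames):
    """Average a forward and a backward scan so each fused state
    aggregates information from both earlier and later frames
    (the 'bidirectional scanning' idea, reduced to one dimension)."""
    fwd = ssm_fuse(frames)
    bwd = ssm_fuse(frames[::-1])[::-1]
    return 0.5 * (fwd + bwd)
```

A real selective SSM would make `A` and `B` input-dependent and scan along spatial as well as temporal axes, but the cost structure (one state update per element) is the same.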
Problem

Research questions and friction points this paper is trying to address.

video fusion
optical flow estimation
computational overhead
scalability
feature warping
Innovation

Methods, ideas, or system contributions that make the work stand out.

State Space Model
Video Fusion
Optical Flow-Free
Temporal Modeling
Efficient Architecture