MAFNet: Multi-frequency Adaptive Fusion Network for Real-time Stereo Matching

📅 2025-12-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing stereo matching networks run too slowly for real-time use on mobile devices and model non-local context poorly. This paper proposes MAFNet, a multi-frequency adaptive fusion network built exclusively on efficient 2D convolutions. It makes three contributions: (1) an adaptive frequency-domain filtering attention module that explicitly decomposes features into high- and low-frequency components before fusing them; (2) a Linformer-based low-rank attention mechanism that enables adaptive cross-frequency feature aggregation; and (3) a fully 2D convolutional architecture for cost volume construction and regularization, which keeps computation low while retaining multi-scale representational capacity. On the Scene Flow and KITTI 2015 benchmarks, the approach is reported to significantly outperform existing real-time stereo methods, with a state-of-the-art trade-off between accuracy and inference speed. The lightweight, convolution-only design makes it a strong candidate for deployment on resource-constrained mobile platforms.

📝 Abstract

Existing stereo matching networks typically rely on either cost-volume construction based on 3D convolutions or deformation methods based on iterative optimization. The former incurs significant computational overhead during cost aggregation, whereas the latter often lacks the ability to model non-local contextual information. Both are poorly suited to resource-constrained mobile devices, which limits their deployment in real-time applications. To address this, we propose a Multi-frequency Adaptive Fusion Network (MAFNet), which produces high-quality disparity maps using only efficient 2D convolutions. Specifically, we design an adaptive frequency-domain filtering attention module that decomposes the full cost volume into high-frequency and low-frequency volumes and performs frequency-aware feature aggregation on each separately. We then introduce a Linformer-based low-rank attention mechanism to adaptively fuse the high- and low-frequency information, yielding more robust disparity estimation. Extensive experiments demonstrate that MAFNet significantly outperforms existing real-time methods on public datasets such as Scene Flow and KITTI 2015, showing a favorable balance between accuracy and real-time performance.
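The frequency decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' module: the abstract describes an adaptive, attention-based filter, whereas this example assumes a fixed radial low-pass mask in the 2D Fourier domain. The function name `split_frequencies` and the `cutoff` parameter are hypothetical.

```python
import numpy as np


def split_frequencies(cost_volume, cutoff=0.25):
    """Split a (D, H, W) cost volume into low- and high-frequency parts.

    Assumption: `cutoff` is a fixed radius in cycles per pixel. The paper's
    module is adaptive; a hard mask is used here only for illustration.
    """
    _, h, w = cost_volume.shape
    # Frequency coordinates along each spatial axis, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # Binary mask that keeps frequencies inside the cutoff radius.
    low_pass = (np.sqrt(fy ** 2 + fx ** 2) <= cutoff).astype(cost_volume.dtype)

    # Filter every disparity slice independently over its spatial axes.
    spectrum = np.fft.fft2(cost_volume, axes=(-2, -1))
    low = np.fft.ifft2(spectrum * low_pass, axes=(-2, -1)).real
    # The high-frequency volume is the residual, so low + high is exact.
    high = cost_volume - low
    return low, high
```

Because the high-frequency volume is defined as a residual, the two parts always sum back to the input, which makes the split lossless regardless of the chosen cutoff.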

Problem

Research questions and friction points this paper is trying to address.

Addresses computational overhead in stereo matching cost aggregation

Enhances non-local contextual modeling in disparity estimation

Improves real-time performance on resource-constrained mobile devices
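To make the cost-aggregation overhead concrete: a correlation cost volume collapses the feature channels, so it has shape (D, H, W) and can be regularized by 2D convolutions that treat disparity as the channel axis, whereas a concatenation volume of shape (C, D, H, W) calls for 3D convolutions. The sketch below shows a standard correlation volume in NumPy. Whether MAFNet builds its volume exactly this way is an assumption, and `correlation_cost_volume` is a hypothetical name.

```python
import numpy as np


def correlation_cost_volume(left, right, max_disp):
    """Build a (max_disp, H, W) correlation volume from (C, H, W) features.

    Assumption: plain channel-mean correlation, a common choice in
    real-time stereo networks, not necessarily the paper's construction.
    """
    _, h, w = left.shape
    volume = np.zeros((max_disp, h, w), dtype=left.dtype)
    for d in range(max_disp):
        if d == 0:
            volume[d] = (left * right).mean(axis=0)
        else:
            # Left pixel x is compared with right pixel x - d; columns
            # x < d have no valid match and stay zero.
            volume[d, :, d:] = (left[:, :, d:] * right[:, :, :-d]).mean(axis=0)
    return volume
```

Taking the argmax over the first axis of the returned volume gives a crude winner-takes-all disparity estimate; a learned 2D network would refine the volume before that step.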

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-frequency adaptive fusion network for stereo matching

Adaptive frequency-domain filtering attention module for cost volume

Linformer-based low-rank attention mechanism for frequency fusion
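The low-rank attention idea from Linformer can be sketched in a few lines of NumPy: keys and values are projected along the sequence axis from n tokens down to r, so the score matrix is n × r instead of n × n. This is a generic single-head sketch, not the paper's fusion module. How MAFNet assigns the high- and low-frequency branches to queries, keys, and values is not stated here, and the projection matrices `proj_k` and `proj_v` (learned in practice) are passed in as plain arrays.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def linformer_attention(q, k, v, proj_k, proj_v):
    """Single-head Linformer-style attention.

    q, k, v:        (n, d) token matrices.
    proj_k, proj_v: (r, n) projections along the token axis, with r << n.
    Returns an (n, d) matrix.
    """
    d = q.shape[-1]
    k_low = proj_k @ k                    # (r, d) compressed keys
    v_low = proj_v @ v                    # (r, d) compressed values
    scores = q @ k_low.T / np.sqrt(d)     # (n, r) instead of (n, n)
    return softmax(scores, axis=-1) @ v_low
```

With n tokens and rank r, the attention cost scales as O(n·r·d) rather than O(n²·d), which is the property that makes this mechanism attractive when the tokens are pixels of a cost volume.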
