PFGNet: A Fully Convolutional Frequency-Guided Peripheral Gating Network for Efficient Spatiotemporal Predictive Learning

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of purely convolutional models: their fixed receptive fields prevent them from adaptively capturing complex spatiotemporal dynamics. To overcome this, we propose PFGNet, which for the first time introduces a pixel-wise frequency-guided gating mechanism within a fully convolutional framework, enabling spatially adaptive band-pass filters that dynamically modulate receptive fields. The method fuses multi-scale large-kernel surround responses with learnable center suppression, modeling local spectral characteristics and center-surround structure efficiently through separable 1D convolutions, without relying on recurrent or attention mechanisms. PFGNet achieves state-of-the-art or near state-of-the-art performance on benchmark datasets including Moving MNIST, TaxiBJ, Human3.6M, and KTH, while significantly reducing both parameter count and FLOPs.
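The center-surround gating idea in the summary can be sketched in one dimension. The kernel widths, gate values, and fusion rule below are illustrative assumptions for intuition only, not the paper's actual PFG block:

```python
# Toy 1-D illustration of frequency-guided peripheral gating: a wide
# "surround" average and a narrow "center" average are combined with a
# per-position gate, so each position gets its own band-pass response.
# All names and sizes here are hypothetical, not PFGNet's exact design.

def box_filter(x, k):
    """Centered moving average of width k (odd), truncated at borders."""
    r = k // 2
    n = len(x)
    return [sum(x[j] for j in range(max(0, i - r), min(n, i + r + 1))) / k
            for i in range(n)]

def peripheral_gating(x, gate, k_surround=7, k_center=3):
    """Band-pass response: gated surround-minus-center, else plain center.

    gate[i] in [0, 1] controls how strongly the band-pass term
    (surround - center) contributes at position i, so the filter's
    effective receptive field varies pixel by pixel.
    """
    surround = box_filter(x, k_surround)
    center = box_filter(x, k_center)
    return [g * (s - c) + (1 - g) * c
            for g, s, c in zip(gate, surround, center)]

signal = [0.0] * 8 + [1.0] * 8   # a step edge
gate = [0.0] * 8 + [1.0] * 8     # gate open only on the right half
y = peripheral_gating(signal, gate)
```

Where the gate is open and the signal is locally constant (e.g. position 12), surround and center agree and the band-pass term vanishes, which is exactly the frequency-selective behaviour the summary describes.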

📝 Abstract
Spatiotemporal predictive learning (STPL) aims to forecast future frames from past observations and is essential across a wide range of applications. Compared with recurrent or hybrid architectures, pure convolutional models offer superior efficiency and full parallelism, yet their fixed receptive fields limit their ability to adaptively capture spatially varying motion patterns. Inspired by biological center-surround organization and frequency-selective signal processing, we propose PFGNet, a fully convolutional framework that dynamically modulates receptive fields through pixel-wise frequency-guided gating. The core Peripheral Frequency Gating (PFG) block extracts localized spectral cues and adaptively fuses multi-scale large-kernel peripheral responses with learnable center suppression, effectively forming spatially adaptive band-pass filters. To maintain efficiency, all large kernels are decomposed into separable 1D convolutions ($1 \times k$ followed by $k \times 1$), reducing per-channel computational cost from $O(k^2)$ to $O(2k)$. PFGNet enables structure-aware spatiotemporal modeling without recurrence or attention. Experiments on Moving MNIST, TaxiBJ, Human3.6M, and KTH show that PFGNet delivers SOTA or near-SOTA forecasting performance with substantially fewer parameters and FLOPs. Our code is available at https://github.com/fhjdqaq/PFGNet.
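The separable decomposition the abstract describes is exact whenever a $k \times k$ kernel is rank-1, i.e. an outer product of a column and a row vector; the per-pixel multiply count then drops from $k^2$ to $2k$. A minimal pure-Python check (helper names are ours, not the paper's):

```python
# A k x k kernel that factors as an outer product col ⊗ row can be applied
# as a 1 x k pass followed by a k x 1 pass, cutting per-pixel multiplies
# from k*k to 2*k, as in the abstract's separable large-kernel scheme.

def conv2d_full(img, ker):
    """Valid 2D correlation with a full k x k kernel."""
    k = len(ker)
    h, w = len(img), len(img[0])
    return [[sum(ker[a][b] * img[i + a][j + b]
                 for a in range(k) for b in range(k))
             for j in range(w - k + 1)]
            for i in range(h - k + 1)]

def conv2d_separable(img, col, row):
    """Same result via a horizontal (1 x k) then vertical (k x 1) pass."""
    k = len(row)
    h, w = len(img), len(img[0])
    # horizontal pass: 1 x k
    tmp = [[sum(row[b] * img[i][j + b] for b in range(k))
            for j in range(w - k + 1)] for i in range(h)]
    # vertical pass: k x 1
    return [[sum(col[a] * tmp[i + a][j] for a in range(k))
             for j in range(len(tmp[0]))] for i in range(h - k + 1)]

# k = 3 smoothing kernel, factored into two 1D kernels
col = [1.0, 2.0, 1.0]
row = [0.5, 1.0, 0.5]
full = [[c * r for r in row] for c in col]

img = [[float(i * 5 + j) for j in range(5)] for i in range(5)]
a = conv2d_full(img, full)
b = conv2d_separable(img, col, row)
assert all(abs(a[i][j] - b[i][j]) < 1e-9
           for i in range(len(a)) for j in range(len(a[0])))
```

For a 21-tap kernel this is 42 multiplies per pixel instead of 441, which is where the paper's FLOP savings for large kernels come from.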
Problem

Research questions and friction points this paper is trying to address.

spatiotemporal predictive learning
convolutional models
receptive fields
motion patterns
computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

frequency-guided gating
spatiotemporal predictive learning
fully convolutional network
adaptive receptive field
separable large-kernel convolution
Xinyong Cai
College of Computer Science, Sichuan University
Changbin Sun
College of Computer Science, Sichuan University
Yong Wang
The University of Hong Kong
Hongyu Yang
College of Computer Science, Sichuan University
Yuankai Wu
College of Computer Science, Sichuan University