mGRADE: Minimal Recurrent Gating Meets Delay Convolutions for Lightweight Sequence Modeling

📅 2025-07-02
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Addressing the challenge of memory-constrained temporal modeling on edge devices, this paper proposes mGRADEβ€”a lightweight hybrid architecture. Methodologically, mGRADE integrates learnable-stride dilated convolutions (to capture multi-scale local dynamics) with a minimalist gated recurrent unit (minGRU) for long-range dependency modeling, forming a parallelizable, memory-efficient hybrid memory system. Crucially, it achieves constant-memory complexity during both training and inference and supports fully parallel training. Empirically, mGRADE outperforms pure convolutional and pure RNN baselines on synthetic sequence tasks and pixel-level image classification benchmarks, achieving comparable or superior accuracy with approximately 20% lower memory footprint. By enabling efficient co-modeling of short- and long-range temporal dynamics, mGRADE establishes a cost-effective, edge-deployable paradigm for sequence modeling.
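The minGRU recurrence the summary refers to can be sketched in a few lines. This is an illustrative NumPy version, not the paper's code: the gate z_t and candidate h̃_t depend only on the current input, so h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t, which keeps state memory constant and (because the gates never read h_{t−1}) permits parallel scan-based training; it is written sequentially here for clarity. The weight names Wz and Wh are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def min_gru(x, Wz, Wh):
    """Sequential minGRU sketch.
    z_t      = sigmoid(x_t @ Wz)       # input-only gate
    h_tilde  = x_t @ Wh                # input-only candidate state
    h_t      = (1 - z_t) * h_{t-1} + z_t * h_tilde
    x: (T, C) sequence; returns (T, H) hidden states."""
    T = x.shape[0]
    H = Wz.shape[1]
    h = np.zeros(H)
    out = np.empty((T, H))
    for t in range(T):
        z = sigmoid(x[t] @ Wz)
        h_tilde = x[t] @ Wh
        h = (1.0 - z) * h + z * h_tilde  # convex blend of old state and candidate
        out[t] = h
    return out
```

Because neither z_t nor h̃_t depends on the previous hidden state, the whole sequence of updates is a linear recurrence in h and can be evaluated with a parallel prefix scan at training time.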

πŸ“ Abstract
Edge devices for temporal processing demand models that capture both short- and long-range dynamics under tight memory constraints. While Transformers excel at sequence modeling, their quadratic memory scaling with sequence length makes them impractical for such settings. Recurrent Neural Networks (RNNs) offer constant memory but train sequentially, and Temporal Convolutional Networks (TCNs), though efficient, scale memory with kernel size. To address this, we propose mGRADE (minimally Gated Recurrent Architecture with Delay Embedding), a hybrid-memory system that integrates a temporal 1D-convolution with learnable spacings followed by a minimal gated recurrent unit (minGRU). This design allows the convolutional layer to realize a flexible delay embedding that captures rapid temporal variations, while the recurrent module efficiently maintains global context with minimal memory overhead. We validate our approach on two synthetic tasks, demonstrating that mGRADE effectively separates and preserves multi-scale temporal features. Furthermore, on challenging pixel-by-pixel image classification benchmarks, mGRADE consistently outperforms both pure convolutional and pure recurrent counterparts using approximately 20% less memory footprint, highlighting its promise as an efficient solution for memory-constrained multi-scale temporal processing at the edge.
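The delay embedding realized by the convolutional layer can be illustrated as a tapped delay line: for each spacing d (learnable in the paper, fixed here for the sketch), the input d steps in the past is appended to the current features, which is exactly the set of taps a dilated 1D convolution with learnable spacings reads. This is a minimal sketch under those assumptions, not the paper's implementation.

```python
import numpy as np

def delay_embedding(x, delays):
    """Tapped-delay-line view of a 1D conv with per-tap spacings.
    For each delay d, append x shifted d steps into the past
    (zero-padded at the start, i.e. causal).
    x: (T, C) sequence; returns (T, C * len(delays))."""
    T, C = x.shape
    taps = []
    for d in delays:
        shifted = np.zeros_like(x)
        if d < T:
            shifted[d:] = x[:T - d]  # value seen d steps ago
        taps.append(shifted)
    return np.concatenate(taps, axis=-1)
```

Feeding this embedding into the recurrent module gives the recurrence simultaneous access to several time scales of the input: widely spaced taps cover slow dynamics while closely spaced taps resolve rapid variations.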
Problem

Research questions and friction points this paper is trying to address.

Balancing memory and performance in edge device sequence modeling
Combining RNNs and TCNs for efficient temporal feature capture
Reducing memory usage while maintaining multi-scale temporal processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid-memory system with temporal 1D-convolution
Minimal gated recurrent unit for global context
Efficient multi-scale feature separation and preservation