pLSTM: parallelizable Linear Source Transition Mark networks

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing linear RNNs (e.g., xLSTM, Mamba) are limited to sequences or pre-ordered multi-dimensional structures and cannot efficiently model higher-order structured data such as directed acyclic graphs (DAGs). To address this, the authors propose the parallelizable Linear Source Transition Mark network (pLSTM), the first linear RNN architecture that generalizes to arbitrary DAGs. Its core innovations are: (i) a dual-mode mechanism, with a directed propagation mode (P-mode) and a diffusive distribution mode (D-mode), that mitigates vanishing/exploding activations and gradients over long distances; and (ii) parallelization analogous to associative scans and the chunkwise-recurrent form of sequential linear RNNs, enabled by a line-graph representation and, on regular grids, efficient einsum-based tensor operations. On the arrow-pointing extrapolation task, pLSTM generalizes well to larger image sizes where Transformers fail to extrapolate, and it shows strong performance on established molecular graph and 2D image benchmarks.

📝 Abstract
Modern recurrent architectures, such as xLSTM and Mamba, have recently challenged the Transformer in language modeling. However, their structure constrains their applicability to sequences only or requires processing multi-dimensional data structures, such as images or molecular graphs, in a pre-defined sequential order. In contrast, Multi-Dimensional RNNs (MDRNNs) are well suited for data with a higher level structure, like 2D grids, trees, and directed acyclic graphs (DAGs). In this work, we extend the notion of multi-dimensionality to linear RNNs. We introduce parallelizable Linear Source Transition Mark networks (pLSTMs) using Source, Transition, and Mark gates that act on the line graph of a general DAG. This enables parallelization in analogy to parallel associative scans and the chunkwise-recurrent form of sequential linear RNNs, but for DAGs. For regular grids (1D and 2D), like images, this scheme can be efficiently implemented using einsum operations, concatenations, and padding in logarithmic time. pLSTMs tackle the vanishing/exploding activation/gradient problem for long distances in DAGs via two distinct modes: a directed propagation mode (P-mode) and a diffusive distribution mode (D-mode). To showcase the long-range capabilities of pLSTM, we introduce arrow-pointing extrapolation as a synthetic computer vision task that contains long-distance directional information. We demonstrate that pLSTMs generalize well to larger image sizes, whereas Transformers struggle to extrapolate. On established molecular graph and computer vision benchmarks, pLSTMs also show strong performance. Code and Datasets are available at: https://github.com/ml-jku/plstm_experiments.
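The abstract notes that for regular grids the scan can be implemented with einsum operations, concatenations, and padding in logarithmic time. As an illustrative sketch (not the paper's implementation), the snippet below shows the underlying idea for the simplest case: a 1D linear recurrence with scalar gates, parallelized by repeated doubling. The function name and the scalar-gate simplification are assumptions for illustration; pLSTM itself uses tensor-valued Source, Transition, and Mark gates acting on the line graph of a DAG.

```python
import numpy as np

def parallel_linear_scan(a, x):
    """Inclusive scan of h_t = a_t * h_{t-1} + x_t in O(log T) doubling steps.

    The pair (a_t, x_t) composes associatively:
        (a2, x2) o (a1, x1) = (a2 * a1, a2 * x1 + x2),
    so all prefixes can be computed by a Hillis-Steele-style scan
    using only shifts (padding + concatenation) and elementwise products.
    """
    a = np.asarray(a, dtype=float).copy()
    h = np.asarray(x, dtype=float).copy()
    T = h.shape[0]
    shift = 1
    while shift < T:
        # Pad with the identity element (a=1, h=0) and shift right by `shift`.
        a_prev = np.concatenate([np.ones(shift), a[:-shift]])
        h_prev = np.concatenate([np.zeros(shift), h[:-shift]])
        h = h + a * h_prev   # compose: h_t now accounts for 2*shift past steps
        a = a * a_prev
        shift *= 2
    return h
```

Each doubling step composes each position with the state `shift` steps behind it, so after ceil(log2 T) iterations every position holds its full prefix; a sequential loop over the same recurrence produces identical results.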
Problem

Research questions and friction points this paper is trying to address.

Extends multi-dimensionality to linear RNNs, covering arbitrary DAGs
Enables parallel processing of structured data such as grids and graphs
Addresses vanishing/exploding activations and gradients over long distances in DAGs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallelizable Linear Source Transition Mark networks with gates acting on the line graph of a DAG
Efficient DAG processing analogous to parallel associative scans and chunkwise recurrence
Two distinct modes (P-mode and D-mode) to stabilize activations and gradients over long distances
Korbinian Poppel
ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria; NXAI GmbH
Richard Freinschlag
ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
Thomas Schmied
PhD Student, Institute for Machine Learning, Johannes Kepler University Linz
Wei Lin
ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
Sepp Hochreiter
Institute for Machine Learning, Johannes Kepler University Linz