pLSTM: parallelizable Linear Source Transition Mark networks

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing linear RNNs (e.g., xLSTM, Mamba) are limited to sequences or pre-ordered multi-dimensional structures and cannot efficiently model higher-order structured data such as directed acyclic graphs (DAGs). To address this, the authors propose the parallelizable Linear Source Transition Mark network (pLSTM), the first linear RNN architecture that generalizes to arbitrary DAGs. Its core innovations are: (i) a dual-mode mechanism, with a directed propagation mode (P-mode) and a diffusive distribution mode (D-mode), that mitigates vanishing/exploding activations and gradients over long distances; and (ii) parallelization analogous to associative scans and the chunkwise-recurrent form of sequential linear RNNs, enabled by a line-graph representation and, on regular grids, efficient einsum-based tensor operations. On the arrow-pointing extrapolation task, pLSTM generalizes well to larger image sizes where Transformers fail to extrapolate, and it shows strong performance on established molecular graph and 2D image benchmarks.

📝 Abstract
Modern recurrent architectures, such as xLSTM and Mamba, have recently challenged the Transformer in language modeling. However, their structure constrains their applicability to sequences only or requires processing multi-dimensional data structures, such as images or molecular graphs, in a pre-defined sequential order. In contrast, Multi-Dimensional RNNs (MDRNNs) are well suited for data with a higher level structure, like 2D grids, trees, and directed acyclic graphs (DAGs). In this work, we extend the notion of multi-dimensionality to linear RNNs. We introduce parallelizable Linear Source Transition Mark networks (pLSTMs) using Source, Transition, and Mark gates that act on the line graph of a general DAG. This enables parallelization in analogy to parallel associative scans and the chunkwise-recurrent form of sequential linear RNNs, but for DAGs. For regular grids (1D and 2D), like images, this scheme can be efficiently implemented using einsum operations, concatenations, and padding in logarithmic time. pLSTMs tackle the vanishing/exploding activation/gradient problem for long distances in DAGs via two distinct modes: a directed propagation mode (P-mode) and a diffusive distribution mode (D-mode). To showcase the long-range capabilities of pLSTM, we introduce arrow-pointing extrapolation as a synthetic computer vision task that contains long-distance directional information. We demonstrate that pLSTMs generalize well to larger image sizes, whereas Transformers struggle to extrapolate. On established molecular graph and computer vision benchmarks, pLSTMs also show strong performance. Code and Datasets are available at: https://github.com/ml-jku/plstm_experiments.
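The abstract notes that for regular grids the scan can be implemented with einsum operations, concatenations, and padding in logarithmic time. As an illustrative sketch (not the paper's implementation), the snippet below shows the underlying idea for the simplest case: a 1D linear recurrence with scalar gates, parallelized by repeated doubling. The function name and the scalar-gate simplification are assumptions for illustration; pLSTM itself uses tensor-valued Source, Transition, and Mark gates acting on the line graph of a DAG.

```python
import numpy as np

def parallel_linear_scan(a, x):
    """Inclusive scan of h_t = a_t * h_{t-1} + x_t in O(log T) doubling steps.

    The pair (a_t, x_t) composes associatively:
        (a2, x2) o (a1, x1) = (a2 * a1, a2 * x1 + x2),
    so all prefixes can be computed by a Hillis-Steele-style scan
    using only shifts (padding + concatenation) and elementwise products.
    """
    a = np.asarray(a, dtype=float).copy()
    h = np.asarray(x, dtype=float).copy()
    T = h.shape[0]
    shift = 1
    while shift < T:
        # Pad with the identity element (a=1, h=0) and shift right by `shift`.
        a_prev = np.concatenate([np.ones(shift), a[:-shift]])
        h_prev = np.concatenate([np.zeros(shift), h[:-shift]])
        h = h + a * h_prev   # compose: h_t now accounts for 2*shift past steps
        a = a * a_prev
        shift *= 2
    return h
```

Each doubling step composes each position with the state `shift` steps behind it, so after ceil(log2 T) iterations every position holds its full prefix; a sequential loop over the same recurrence produces identical results.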
Problem

Research questions and friction points this paper is trying to address.

Extends multi-dimensionality to linear RNNs, covering arbitrary DAGs
Enables parallel processing of structured data such as grids and graphs
Addresses vanishing/exploding activations and gradients over long distances in DAGs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallelizable Linear Source Transition Mark networks with gates acting on the line graph of a DAG
Efficient DAG processing analogous to parallel associative scans and chunkwise recurrence
Two distinct modes (P-mode and D-mode) to stabilize activations and gradients over long distances
Korbinian Poppel
ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria; NXAI GmbH
Richard Freinschlag
ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
Thomas Schmied
PhD Student, Institute for Machine Learning, Johannes Kepler University Linz
Wei Lin
ELLIS Unit Linz, LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria
Sepp Hochreiter
Institute for Machine Learning, Johannes Kepler University Linz