🤖 AI Summary
Transformer self-attention suffers from high computational complexity and inefficient long-range dependency modeling. Method: This paper reparameterizes the Transformer architecture as a directed graph neural network (DGNN), replacing self-attention with unitary directed graph convolutions. Contribution/Results: We introduce, for the first time, a unitary directed graph convolutional operator grounded in the directed graph Fourier transform, ensuring numerical stability and theoretical rigor while substantially simplifying the model structure. The resulting DGNN preserves a global receptive field yet reduces time complexity and enhances long-range modeling capacity. Experiments on the Long-Range Arena benchmark, long-document classification, and DNA sequence classification demonstrate that our method outperforms standard Transformers in accuracy, inference speed, and memory efficiency—achieving superior generalization with higher computational efficiency.
📝 Abstract
Recent advances in deep learning have established Transformer architectures as the predominant modeling paradigm. Central to the success of Transformers is the self-attention mechanism, which scores the similarity between the query and key matrices to modulate a value matrix. This operation bears striking similarities to digraph convolution, prompting an investigation into whether digraph convolution could serve as an alternative to self-attention. In this study, we formalize this concept by introducing a synthetic unitary digraph convolution based on the digraph Fourier transform. The resulting model, which we term Converter, effectively converts a Transformer into a Directed Graph Neural Network (DGNN) form. We have tested Converter on the Long-Range Arena benchmark, long-document classification, and DNA sequence-based taxonomy classification. Our experimental results demonstrate that Converter achieves superior performance while maintaining computational efficiency and architectural simplicity, which establishes it as a lightweight yet powerful Transformer variant.
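The analogy between self-attention and digraph convolution can be sketched numerically. Below is a minimal, hypothetical illustration: self-attention applies a row-stochastic, generally asymmetric mixing matrix to token features (i.e., a weighted digraph adjacency), while a spectral digraph convolution filters features through a unitary digraph Fourier basis. The unitary DFT matrix used here (the Fourier basis of a directed cycle graph) is only a stand-in for the paper's synthetic unitary digraph Fourier transform, and the filter coefficients are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4  # number of tokens, feature dimension
X = rng.standard_normal((n, d))

# --- Self-attention: a data-dependent, asymmetric (directed) mixing matrix ---
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
attn_out = A @ (X @ Wv)  # A acts like a digraph adjacency applied to features

# --- Spectral digraph convolution with a unitary Fourier basis (illustrative) ---
# Hypothetical stand-in: the unitary DFT matrix, i.e. the digraph Fourier basis
# of a directed cycle graph; the paper's operator is more general.
k = np.arange(n)
U = np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)  # unitary basis
g = rng.standard_normal(n)  # spectral filter coefficients (learnable in practice)
conv_out = (U @ np.diag(g) @ U.conj().T @ X.astype(complex)).real

# Unitarity (U U^H = I) is what gives the operator its numerical stability:
# repeated application cannot blow up or collapse the feature norms.
print(np.allclose(U @ U.conj().T, np.eye(n)))
```

Note the structural difference: the attention matrix `A` is recomputed from the data at every forward pass, whereas the spectral filter `g` is a fixed set of parameters applied in a shared unitary basis, which is what removes the quadratic query-key comparison.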