🤖 AI Summary
Standard attention mechanisms incur quadratic compute and memory cost in sequence length because they model all pairwise interactions. Linear-time alternatives, such as sparsity-based approximations or state space models (SSMs), reduce this cost but sacrifice either expressive capacity or flexibility in modeling long-range dependencies. This paper introduces TreeAttention, the first attention mechanism to compute attention via efficient inversion of tree-structured matrices, combining sparsity with recurrent dependency modeling. Using structured linear transformations and hierarchical tree-recursive computation, TreeAttention achieves near-exact inversion for sequence-to-sequence mappings, preserving strong representational power while reducing complexity to nearly linear. Extensive experiments show that TreeAttention significantly outperforms standard attention and leading linear-time methods on long-sequence tasks, striking a better balance between efficiency and modeling capability.
📝 Abstract
Attention layers apply a sequence-to-sequence mapping whose parameters depend on the pairwise interactions of the input elements. However, without any structural assumptions, memory and compute scale quadratically with the sequence length. The two main ways to mitigate this are to introduce sparsity by ignoring a sufficient number of pairwise interactions, or to introduce recurrent dependence along them, as state space models (SSMs) do. Both approaches are reasonable, but each has disadvantages. We propose a novel algorithm that combines the advantages of both. Our idea is based on the efficient inversion of tree-structured matrices.
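To give intuition for why tree structure enables near-linear cost (the paper's actual algorithm is not shown here), consider the simplest tree, a path: its sparsity pattern yields a lower-bidiagonal matrix, and solving `L x = b` then takes O(n) by forward substitution instead of the O(n³) of a dense inverse. The sketch below, with a hypothetical `solve_bidiagonal` helper, is purely illustrative of this structured-inversion idea.

```python
# Illustrative sketch, not the paper's method: a path is the simplest
# tree-structured sparsity pattern, giving a lower-bidiagonal matrix L.
# Applying L^{-1} to a vector costs O(n) via forward substitution.
import numpy as np

def solve_bidiagonal(diag, sub, b):
    """Solve L x = b where L has main diagonal `diag` and subdiagonal `sub`."""
    n = len(diag)
    x = np.empty(n)
    x[0] = b[0] / diag[0]
    for i in range(1, n):
        # Each unknown depends only on its tree parent (the previous index),
        # so one linear pass recovers the whole solution.
        x[i] = (b[i] - sub[i - 1] * x[i - 1]) / diag[i]
    return x

rng = np.random.default_rng(0)
n = 6
diag = rng.uniform(1.0, 2.0, size=n)       # nonzero diagonal keeps L invertible
sub = rng.uniform(-0.5, 0.5, size=n - 1)
b = rng.normal(size=n)

L = np.diag(diag) + np.diag(sub, k=-1)     # dense copy, for verification only
x = solve_bidiagonal(diag, sub, b)
assert np.allclose(L @ x, b)               # matches the dense solve
```

For a general tree, the same idea applies: eliminating variables leaf-to-root and back-substituting root-to-leaf solves the system in time linear in the number of edges, which is the structural fact the abstract's "efficient inversion of tree-structured matrices" relies on.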