Chimera: State Space Models Beyond Sequences

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Transformer-based models treat data as unordered sets, disregarding inherent topological structure such as sequence order, image grids, or graph connectivity, and therefore rely on task-specific inductive biases (e.g., positional encodings, random walks), which complicate design and limit generalization. This work introduces Chimera, a unified framework that embeds arbitrary graph topologies directly into state space models (SSMs), using the data's structure itself as a universal inductive bias and eliminating domain-specific architectural engineering. Its core contribution is the first generalization of SSMs to arbitrary graphs, with a linear-time recurrence for DAG-structured data and a mathematically principled relaxation for general graphs. Experiments show that Chimera outperforms BERT by 0.7 points on GLUE, exceeds ViT by 2.6% top-1 accuracy on ImageNet-1k, and achieves state-of-the-art results on long-range graph benchmarks.
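
To make the DAG case concrete, here is a minimal, hypothetical sketch of what a linear-time graph recurrence could look like: the standard SSM update h_t = A h_{t-1} + B x_t is generalized so that each node aggregates the hidden states of its graph predecessors instead of a single previous timestep, and nodes are visited in topological order. This is an illustrative reading, not the paper's exact formulation; the function name `dag_ssm` and the parameters `A`, `B`, `C`, `preds` are assumptions introduced here.

```python
# Hypothetical sketch: an SSM recurrence generalized from a chain to a DAG.
# Not the paper's exact formulation; A, B, C follow the usual SSM convention
# h_t = A h_{t-1} + B x_t, y_t = C h_t, with the single predecessor replaced
# by a sum over graph predecessors.
import numpy as np

def dag_ssm(x, preds, A, B, C):
    """x: (n, d_in) node features; preds[v]: predecessors of node v,
    with nodes indexed in topological order; A: (d_h, d_h),
    B: (d_h, d_in), C: (d_out, d_h)."""
    n = x.shape[0]
    h = np.zeros((n, A.shape[0]))
    for v in range(n):                                      # one pass in topological order => linear time
        agg = sum((h[u] for u in preds[v]), np.zeros(A.shape[0]))
        h[v] = A @ agg + B @ x[v]                           # state update from aggregated parent states
    return h @ C.T                                          # per-node outputs y_v = C h_v

# Toy usage: a 4-node DAG with edges 0->1, 0->2, 1->3, 2->3
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
A = rng.normal(size=(5, 5)) * 0.1
B = rng.normal(size=(5, 3))
C = rng.normal(size=(2, 5))
y = dag_ssm(x, {0: [], 1: [0], 2: [0], 3: [1, 2]}, A, B, C)
print(y.shape)  # (4, 2)
```

A chain graph (preds[v] = [v-1]) recovers the ordinary sequence SSM, which is consistent with the abstract's point that SSMs need no positional encoding: order is carried by the topology itself.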

📝 Abstract
Transformer-based deep learning methods have become the standard approach for modeling diverse data such as sequences, images, and graphs. These methods rely on self-attention, which treats data as an unordered set of elements. This ignores the neighborhood structure or graph topology of the data and requires inductive biases--such as position embeddings in sequences and images, or random walks in graphs--to incorporate topology. However, designing such task-specific biases requires significant effort and can introduce side effects that hinder generalization. We introduce Chimera, a unified model that directly incorporates data topology in a principled way, removing the need for domain-specific biases. The key idea is that state space models--which naturally do not require position embeddings--can be generalized to capture any graph topology. Our experiments show that Chimera achieves strong performance across language, vision, and graph domains, outperforming BERT on GLUE by 0.7 points, ViT on ImageNet-1k by 2.6%, and all baselines on the Long Range Graph Benchmark. We further propose algorithmic optimizations to improve Chimera's efficiency: (1) for Directed Acyclic Graphs, Chimera can be implemented as a linear-time recurrence; (2) for general graphs, a simple mathematical relaxation achieves Transformer's quadratic complexity without domain-specific heuristics. These results validate Chimera's core contribution and support the idea that data topology is a powerful inductive bias across modalities.
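
The abstract mentions a quadratic-complexity relaxation for general graphs but does not spell it out. One plausible illustration, offered purely as an assumption rather than the paper's algorithm, is to unroll multi-hop state propagation into a dense node-to-node mixing matrix built from powers of the adjacency matrix, which costs O(n^2) like self-attention. The function `relaxed_graph_mix` and its parameters (`decay`, `n_hops`) are hypothetical.

```python
# Purely illustrative sketch of a quadratic-time "relaxed" form for a general
# graph: unroll state propagation into a dense n x n mixing matrix from a
# truncated Neumann series I + cA + c^2 A^2 + ... of the adjacency matrix.
# This is an assumption about what such a relaxation could look like.
import numpy as np

def relaxed_graph_mix(x, adj, decay=0.5, n_hops=4):
    """x: (n, d) node features; adj: (n, n) adjacency matrix.
    Returns features mixed along paths of length <= n_hops."""
    n = adj.shape[0]
    mix = np.eye(n)
    hop = np.eye(n)
    for _ in range(n_hops):           # accumulate decayed powers of the adjacency
        hop = decay * (adj @ hop)
        mix = mix + hop
    return mix @ x                    # O(n^2 d) dense mixing, like self-attention

# Toy usage on a 3-node directed cycle
adj = np.array([[0., 1., 0.], [0., 0., 1.], [1., 0., 0.]])
x = np.arange(6, dtype=float).reshape(3, 2)
print(relaxed_graph_mix(x, adj))
```

The truncated series only keeps the sketch finite; the point is that dense mixing over all node pairs matches the Transformer's quadratic cost without any domain-specific heuristic.
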
Problem

Research questions and friction points this paper is trying to address.

Unifying data topology modeling across sequences, images, and graphs
Eliminating domain-specific inductive biases in neural architectures
Generalizing state space models to capture arbitrary graph structures
Innovation

Methods, ideas, or system contributions that make the work stand out.

State space models capture graph topology directly
Unified model removes need for domain-specific biases
Algorithmic optimizations achieve linear or quadratic complexity
Authors

Aakash Lahoti, Carnegie Mellon University
Tanya Marwah, Carnegie Mellon University
Ratish Puduppully, IT University of Copenhagen (Natural Language Processing)
Albert Gu, Carnegie Mellon University (Machine Learning)