Mondrian: Transformer Operators via Domain Decomposition

📅 2025-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scalability bottleneck of Transformer operators in high-resolution, multi-scale PDE modeling—arising from their quadratic computational complexity and tight coupling with spatial discretization—this paper proposes a domain-decomposition-based functional Transformer architecture. The physical domain is partitioned into non-overlapping subdomains; local dynamics are modeled independently, while cross-subdomain functional-space attention enables global information exchange, thereby decoupling attention mechanisms from grid discretization for the first time. The architecture integrates hierarchical windowing, neighborhood-aware attention, and neural operator layers, achieving resolution-invariant generalization: models trained at one resolution generalize seamlessly to others without retraining. Extensive experiments on the Allen–Cahn and Navier–Stokes equations demonstrate substantial improvements in prediction accuracy, computational efficiency, and multi-scale generalization capability.

📝 Abstract
Operator learning enables data-driven modeling of partial differential equations (PDEs) by learning mappings between function spaces. However, scaling transformer-based operator models to high-resolution, multiscale domains remains a challenge due to the quadratic cost of attention and its coupling to discretization. We introduce Mondrian, transformer operators that decompose a domain into non-overlapping subdomains and apply attention over sequences of subdomain-restricted functions. Leveraging principles from domain decomposition, Mondrian decouples attention from discretization. Within each subdomain, it replaces standard layers with expressive neural operators, and attention across subdomains is computed via softmax-based inner products over functions. The formulation naturally extends to hierarchical windowed and neighborhood attention, supporting both local and global interactions. Mondrian achieves strong performance on Allen-Cahn and Navier-Stokes PDEs, demonstrating resolution scaling without retraining. These results highlight the promise of domain-decomposed attention for scalable and general-purpose neural operators.
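As a toy illustration of attention over subdomain-restricted functions, the sketch below partitions a 1D field into non-overlapping subdomains and computes softmax attention weights from quadrature-weighted (mean) inner products of the local function values. Because the inner product is normalized by the number of local samples, the weights are approximately independent of the grid resolution, which is the sense in which attention is decoupled from discretization. This is a minimal NumPy sketch under simplifying assumptions (1D field, uniform grid, identity query/key maps instead of learned neural operators); it is not the paper's implementation, and the function name is hypothetical.

```python
import numpy as np

def function_attention(u, n_sub):
    """Attention weights over subdomain-restricted functions (toy sketch).

    u: samples of a 1D field, shape (n_points,), n_points divisible by n_sub.
    Each subdomain's restriction is pulled back to a common reference grid,
    and scores use a mean (uniform-quadrature) L2 inner product, so they are
    approximately resolution-invariant for smooth fields.
    """
    parts = u.reshape(n_sub, -1)                    # (S, n_local)
    # <f_i, f_j> approximated as the mean of pointwise products
    scores = parts @ parts.T / parts.shape[1]       # (S, S)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)         # softmax over subdomains

# Same field sampled at two resolutions; attention weights nearly agree.
x_lo = np.linspace(0, 2 * np.pi, 64, endpoint=False)
x_hi = np.linspace(0, 2 * np.pi, 256, endpoint=False)
w_lo = function_attention(np.sin(x_lo), n_sub=8)
w_hi = function_attention(np.sin(x_hi), n_sub=8)
print(np.abs(w_lo - w_hi).max())  # small: weights barely depend on the grid
```

In a real model the raw restrictions would first pass through learned query/key operators; the point of the sketch is only that normalizing the inner product by the quadrature weights makes the attention scores a property of the underlying functions rather than of the sampling grid.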
Problem

Research questions and friction points this paper is trying to address.

Scaling transformer models for high-resolution PDEs
Decoupling attention from discretization in operator learning
Achieving resolution scaling without retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain decomposition for scalable transformer operators
Neural operators replace standard layers locally
Hierarchical attention supports local and global interactions
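The neighborhood-attention idea in the bullets above can be sketched as a banded mask over subdomain indices: each subdomain attends only to subdomains within a fixed radius, while dropping the mask recovers full global exchange. The sketch below assumes a 1D chain of subdomains; the helper names are illustrative, not from the paper.

```python
import numpy as np

def neighborhood_mask(n_sub, radius):
    """Boolean (S, S) mask: subdomain i may attend to j iff |i - j| <= radius."""
    idx = np.arange(n_sub)
    return np.abs(idx[:, None] - idx[None, :]) <= radius

def masked_softmax(scores, mask):
    """Row-wise softmax with disallowed entries set to -inf (weight 0)."""
    s = np.where(mask, scores, -np.inf)
    w = np.exp(s - s.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

scores = np.zeros((6, 6))                    # uniform scores for illustration
w = masked_softmax(scores, neighborhood_mask(6, radius=1))
print(w[0])  # weight only on subdomains 0 and 1
```

Stacking such local-attention layers with coarser, windowed levels is one way a hierarchy can combine short-range and long-range interactions, mirroring the local/global split described above.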