Mondrian: Transformer Operators via Domain Decomposition

📅 2025-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scalability bottleneck of Transformer operators in high-resolution, multi-scale PDE modeling—arising from their quadratic computational complexity and tight coupling with spatial discretization—this paper proposes a domain-decomposition-based functional Transformer architecture. The physical domain is partitioned into non-overlapping subdomains; local dynamics are modeled independently, while cross-subdomain functional-space attention enables global information exchange, thereby decoupling attention mechanisms from grid discretization for the first time. The architecture integrates hierarchical windowing, neighborhood-aware attention, and neural operator layers, achieving resolution-invariant generalization: models trained at one resolution generalize seamlessly to others without retraining. Extensive experiments on the Allen–Cahn and Navier–Stokes equations demonstrate substantial improvements in prediction accuracy, computational efficiency, and multi-scale generalization capability.

📝 Abstract
Operator learning enables data-driven modeling of partial differential equations (PDEs) by learning mappings between function spaces. However, scaling transformer-based operator models to high-resolution, multiscale domains remains a challenge due to the quadratic cost of attention and its coupling to discretization. We introduce Mondrian, transformer operators that decompose a domain into non-overlapping subdomains and apply attention over sequences of subdomain-restricted functions. Leveraging principles from domain decomposition, Mondrian decouples attention from discretization. Within each subdomain, it replaces standard layers with expressive neural operators, and attention across subdomains is computed via softmax-based inner products over functions. The formulation naturally extends to hierarchical windowed and neighborhood attention, supporting both local and global interactions. Mondrian achieves strong performance on Allen-Cahn and Navier-Stokes PDEs, demonstrating resolution scaling without retraining. These results highlight the promise of domain-decomposed attention for scalable and general-purpose neural operators.
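As a toy illustration of attention over subdomain-restricted functions, the sketch below partitions a 1D field into non-overlapping subdomains and computes softmax attention weights from quadrature-weighted (mean) inner products of the local function values. Because the inner product is normalized by the number of local samples, the weights are approximately independent of the grid resolution, which is the sense in which attention is decoupled from discretization. This is a minimal NumPy sketch under simplifying assumptions (1D field, uniform grid, identity query/key maps instead of learned neural operators); it is not the paper's implementation, and the function name is hypothetical.

```python
import numpy as np

def function_attention(u, n_sub):
    """Attention weights over subdomain-restricted functions (toy sketch).

    u: samples of a 1D field, shape (n_points,), n_points divisible by n_sub.
    Each subdomain's restriction is pulled back to a common reference grid,
    and scores use a mean (uniform-quadrature) L2 inner product, so they are
    approximately resolution-invariant for smooth fields.
    """
    parts = u.reshape(n_sub, -1)                    # (S, n_local)
    # <f_i, f_j> approximated as the mean of pointwise products
    scores = parts @ parts.T / parts.shape[1]       # (S, S)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)         # softmax over subdomains

# Same field sampled at two resolutions; attention weights nearly agree.
x_lo = np.linspace(0, 2 * np.pi, 64, endpoint=False)
x_hi = np.linspace(0, 2 * np.pi, 256, endpoint=False)
w_lo = function_attention(np.sin(x_lo), n_sub=8)
w_hi = function_attention(np.sin(x_hi), n_sub=8)
print(np.abs(w_lo - w_hi).max())  # small: weights barely depend on the grid
```

In a real model the raw restrictions would first pass through learned query/key operators; the point of the sketch is only that normalizing the inner product by the quadrature weights makes the attention scores a property of the underlying functions rather than of the sampling grid.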
Problem

Research questions and friction points this paper is trying to address.

Scaling transformer models for high-resolution PDEs
Decoupling attention from discretization in operator learning
Achieving resolution scaling without retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain decomposition for scalable transformer operators
Neural operators replace standard layers locally
Hierarchical attention supports local and global interactions
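The neighborhood-attention idea in the bullets above can be sketched as a banded mask over subdomain indices: each subdomain attends only to subdomains within a fixed radius, while dropping the mask recovers full global exchange. The sketch below assumes a 1D chain of subdomains; the helper names are illustrative, not from the paper.

```python
import numpy as np

def neighborhood_mask(n_sub, radius):
    """Boolean (S, S) mask: subdomain i may attend to j iff |i - j| <= radius."""
    idx = np.arange(n_sub)
    return np.abs(idx[:, None] - idx[None, :]) <= radius

def masked_softmax(scores, mask):
    """Row-wise softmax with disallowed entries set to -inf (weight 0)."""
    s = np.where(mask, scores, -np.inf)
    w = np.exp(s - s.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

scores = np.zeros((6, 6))                    # uniform scores for illustration
w = masked_softmax(scores, neighborhood_mask(6, radius=1))
print(w[0])  # weight only on subdomains 0 and 1
```

Stacking such local-attention layers with coarser, windowed levels is one way a hierarchy can combine short-range and long-range interactions, mirroring the local/global split described above.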