🤖 AI Summary
Existing neural operator methods rely heavily on large amounts of PDE-specific training data and generalize poorly across distinct PDE families. To address this, we propose MOFS, a multimodal few-shot learning framework for modeling PDE solution operators. MOFS encodes PDE semantics via text-conditioned embeddings, integrates multimodal information through memory-augmented prompting and cross-modal attention, and leverages Fourier neural operator pretraining, masked field reconstruction, and two-stage contrastive fine-tuning. With only a few examples per target PDE, MOFS generalizes strongly to unseen equations, including Darcy Flow and Navier–Stokes, and significantly outperforms prior methods in cross-family few-shot inference across multiple benchmarks. Our core contribution is the first semantics-driven, multimodal few-shot operator learning paradigm that enables zero- or few-shot generalization across diverse PDE families.
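The summary's "memory-augmented prompting and cross-modal attention" ultimately combine a field-derived embedding with a text-derived one. A minimal NumPy sketch of one plausible gated-fusion step is below; the dimension, weights, and variable names are illustrative stand-ins, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # shared embedding dimension (illustrative choice)

# Hypothetical modality embeddings: one from a field encoder, one from text.
v = rng.standard_normal(d)  # vision/field embedding
t = rng.standard_normal(d)  # text-conditioned operator embedding

# Gated fusion: a gate decides, per dimension, how much of each modality
# to keep. W_g stands in for trained parameters; here it is random.
W_g = rng.standard_normal((d, 2 * d)) / np.sqrt(2 * d)
gate = 1.0 / (1.0 + np.exp(-W_g @ np.concatenate([v, t])))  # sigmoid gate
fused = gate * v + (1.0 - gate) * t  # convex per-dimension mix of v and t
```

In a trained model the gate would be learned end-to-end, so the network can lean on the text modality when field examples are scarce.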
📝 Abstract
Learning solution operators for partial differential equations (PDEs) has become a foundational task in scientific machine learning. However, existing neural operator methods require abundant training data for each specific PDE and lack the ability to generalize across PDE families. In this work, we propose MOFS, a unified multimodal framework for multi-operator few-shot learning, which aims to generalize to unseen PDE operators using only a few demonstration examples. Our method integrates three key components: (i) multi-task self-supervised pretraining of a shared Fourier Neural Operator (FNO) encoder to reconstruct masked spatial fields and predict frequency spectra, (ii) text-conditioned operator embeddings derived from statistical summaries of input-output fields, and (iii) memory-augmented multimodal prompting with gated fusion and cross-modal gradient-based attention. We adopt a two-stage training paradigm that first learns prompt-conditioned inference on seen operators and then applies end-to-end contrastive fine-tuning to align latent representations across the vision, frequency, and text modalities. Experiments on PDE benchmarks, including Darcy Flow and Navier–Stokes variants, demonstrate that our model outperforms existing operator learning baselines in few-shot generalization. Extensive ablations validate the contributions of each modality and training component. Our approach offers a new foundation for universal and data-efficient operator learning across scientific domains.
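Component (i) pairs masked-field reconstruction with frequency-spectrum prediction. The following NumPy sketch shows how such a pair of self-supervised losses could be computed on a toy field; the identity "model", mask ratio, and loss weighting are illustrative assumptions, not the paper's FNO setup:

```python
import numpy as np

rng = np.random.default_rng(42)
H = W = 32

# Toy input field standing in for a PDE solution snapshot.
field = rng.standard_normal((H, W))

# Randomly hide a fraction of spatial locations (self-supervised target).
mask = rng.random((H, W)) < 0.3           # True where the field is hidden
masked_input = np.where(mask, 0.0, field)

# Stand-in "encoder-decoder": a real model (e.g. an FNO) would map
# masked_input to a reconstruction; here we simply reuse masked_input.
recon = masked_input

# Loss 1: reconstruction error measured on the masked locations only.
recon_loss = np.mean((recon[mask] - field[mask]) ** 2)

# Loss 2: match the field's frequency content (log-magnitude of a 2-D FFT).
spec_true = np.log1p(np.abs(np.fft.rfft2(field)))
spec_pred = np.log1p(np.abs(np.fft.rfft2(recon)))
spec_loss = np.mean((spec_pred - spec_true) ** 2)

total_loss = recon_loss + 0.1 * spec_loss  # weighting is illustrative
```

Restricting the reconstruction term to masked locations keeps the objective from being trivially satisfied by copying visible pixels, while the spectral term pushes the encoder toward the frequency-domain structure that Fourier-layer models operate on.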