Maximal Update Parametrization and Zero-Shot Hyperparameter Transfer for Fourier Neural Operators

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large-scale Fourier Neural Operators (FNOs) suffer from explosive parameter growth as the number of Fourier modes increases, which makes hyperparameter tuning prohibitively expensive. Method: This paper proposes the first zero-shot hyperparameter transfer method for FNOs, bringing Maximal Update Parametrization (μP) theory, previously unexplored for FNOs, into the Fourier domain. Using spectral analysis and a model of the gradient dynamics, the authors derive a parametrization with provable hyperparameter transferability across varying mode counts. Contribution/Results: Hyperparameters tuned on small FNOs can be deployed directly on billion-parameter-scale models without re-tuning. Extensive evaluation on diverse partial differential equation benchmarks shows a substantial reduction in hyperparameter optimization overhead for large models while maintaining or improving prediction accuracy, establishing a practical path to efficient, scalable FNO deployment.

📝 Abstract
Fourier Neural Operators (FNOs) offer a principled approach for solving complex partial differential equations (PDEs). However, scaling them to handle more complex PDEs requires increasing the number of Fourier modes, which significantly expands the number of model parameters and makes hyperparameter tuning computationally impractical. To address this, we introduce $μ$Transfer-FNO, a zero-shot hyperparameter transfer technique that enables optimal configurations, tuned on smaller FNOs, to be directly applied to billion-parameter FNOs without additional tuning. Building on the Maximal Update Parametrization ($μ$P) framework, we mathematically derive a parametrization scheme that facilitates the transfer of optimal hyperparameters across models with different numbers of Fourier modes in FNOs, which is validated through extensive experiments on various PDEs. Our empirical study shows that $μ$Transfer-FNO reduces computational cost for tuning hyperparameters on large FNOs while maintaining or improving accuracy.
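
To make the parametrization concrete, here is a minimal PyTorch sketch of a 1-D FNO spectral convolution whose complex weight initialization is scaled with the number of retained Fourier modes, in the spirit of $μ$P. The scale factor 1/(in_channels · modes), and all class and argument names, are illustrative assumptions, not the paper's derived rule.

```python
# Minimal sketch (assumptions labeled): a 1-D FNO spectral convolution whose
# complex weight initialization shrinks as the number of retained Fourier
# modes grows, in the spirit of μP. The scale 1/(in_channels * modes) is
# illustrative only; the paper derives the exact scaling rules.
import torch
import torch.nn as nn


class SpectralConv1d(nn.Module):
    """FFT -> keep lowest `modes` frequencies -> per-mode channel mix -> iFFT."""

    def __init__(self, in_channels: int, out_channels: int, modes: int):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (in_channels * modes)  # mode-aware init (assumed form)
        self.weight = nn.Parameter(
            scale * torch.randn(in_channels, out_channels, modes, dtype=torch.cfloat)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_channels, n_grid)
        x_hat = torch.fft.rfft(x)  # (batch, in_channels, n_grid // 2 + 1)
        out_hat = torch.zeros(
            x.shape[0], self.weight.shape[1], x_hat.shape[-1],
            dtype=torch.cfloat, device=x.device,
        )
        # Mix channels independently on each of the lowest `modes` frequencies.
        out_hat[:, :, : self.modes] = torch.einsum(
            "bim,iom->bom", x_hat[:, :, : self.modes], self.weight
        )
        return torch.fft.irfft(out_hat, n=x.shape[-1])
```

The intent of a μP-style scaling is that per-coordinate update sizes stay stable as `modes` grows, so a learning rate tuned at a small mode count remains near-optimal at a large one.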
Problem

Research questions and friction points this paper is trying to address.

Scaling Fourier Neural Operators for complex PDEs efficiently
Reducing hyperparameter tuning cost for large FNO models
Transferring optimal hyperparameters across different FNO configurations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot hyperparameter transfer for FNOs (see the workflow sketch after this list)
Maximal Update Parametrization framework
Scales FNOs without additional tuning
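
In practice, zero-shot transfer reduces to a simple workflow: grid-search hyperparameters on a cheap proxy FNO with few Fourier modes, then reuse the winner on the full-scale model. The sketch below assumes hypothetical `make_fno` and `train_eval` helpers supplied by the caller; it illustrates the usage pattern, not the paper's code.

```python
# Hedged sketch of the zero-shot transfer workflow enabled by μP:
# tune on a small proxy FNO, then reuse the winning hyperparameters at scale.
# `make_fno` and `train_eval` are hypothetical user-supplied helpers.

def tune_on_proxy(make_fno, train_eval, learning_rates, proxy_modes=16):
    """Grid-search learning rates on a cheap proxy FNO with few Fourier modes."""
    best_lr, best_loss = None, float("inf")
    for lr in learning_rates:
        model = make_fno(modes=proxy_modes)   # small, cheap-to-train proxy
        loss = train_eval(model, lr=lr)       # caller's train + validate loop
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr

# Usage (hypothetical): under a correct μP parametrization, the optimum found
# on the proxy is expected to remain near-optimal on the large model.
# best_lr = tune_on_proxy(make_fno, train_eval, [1e-4, 3e-4, 1e-3, 3e-3])
# big_model = make_fno(modes=256)             # billion-parameter-scale target
# train_eval(big_model, lr=best_lr)           # zero-shot: no further tuning
```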
👥 Authors
Shanda Li (Carnegie Mellon University, Machine Learning)
Shinjae Yoo (Brookhaven National Lab, Machine Learning)
Yiming Yang (School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA)