PICore: Physics-Informed Unsupervised Coreset Selection for Data Efficient Neural Operator Training

๐Ÿ“… 2025-07-22
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Neural operator training typically requires large quantities of labeled data from expensive, high-fidelity numerical PDE simulations. To address this, the authors propose PICore, an unsupervised coreset selection framework that eliminates the need for ground-truth solution labels: it uses physics-informed neural network (PINN) losses to gauge the physical consistency and informativeness of unlabeled inputs, plugs these scores into multiple coreset selection strategies to identify the most representative instances, and passes only those instances to the costly numerical solver for labeling. By embedding physical constraints directly into sample selection, the method substantially reduces annotation and simulation cost. Evaluated on four canonical PDE benchmarks, PICore achieves up to a 78% average increase in training efficiency relative to supervised coreset selection methods, with minimal loss in accuracy.

๐Ÿ“ Abstract
Neural operators offer a powerful paradigm for solving partial differential equations (PDEs) that cannot be solved analytically by learning mappings between function spaces. However, there are two main bottlenecks in training neural operators: they require a significant amount of training data to learn these mappings, and this data needs to be labeled, which can only be accessed via expensive simulations with numerical solvers. To alleviate both of these issues simultaneously, we propose PICore, an unsupervised coreset selection framework that identifies the most informative training samples without requiring access to ground-truth PDE solutions. PICore leverages a physics-informed loss to select unlabeled inputs by their potential contribution to operator learning. After selecting a compact subset of inputs, only those samples are simulated using numerical solvers to generate labels, reducing annotation costs. We then train the neural operator on the reduced labeled dataset, significantly decreasing training time as well. Across four diverse PDE benchmarks and multiple coreset selection strategies, PICore achieves up to 78% average increase in training efficiency relative to supervised coreset selection methods with minimal changes in accuracy. We provide code at https://github.com/Asatheesh6561/PICore.
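The pipeline the abstract describes (score unlabeled inputs with a physics-informed loss, select a compact subset, label only that subset with the numerical solver) can be sketched as follows. This is a minimal illustration under loud assumptions, not the paper's implementation: a finite-difference heat-equation residual under a trivial identity surrogate stands in for the PINN loss, simple top-k selection stands in for the multiple coreset strategies the paper evaluates, and the names `pinn_style_score` and `select_coreset` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def pinn_style_score(u0, nu=0.01):
    """Physics-informed score for one unlabeled input u0 (an initial
    condition sampled on a uniform periodic grid). As a toy stand-in for
    the PINN loss, we take the residual of the heat equation
    u_t = nu * u_xx under a surrogate that predicts u(x, t) = u0(x),
    so the residual reduces to -nu * u0_xx (central differences)."""
    n = u0.shape[0]
    dx = 1.0 / n
    u_xx = (np.roll(u0, -1) - 2.0 * u0 + np.roll(u0, 1)) / dx**2
    return float(np.mean((nu * u_xx) ** 2))

def select_coreset(inputs, k):
    """Return indices of the k inputs with the largest physics-informed
    loss; PICore feeds such scores into several coreset selection
    strategies, of which top-k is the simplest."""
    scores = np.array([pinn_style_score(u) for u in inputs])
    return np.argsort(scores)[-k:][::-1]

# 100 candidate initial conditions; only the selected subset would be
# handed to the expensive numerical solver to generate labels.
grid = np.linspace(0.0, 1.0, 64, endpoint=False)
inputs = [a * np.sin(2 * np.pi * f * grid)
          for f, a in zip(rng.integers(1, 5, 100), rng.uniform(0.5, 2.0, 100))]
chosen = select_coreset(inputs, k=10)
print(f"selected {len(chosen)} of {len(inputs)} inputs for labeling")
```

With this toy score, high-frequency, high-amplitude initial conditions receive the largest residuals and are selected first; in the paper the scores come from PINN losses evaluated on the actual PDE of each benchmark.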
Problem

Research questions and friction points this paper addresses.

Neural operator training depends on large amounts of labeled data from expensive numerical solvers
Informative unlabeled samples must be identified without access to ground-truth PDE solutions
Training efficiency should improve without significant loss of accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised coreset selection for neural operators
Physics-informed loss identifies informative samples
Reduces annotation costs and training time
๐Ÿ”Ž Similar Papers
No similar papers found.