Data-driven sparse identification of governing PDEs via knockoff filters and multi-criteria trade-offs

📅 2026-05-26

🤖 AI Summary

This work addresses the identification of sparse partial differential equation (PDE) structure from noisy observations, where collinearity among candidate terms often causes spurious terms to be selected. The authors propose the KO-PDE-IDENT framework, which couples model-X knockoff filtering with SHAP values to build a knockoff difference statistic. Combined with ℓ₀-constrained adaptive best-subset selection, recursive feature elimination, and a multi-criteria decision mechanism, the method produces sparse models while controlling the false discovery rate (FDR) in finite samples. In the reported experiments on five canonical PDEs under strong noise, KO-PDE-IDENT removes all spurious terms, retains every true term, and yields coefficient estimates with low error.
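To make the FDR-control step concrete, the sketch below shows the standard knockoff+ selection rule applied to a vector of difference statistics W. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and W is assumed to be precomputed, with W[j] large and positive when an original candidate term looks more important than its knockoff copy (in the paper this statistic is built from best-subset selection and SHAP).

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    """Smallest t with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q.

    W : knockoff difference statistics (assumed precomputed).
    q : target false discovery rate.
    Returns np.inf when no threshold satisfies the bound.
    """
    W = np.asarray(W, dtype=float)
    # Only the nonzero magnitudes of W need to be tried as thresholds.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf

def knockoff_select(W, q):
    """Indices of candidate terms kept at target FDR level q."""
    W = np.asarray(W, dtype=float)
    tau = knockoff_plus_threshold(W, q)
    return np.flatnonzero(W >= tau)

# Made-up statistics for eight hypothetical candidate terms.
W_demo = [5.0, 4.0, 3.0, 2.5, -0.5, 0.2, 1.5, -0.1]
selected = knockoff_select(W_demo, q=0.25)
```

Negative entries of W act as an estimate of how many false positives sit among the positive ones, which is why the threshold rises until the ratio falls below q. A stricter q can return an empty selection.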

📝 Abstract

We propose KO-PDE-IDENT, a data-driven framework for identifying parsimonious partial differential equations (PDEs) with false discovery rate (FDR) control. PDE discovery from noisy observations is often hindered by extreme multicollinearity among candidate terms, which causes typical sparse-regression methods to select spurious terms. To address this problem, KO-PDE-IDENT initially mines a support set of potential candidate terms via model-X knockoff filters with finite-sample FDR control, then refines and ranks the surviving PDE alternatives. The framework integrates three components. First, knockoff feature statistics are constructed by coupling $\ell_{0}$-constrained adaptive best-subset selection with SHapley Additive exPlanations (SHAP), yielding an effective and computationally efficient difference statistic. Second, a recursive feature elimination (RFE) procedure removes terms whose marginal contributions are dispensable and assesses statistical necessity through knockoff-perturbed hypothesis testing. Third, the final model selection is formulated as a multi-criteria decision-making (MCDM) problem, where the optimal governing equation is the alternative that best balances a wide range of criteria such as predictive accuracy, model complexity and coefficient uncertainty. We validate KO-PDE-IDENT on five canonical PDEs under severe noise corruption. Empirical results show that our framework can exactly recover the true PDE structure, eliminating false discoveries while retaining all true underlying terms, with low coefficient estimation error.
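The multi-criteria selection step can be pictured with a small ranking sketch. The abstract does not say which MCDM method is used, so the weighted-sum rule below is a swapped-in assumption chosen for simplicity; the function name, the equal weights, and the numbers in the example are all hypothetical. Each row is one candidate PDE and each column is a cost criterion (lower is better), for instance prediction error, number of terms, and coefficient uncertainty.

```python
import numpy as np

def rank_alternatives(scores, weights):
    """Rank candidate models by a weighted sum of min-max normalized costs.

    scores  : (n_alternatives, n_criteria) array, lower is better everywhere.
    weights : (n_criteria,) nonnegative importance weights.
    Returns (order, total): indices from best to worst, and aggregate costs.
    NOTE: a plain weighted sum is an illustrative assumption, not
    necessarily the MCDM rule used in the paper.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    lo = scores.min(axis=0)
    span = scores.max(axis=0) - lo
    span[span == 0] = 1.0  # a constant criterion contributes nothing
    normalized = (scores - lo) / span
    total = normalized @ weights
    return np.argsort(total, kind="stable"), total

# Made-up values: columns are (error, number of terms, coefficient uncertainty).
demo_scores = [
    [0.10, 2, 0.05],
    [0.08, 4, 0.20],
    [0.30, 1, 0.10],
]
order, total = rank_alternatives(demo_scores, weights=[1.0, 1.0, 1.0])
```

In this toy case the two-term model ranks first: it is neither the most accurate nor the simplest, but it has the best overall balance across the three criteria.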

Problem

Research questions and friction points this paper is trying to address.

PDE discovery

false discovery rate

multicollinearity

sparse identification

noisy data

Innovation

Methods, ideas, or system contributions that make the work stand out.

knockoff filters

sparse PDE identification

false discovery rate control