Smooth Flow Matching

📅 2025-08-19
🤖 AI Summary
Functional data are hard to model statistically because of privacy constraints, sparse and irregular sampling, infinite dimensionality, and non-Gaussian structure. This paper proposes Smooth Flow Matching (SFM), a semiparametric copula flow built on flow-matching ideas that does not rely on Gaussianity or low-rank assumptions. The method handles irregularly sampled observations, is computationally efficient, and guarantees that the generated functions are smooth. Simulation studies show advantages in synthetic data quality and computational efficiency, and an application to clinical trajectories from the MIMIC-IV electronic health records (EHR) database shows that the synthetic data can serve as high-quality surrogates for downstream statistical tasks without exposing sensitive real data.

📝 Abstract
Functional data, i.e., smooth random functions observed over a continuous domain, are increasingly available in areas such as biomedical research, health informatics, and epidemiology. However, effective statistical analysis for functional data is often hindered by challenges such as privacy constraints, sparse and irregular sampling, infinite dimensionality, and non-Gaussian structures. To address these challenges, we introduce a novel framework named Smooth Flow Matching (SFM), tailored for generative modeling of functional data to enable statistical analysis without exposing sensitive real data. Built upon flow-matching ideas, SFM constructs a semiparametric copula flow to generate infinite-dimensional functional data, free from Gaussianity or low-rank assumptions. It is computationally efficient, handles irregular observations, and guarantees the smoothness of the generated functions, offering a practical and flexible solution in scenarios where existing deep generative methods are not applicable. Through extensive simulation studies, we demonstrate the advantages of SFM in terms of both synthetic data quality and computational efficiency. We then apply SFM to generate clinical trajectory data from the MIMIC-IV patient electronic health records (EHR) longitudinal database. Our analysis showcases the ability of SFM to produce high-quality surrogate data for downstream statistical tasks, highlighting its potential to boost the utility of EHR data for clinical applications.
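
For readers unfamiliar with the flow-matching ideas the abstract builds on, the following is a minimal sketch of the generic flow-matching objective with a straight-line path between a noise sample and a data sample. It is not the SFM method itself: the function names, the toy data (8 curves on a 5-point grid), and the "oracle" velocity field are all assumptions made for illustration, and the abstract does not say which path or parameterization SFM uses.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point on the straight-line path from x0 (noise) to x1 (data) at time t."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Time derivative of the straight-line path; it is constant in t."""
    return x1 - x0

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between a candidate velocity field and the target."""
    xt = interpolate(x0, x1, t)
    residual = velocity_fn(xt, t) - target_velocity(x0, x1)
    return float(np.mean(np.sum(residual ** 2, axis=-1)))

def euler_sample(velocity_fn, x0, n_steps=100):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with forward Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity_fn(x, t)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 5))        # noise: 8 curves on 5 grid points (toy)
x1 = rng.standard_normal((8, 5)) + 2.0  # stand-in "data" curves (toy)
t = rng.uniform(size=(8, 1))            # one random time per curve

# Oracle field for this fixed pairing; available only because x0 and x1
# are both known here. In practice a learned model would replace it.
oracle = lambda x, t: x1 - x0
zero_field = lambda x, t: np.zeros_like(x)

loss_at_oracle = flow_matching_loss(oracle, x0, x1, t)
loss_at_zero = flow_matching_loss(zero_field, x0, x1, t)
generated = euler_sample(oracle, x0)
```

In this toy setting the oracle field attains zero loss and transports each noise sample onto its paired data sample, while the zero field does not. A trained model would instead approximate the velocity field from many random pairings and then generate new curves by integrating from fresh noise.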
Problem

Research questions and friction points this paper is trying to address.

Generative modeling for functional data privacy
Handling sparse irregular non-Gaussian functional observations
Producing smooth synthetic clinical trajectory data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semiparametric copula flow for infinite-dimensional data
Handles irregular observations with computational efficiency
Guarantees smoothness without Gaussian or low-rank assumptions
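
As background on the "semiparametric copula" idea, the sketch below shows a generic rank-based transform that maps each margin to the unit interval without assuming a parametric distribution, plus an empirical-quantile map back to the data scale. This is a common copula preprocessing step and is only an assumption about the general flavor of the approach; the summary does not specify how SFM constructs its copula flow, and the helper names and toy data here are hypothetical.

```python
import numpy as np

def pseudo_observations(x):
    """Map each column of x into (0, 1) using ranks / (n + 1)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Double argsort gives ranks 0..n-1 per column; this assumes no ties.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    return ranks / (n + 1.0)

def empirical_quantile(u, reference):
    """Map uniform scores back to the data scale for a single margin."""
    return np.quantile(np.asarray(reference, dtype=float), u)

rng = np.random.default_rng(1)
values = rng.exponential(size=(200, 3))  # skewed, non-Gaussian toy margins
u = pseudo_observations(values)          # margin-free scores in (0, 1)
back = empirical_quantile(u[:, 0], values[:, 0])
```

The transform preserves the ordering within each margin, so the dependence between margins is carried entirely by the scores `u`, which a separate model (a flow, in the copula-flow setting) can then describe.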
Jianbin Tan
Duke University
Biostatistics, Functional data, Differential equation learning, Flow-based learning
Anru R. Zhang
Department of Biostatistics & Bioinformatics and Department of Computer Science, Duke University, Durham, NC, USA