Latent variable estimation with composite Hilbert space Gaussian processes

📅 2025-10-29
🤖 AI Summary
Estimating latent variables from large, multi-source datasets is difficult because exact Gaussian process models scale poorly with the number of samples. Method: This paper proposes a scalable Composite Hilbert Space Gaussian Process (CHSGP) framework. It extends Hilbert space approximations to composite Gaussian processes, with a focus on derivative Gaussian processes, and derives the spectral decomposition of derivative covariance functions to obtain a reduced-rank representation of the composite covariance. Multiple data sources are modeled jointly as outputs within a single probabilistic framework. Contribution/Results: In simulations, CHSGP is evaluated on latent variable estimation accuracy, uncertainty calibration, and inference speed, and it scales to datasets with thousands of samples. In a case study on single-cell gene expression data, the models are used to estimate latent cellular ordering, illustrating their potential in real-world biological applications.

📝 Abstract
We develop a scalable class of models for latent variable estimation using composite Gaussian processes, with a focus on derivative Gaussian processes. We jointly model multiple data sources as outputs to improve the accuracy of latent variable inference under a single probabilistic framework. Similarly specified exact Gaussian processes scale poorly with large datasets. To overcome this, we extend the recently developed Hilbert space approximation methods for Gaussian processes to obtain a reduced-rank representation of the composite covariance function through its spectral decomposition. Specifically, we derive and analyze the spectral decomposition of derivative covariance functions and further study their properties theoretically. Using these spectral decompositions, our methods easily scale up to data scenarios involving thousands of samples. We validate our methods in terms of latent variable estimation accuracy, uncertainty calibration, and inference speed across diverse simulation scenarios. Finally, using a real-world case study from single-cell biology, we demonstrate the potential of our models in estimating latent cellular ordering given gene expression levels, thus enhancing our understanding of the underlying biological process.

Problem

Research questions and friction points this paper is trying to address.

Scalable latent variable estimation using composite Gaussian processes
Joint modeling of multiple data sources for accurate inference
Efficient spectral decomposition for large datasets with thousands of samples

Innovation

Methods, ideas, or system contributions that make the work stand out.

Composite Gaussian processes for latent variable estimation
Hilbert space approximation for scalable covariance representation
Spectral decomposition of derivative covariance functions for efficiency
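
To make the reduced-rank idea concrete, the following is a minimal sketch of a one-dimensional Hilbert space approximation for a squared exponential kernel and for the covariance of its derivative process. It is not the paper's implementation. It assumes the standard construction with Dirichlet Laplacian eigenfunctions on an interval [-L, L], and it assumes that the derivative covariance can be approximated by differentiating the basis functions. The function names (hs_basis, se_spectral_density) and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def hs_basis(x, num_basis, half_width):
    """Laplacian eigenfunctions on [-L, L] (Dirichlet boundaries) and their
    first derivatives, evaluated at the points x. L = half_width."""
    j = np.arange(1, num_basis + 1)
    sqrt_lam = j * np.pi / (2.0 * half_width)   # square roots of eigenvalues
    arg = np.outer(x + half_width, sqrt_lam)    # shape (n, num_basis)
    phi = np.sin(arg) / np.sqrt(half_width)     # basis functions
    dphi = sqrt_lam * np.cos(arg) / np.sqrt(half_width)  # d(phi)/dx
    return phi, dphi, sqrt_lam


def se_spectral_density(omega, sigma, ell):
    """Spectral density of the 1D squared exponential kernel."""
    return sigma**2 * np.sqrt(2.0 * np.pi) * ell * np.exp(-0.5 * (ell * omega) ** 2)


# Illustrative settings (assumptions, not values from the paper).
sigma, ell = 1.0, 1.0
x = np.linspace(-1.0, 1.0, 25)
phi, dphi, sqrt_lam = hs_basis(x, num_basis=40, half_width=5.0)
s = se_spectral_density(sqrt_lam, sigma, ell)

# Reduced-rank covariance of f and of its derivative f'.
k_approx = (phi * s) @ phi.T
dk_approx = (dphi * s) @ dphi.T

# Exact counterparts for comparison.
tau = x[:, None] - x[None, :]
k_exact = sigma**2 * np.exp(-0.5 * tau**2 / ell**2)
dk_exact = (sigma**2 / ell**2) * (1.0 - tau**2 / ell**2) * np.exp(-0.5 * tau**2 / ell**2)

err_k = np.max(np.abs(k_approx - k_exact))
err_dk = np.max(np.abs(dk_approx - dk_exact))
print(f"max abs error, kernel: {err_k:.2e}")
print(f"max abs error, derivative kernel: {err_dk:.2e}")
```

In this sketch the basis matrices have shape (n, m) with m = 40 basis functions, so downstream linear algebra can work with m-dimensional weights rather than an n-by-n covariance matrix, which is the source of the scalability described above. The approximation degrades for inputs close to the interval boundaries, so the half-width is typically chosen comfortably larger than the data range.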