Linearized Optimal Transport for Analysis of High-Dimensional Point-Cloud and Single-Cell Data

📅 2025-10-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Single-cell sequencing yields high-dimensional, irregular point clouds of cells, which makes it hard to quantify biological variation between individuals directly; moreover, existing nonlinear models (e.g., kernel methods, deep networks) lack interpretability. To address this, we propose a unified analytical framework based on Linear Optimal Transport (LOT): patient-level point clouds are embedded into a fixed-dimensional Euclidean space via LOT, with barycentric averaging for distribution alignment, enabling both linear reconstruction and inverse mapping. Our method achieves high predictive accuracy, biological interpretability (classifier weights are directly attributable to key marker genes), and generative capability (synthesizing biologically plausible, patient-specific organoid-like data). Applied to multi-omics COVID-19 datasets, the model delivers accurate and interpretable disease-state classification and facilitates mechanistic investigation of drug–disease interactions.

📝 Abstract

Single-cell technologies generate high-dimensional point clouds of cells, enabling detailed characterization of complex patient states and treatment responses. Yet each patient is represented by an irregular point cloud rather than a simple vector, making it difficult to directly quantify and compare biological differences between individuals. Nonlinear methods such as kernels and neural networks achieve predictive accuracy but act as black boxes, offering little biological interpretability. To address these limitations, we adapt the Linear Optimal Transport (LOT) framework to this setting, embedding irregular point clouds into a fixed-dimensional Euclidean space while preserving distributional structure. This embedding provides a principled linear representation that preserves optimal transport geometry while enabling downstream analysis. It also forms a registration between any two patients, enabling direct comparison of their cellular distributions. Within this space, LOT enables: (i) accurate and interpretable classification of COVID-19 patient states, where classifier weights map back to specific markers and spatial regions driving predictions; and (ii) synthetic data generation for patient-derived organoids, exploiting the linearity of the LOT embedding. LOT barycenters yield averaged cellular profiles representing combined conditions or samples, supporting drug interaction testing. Together, these results establish LOT as a unified framework that bridges predictive performance, interpretability, and generative modeling. By transforming heterogeneous point clouds into structured embeddings directly traceable to the original data, LOT opens new opportunities for understanding immune variation and treatment effects in high-dimensional biological systems.
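To make the embedding step concrete, here is a minimal sketch of how a LOT-style embedding could be computed; it is not the paper's implementation. It assumes uniform cell weights and a shared reference point cloud, and it swaps in an entropic (Sinkhorn) approximation for the exact optimal transport plan so that it needs only NumPy. The names `sinkhorn_plan` and `lot_embed`, and the parameters `eps` and `n_iter`, are hypothetical.

```python
import numpy as np

def sinkhorn_plan(a, b, cost, eps=0.05, n_iter=500):
    """Entropic OT plan between weights a (n,) and b (m,) for an (n, m) cost matrix.

    Assumption: an entropic approximation stands in for the exact OT plan.
    """
    K = np.exp(-cost / (eps * cost.max()))  # rescale the cost so eps is unitless
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    v = b / (K.T @ u)  # finish on v so the column marginals match b
    return u[:, None] * K * v[None, :]

def lot_embed(reference, cloud, eps=0.05):
    """Embed one point cloud as a displacement field defined on the reference points."""
    n, m = len(reference), len(cloud)
    a = np.full(n, 1.0 / n)  # uniform weights on reference points (assumption)
    b = np.full(m, 1.0 / m)  # uniform weights on cells (assumption)
    cost = ((reference[:, None, :] - cloud[None, :, :]) ** 2).sum(axis=2)
    plan = sinkhorn_plan(a, b, cost, eps)
    transported = (plan @ cloud) / a[:, None]  # barycentric projection of the plan
    return (transported - reference).ravel()   # fixed length: n * n_markers

# Toy usage: two "patients" with different cell counts get same-length vectors.
rng = np.random.default_rng(0)
reference = rng.normal(size=(20, 3))
patient_a = rng.normal(size=(35, 3)) + 2.0
patient_b = rng.normal(size=(50, 3)) - 1.0
emb_a = lot_embed(reference, patient_a)
emb_b = lot_embed(reference, patient_b)
print(emb_a.shape, emb_b.shape)
```

Because every patient is expressed on the same reference points, row i of each reshaped embedding refers to the same reference location, which is what allows patients to be compared coordinate by coordinate and fed to a linear model.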

Problem

Research questions and friction points this paper is trying to address.

Quantifying biological differences between irregular high-dimensional point clouds

Providing interpretable representations for complex single-cell data

Enabling direct comparison of cellular distributions across different patients

Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear Optimal Transport embeds point clouds into Euclidean space

LOT enables interpretable classification with traceable biological markers

Generates synthetic data via barycenters for drug testing
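
The interpretability and averaging ideas listed above can be illustrated with a small sketch on toy data. It assumes embeddings are already available as an array of shape (patients, reference points, markers); the ridge classifier, the toy signal, and the names `fit_ridge` and `marker_importance` are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression on labels in {-1, +1}; returns the weight vector."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def marker_importance(w, n_ref, n_markers):
    """Reshape flat weights to (reference point, marker) and sum |w| per marker."""
    return np.abs(w.reshape(n_ref, n_markers)).sum(axis=0)

rng = np.random.default_rng(0)
n_patients, n_ref, n_markers = 40, 10, 4
y = np.where(np.arange(n_patients) < 20, -1.0, 1.0)

# Toy embeddings (assumption): only marker index 2 differs between the two groups.
emb = rng.normal(scale=0.1, size=(n_patients, n_ref, n_markers))
emb[:, :, 2] += 0.5 * y[:, None]

# Linear classifier on flattened embeddings; weights map back to markers.
w = fit_ridge(emb.reshape(n_patients, -1), y)
scores = marker_importance(w, n_ref, n_markers)
print("most influential marker index:", scores.argmax())

# Averaging embeddings within a group gives a mean displacement field;
# adding the reference points to it would yield an averaged point cloud.
mean_displacement = emb[y > 0].mean(axis=0)
print(mean_displacement.shape)
```

The design point is that the embedding is linear, so both steps are plain array operations: classifier weights share the (reference point, marker) layout of the embedding, and an average of embeddings is itself a valid embedding.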

🔎 Similar Papers

Low dimensional representation of multi-patient flow cytometry datasets using optimal transport for minimal residual disease detection in leukemia