Efficient Imputation for Patch-based Missing Single-cell Data via Cluster-regularized Optimal Transport

📅 2026-01-21
🤖 AI Summary
Single-cell sequencing data often exhibit large, structured patches of missing values. Conventional imputation methods, which typically assume uniformly scattered missingness, struggle to balance accuracy against computational cost in this setting. This work proposes CROT, a novel algorithm that integrates cluster-based regularization into an optimal transport framework for imputing patch-based missing data in tabular form. By combining the geometry of optimal transport with cluster-level information, CROT aims to capture the underlying data structure of high-dimensional, heterogeneous single-cell datasets despite substantial missingness. The authors report higher imputation accuracy together with substantially reduced runtime, and present the method as scalable to large single-cell datasets.
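To make the idea of cluster-regularized optimal transport concrete, here is a minimal sketch in Python with NumPy. It is not the authors' CROT implementation, which the summary does not specify. It assumes that cells with a missing patch ("query") are matched to fully observed cells ("reference") by entropic optimal transport on the features both groups share, that a hypothetical penalty `lam` is added to the cost of every cross-cluster pair, and that cluster labels are supplied by some earlier clustering step. The names `sinkhorn`, `impute_patch`, `lam`, and `eps` are illustrative only.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iter=200):
    """Entropic OT plan between marginals a and b (plain Sinkhorn iterations)."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def impute_patch(q_obs, ref_obs, ref_hidden, q_labels, ref_labels,
                 lam=1.0, eps=0.1):
    """Fill a missing patch for query cells from reference cells.

    q_obs, ref_obs : features observed in both groups
    ref_hidden     : features observed only in the reference cells
    lam            : assumed penalty on cross-cluster transport (hypothetical)
    """
    # Squared Euclidean cost on the shared features, scaled to [0, 1].
    diff = q_obs[:, None, :] - ref_obs[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)
    # Cluster regularization (assumed form): discourage moving mass
    # between cells that carry different cluster labels.
    cost = cost + lam * (q_labels[:, None] != ref_labels[None, :])
    n, m = cost.shape
    plan = sinkhorn(cost, np.full(n, 1.0 / n), np.full(m, 1.0 / m), eps)
    # Barycentric projection: each query cell receives a weighted
    # average of the reference cells' hidden features.
    weights = plan / plan.sum(axis=1, keepdims=True)
    return weights @ ref_hidden
```

The uniform marginals force every reference cell to contribute equally, so this sketch behaves best when query and reference cells have similar cluster proportions. An unbalanced or partial transport variant would relax that assumption.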

📝 Abstract
Missing data in single-cell sequencing datasets poses significant challenges for extracting meaningful biological insights. However, existing imputation approaches, which often assume uniformity and data completeness, struggle to address cases with large patches of missing data. In this paper, we present CROT, an optimal transport-based imputation algorithm designed to handle patch-based missing data in tabular formats. Our approach effectively captures the underlying data structure in the presence of significant missingness. Notably, it achieves superior imputation accuracy while significantly reducing runtime, demonstrating its scalability and efficiency for large-scale datasets. This work introduces a robust solution for imputation in heterogeneous, high-dimensional datasets with structured data absence, addressing critical challenges in both biological and clinical data analysis. Our code is available at Anomalous Github.
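The difference between patch-based and uniformly scattered missingness can be illustrated with a small sketch. The matrix size, patch location, and helper names (`patch_mask`, `uniform_mask`) are assumptions chosen for illustration and are not taken from the paper. Both masks hide the same fraction of entries, but the patch removes an entire block of features for a subset of cells, leaving no partially observed values inside it to interpolate from.

```python
import numpy as np

def patch_mask(n_rows, n_cols, row_slice, col_slice):
    """Boolean mask (True = missing) covering one rectangular patch."""
    mask = np.zeros((n_rows, n_cols), dtype=bool)
    mask[row_slice, col_slice] = True
    return mask

def uniform_mask(n_rows, n_cols, frac, seed=0):
    """Boolean mask with the same expected missing fraction, scattered at random."""
    rng = np.random.default_rng(seed)
    return rng.random((n_rows, n_cols)) < frac

# Hypothetical table: 100 cells x 50 features.
# The first 40 cells are missing the last 20 features entirely.
patch = patch_mask(100, 50, slice(0, 40), slice(30, 50))
scattered = uniform_mask(100, 50, patch.mean())

# Missing entries per cell: all-or-nothing under the patch,
# spread thinly across every cell under uniform masking.
patch_per_cell = patch.sum(axis=1)
scattered_per_cell = scattered.sum(axis=1)
```

Under the patch mask, `patch_per_cell` is 20 for the first 40 cells and 0 for the rest, whereas `scattered_per_cell` varies around 8 for every cell.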

Problem

Research questions and friction points this paper is trying to address.

single-cell sequencing
missing data
patch-based missing
data imputation
structured data absence

Innovation

Methods, ideas, or system contributions that make the work stand out.

optimal transport
cluster regularization
patch-based missing data
single-cell imputation
scalable algorithm