🤖 AI Summary
This study addresses the limited interpretability of existing rare cell identification methods in single-cell transcriptomics, which typically rely on opaque dimensionality reduction techniques (e.g., PCA) and black-box anomaly detectors, offering little insight at the gene level. To overcome this, we propose an end-to-end, interpretable anomaly detection framework that operates directly in the original high-dimensional gene expression space without requiring dimensionality reduction. Our method jointly optimizes anomaly detection and feature attribution, enabling precise identification of rare cells while simultaneously highlighting their key discriminative genes. By integrating state-of-the-art interpretable anomaly detection into single-cell analysis for the first time, our approach not only locates rare cells but also links them to their nearest normal neighbors, providing intuitive, biologically meaningful gene-level explanations that substantially enhance both interpretability and practical utility.
📝 Abstract
Detecting rare cell types in single-cell transcriptomics data is crucial for elucidating disease pathogenesis and tissue development dynamics. However, current methods share a critical gap: they cannot provide a gene-based explanation for each cell they flag as rare. We identify three primary sources of this deficiency. First, the anomaly detectors often function as "black boxes", designed to detect anomalies but unable to explain why a cell is anomalous. Second, the standard analytical framework hinders interpretability by relying on dimensionality reduction techniques, such as Principal Component Analysis (PCA), which transform meaningful gene expression data into abstract, uninterpretable features. Finally, existing explanation algorithms cannot be readily applied to this domain, as single-cell data is characterized by high dimensionality, noise, and substantial sparsity. To overcome these limitations, we introduce a framework for explainable anomaly detection in single-cell transcriptomics data that not only identifies individual anomalies but also provides a visual explanation based on the genes that make an instance anomalous. This framework has two key ingredients that do not exist in current methods applied in this domain. First, it eliminates the PCA step, which previous studies have deemed an essential component. Second, it employs a state-of-the-art anomaly detector and explainer as an efficient and effective means to find each rare cell and its relevant gene subspace, in order to provide explanations for each rare cell as well as for the typical normal cell associated with the rare cell's closest normal cells.
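The workflow described in the abstract (score cells in the original gene space without PCA, then explain each rare cell against its closest normal cells) can be sketched roughly as follows. This is a minimal illustration and not the authors' method: the abstract does not name its detector or explainer, so scikit-learn's `IsolationForest` is swapped in as a stand-in detector and a simple per-gene deviation ranking as a stand-in explainer. The function name `explain_rare_cells` and its parameters (`n_rare`, `k`, `top_genes`) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import NearestNeighbors


def explain_rare_cells(X, gene_names, n_rare=5, k=10, top_genes=3, seed=0):
    """Flag rare cells in the raw gene space and rank the genes that
    separate each one from its nearest normal cells.

    X is a (cells x genes) expression matrix; no PCA is applied.
    """
    X = np.asarray(X, dtype=float)

    # Stand-in detector (assumption): Isolation Forest fitted on all genes.
    detector = IsolationForest(random_state=seed).fit(X)
    scores = -detector.score_samples(X)  # higher = more anomalous

    # Treat the n_rare highest-scoring cells as rare, the rest as normal.
    rare_idx = np.argsort(scores)[::-1][:n_rare]
    normal_idx = np.setdiff1d(np.arange(X.shape[0]), rare_idx)

    # Per-gene spread over normal cells, used to scale deviations.
    normal_std = X[normal_idx].std(axis=0) + 1e-8

    nn = NearestNeighbors(n_neighbors=k).fit(X[normal_idx])
    explanations = {}
    for i in rare_idx:
        # Closest normal cells to this rare cell, in the full gene space.
        _, nbrs = nn.kneighbors(X[i:i + 1])
        # "Typical normal cell": mean profile of those neighbours.
        typical = X[normal_idx[nbrs[0]]].mean(axis=0)

        # Stand-in explainer (assumption): rank genes by scaled deviation.
        deviation = np.abs(X[i] - typical) / normal_std
        top = np.argsort(deviation)[::-1][:top_genes]
        explanations[int(i)] = [
            (gene_names[g], float(X[i, g]), float(typical[g])) for g in top
        ]
    return explanations
```

Each entry of the returned dictionary maps a rare cell's row index to a list of `(gene, expression in the rare cell, expression in the typical normal cell)` tuples, which is the kind of gene-level comparison that a visual explanation could be built from. Ranking genes independently, as done here, is a deliberate simplification; a subspace-based explainer would instead search for sets of genes that jointly make the cell anomalous.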