🤖 AI Summary
Zero-shot cell-type annotation in single-cell RNA sequencing (scRNA-seq) suffers from limited accuracy and inconsistent predictions across cell types. Existing CLIP-style models (e.g., LangCell) require no task-specific training, but their zero-shot performance remains suboptimal. This paper proposes a training-free, model-agnostic framework that refines zero-shot logits through graph-regularized optimization: starting from the logits of a pre-trained model, it builds a task-specific PCA-based $k$-NN graph and enforces local consistency of the predictions over that graph. The method is presented as a plug-in for models that output zero-shot logits and involves no additional training. Evaluated on 14 annotated human scRNA-seq datasets, it consistently improves zero-shot annotation accuracy, with gains of up to 10%, and pulls mislabeled cells back toward more accurate predictions. The result is a simple, scalable way to improve automated single-cell annotation.
📝 Abstract
Cell type annotation is a fundamental step in the analysis of single-cell RNA sequencing (scRNA-seq) data. In practice, human experts often rely on the structure revealed by principal component analysis (PCA) followed by $k$-nearest neighbor ($k$-NN) graph construction to guide annotation. While effective, this process is labor-intensive and does not scale to large datasets. Recent advances in CLIP-style models offer a promising path toward automating cell type annotation. By aligning scRNA-seq profiles with natural language descriptions, models like LangCell enable zero-shot annotation. Although LangCell demonstrates decent zero-shot performance, its predictions remain suboptimal, particularly in achieving consistent accuracy across all cell types. In this paper, we propose GRIT, a graph-regularized optimization framework that refines the zero-shot logits produced by LangCell. By enforcing local consistency over a task-specific PCA-based $k$-NN graph, our method combines the scalability of pre-trained models with the structural robustness relied upon in expert annotation. We evaluate our approach on 14 annotated human scRNA-seq datasets from 4 distinct studies, spanning 11 organs and over 200,000 single cells. Our method consistently improves zero-shot annotation accuracy, with gains of up to 10%. Further analysis shows how GRIT propagates correct signals through the graph, pulling mislabeled cells back toward more accurate predictions. The method is training-free, model-agnostic, and serves as a simple yet effective plug-in for enhancing automated cell type annotation in practice.
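The pipeline described above (PCA, then a $k$-NN graph, then logit refinement with local consistency) can be illustrated with a minimal NumPy sketch. The abstract does not state the exact objective, so this example substitutes a standard label-spreading iteration, $F \leftarrow \alpha S F + (1-\alpha) Y$, as one common form of graph regularization; it should not be read as the authors' implementation. The function names, the values of `alpha`, `n_iter`, and `k`, and the synthetic data (stand-ins for an expression matrix and for zero-shot logits) are all assumptions made for illustration.

```python
import numpy as np


def pca_embed(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def knn_adjacency(Z, k):
    """Symmetric binary k-NN adjacency matrix from Euclidean distances."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    np.fill_diagonal(d2, np.inf)  # a cell is never its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]
    n = Z.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(A, A.T)  # keep an edge if either endpoint selected it


def refine_logits(logits, A, alpha=0.8, n_iter=50):
    """Smooth logits over the graph while staying anchored to the originals.

    Assumed stand-in for the paper's optimization: label-spreading updates
    F <- alpha * S @ F + (1 - alpha) * logits, where S is the symmetrically
    normalized adjacency matrix.
    """
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    F = logits.astype(float)
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1.0 - alpha) * logits
    return F


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for an expression matrix: two well-separated groups.
    X = np.vstack([rng.normal(0.0, 1.0, (100, 20)),
                   rng.normal(6.0, 1.0, (100, 20))])
    y_true = np.repeat([0, 1], 100)
    # Synthetic stand-in for zero-shot logits: one-hot truth plus heavy noise.
    logits = np.eye(2)[y_true] + rng.normal(0.0, 0.8, (200, 2))

    A = knn_adjacency(pca_embed(X, n_components=10), k=15)
    refined = refine_logits(logits, A)

    print("accuracy before:", np.mean(logits.argmax(axis=1) == y_true))
    print("accuracy after: ", np.mean(refined.argmax(axis=1) == y_true))
```

In this sketch, `alpha` controls the trade-off: values near 0 keep the original logits almost unchanged, while values near 1 let neighborhood agreement dominate. Because the refinement only consumes a logit matrix and a graph, any model that emits per-cell logits could be slotted in, which matches the model-agnostic framing of the abstract.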