🤖 AI Summary
This work addresses cell type annotation in single-cell RNA sequencing, which is hindered by the tissue- and state-dependency of marker genes and the absence of reference profiles for novel cellular states. To overcome these challenges, we introduce CellMaster, the first large language model–based AI agent (e.g., leveraging GPT-4o) capable of zero-shot, automatic cell annotation without requiring pretraining or a fixed marker gene database. CellMaster emulates expert reasoning to deliver interpretable annotation rationales and enables real-time human-in-the-loop refinement. Evaluated on nine datasets spanning eight tissues, CellMaster achieves a 7.1% improvement in annotation accuracy over the best-performing baselines in fully automated mode. With human-AI collaboration the improvement rises to 18.6%, with a 22.1% gain on subtype populations.
📝 Abstract
Single-cell RNA-seq (scRNA-seq) enables atlas-scale profiling of complex tissues, revealing rare lineages and transient states. Yet assigning biologically valid cell identities remains a bottleneck because markers are tissue- and state-dependent, and novel states lack references. We present CellMaster, an AI agent that mimics expert practice for zero-shot cell-type annotation. Unlike existing automated tools, CellMaster leverages LLM-encoded knowledge (e.g., GPT-4o) to perform on-the-fly annotation with interpretable rationales, without pre-training or fixed marker databases. Across 9 datasets spanning 8 tissues, CellMaster improved accuracy by 7.1% over the best-performing baselines (including CellTypist and scTab) in automatic mode. With human-in-the-loop refinement, this advantage increased to 18.6%, with a 22.1% gain on subtype populations. The system demonstrates particular strength in rare and novel cell states where baselines often fail. Source code and the web application are available at https://github.com/AnonymousGym/CellMaster.