🤖 AI Summary
Single-cell RNA sequencing (scRNA-seq) resolves cellular heterogeneity, but functionally interpreting the resulting gene sets remains difficult, particularly when they involve poorly characterized genes. Conventional enrichment methods (e.g., GSEA) depend on well-curated, predefined gene sets and perform poorly when such annotations are sparse, while large language models (LLMs) struggle to represent biological knowledge within structured ontologies. To address this, we propose BRAINCELL-AID, a multi-agent system that integrates free-text descriptions with ontology labels and uses retrieval-augmented generation (RAG) over PubMed literature to refine predictions, reducing hallucinations and improving interpretability. For 77% of mouse gene sets, the correct annotation appears among the system's top predictions. Applied to the mouse brain cell atlas from the BRAIN Initiative Cell Census Network, BRAINCELL-AID annotates 5,322 brain cell clusters, identifies region-specific gene co-expression patterns, and describes basal ganglia-related cell types, providing a resource for community-driven cell type annotation.
📝 Abstract
Single-cell RNA sequencing has transformed our ability to identify diverse cell types and their transcriptomic signatures. However, annotating these signatures, especially those involving poorly characterized genes, remains a major challenge. Traditional methods, such as Gene Set Enrichment Analysis (GSEA), depend on well-curated annotations and often perform poorly in these contexts. Large Language Models (LLMs) offer a promising alternative but struggle to represent complex biological knowledge within structured ontologies. To address this, we present BRAINCELL-AID (https://biodataai.uth.edu/BRAINCELL-AID), a novel multi-agent AI system that integrates free-text descriptions with ontology labels to enable more accurate and robust gene set annotation. By incorporating retrieval-augmented generation (RAG), we developed an agentic workflow that refines predictions using relevant PubMed literature, reducing hallucinations and enhancing interpretability. With this workflow, the correct annotation appeared among the top predictions for 77% of mouse gene sets. We then annotated 5,322 brain cell clusters from the comprehensive mouse brain cell atlas generated by the BRAIN Initiative Cell Census Network, identifying region-specific gene co-expression patterns and inferring the functional roles of gene ensembles. BRAINCELL-AID also identifies basal ganglia-related cell types with neurologically meaningful descriptions. Together, these annotations form a valuable resource to support community-driven cell type annotation.
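The 77% figure counts a gene set as correctly annotated when its reference label appears among the top predictions. A minimal sketch of that kind of top-k hit rate follows. The function name, the cutoff `k`, the case-insensitive exact-match rule, and the toy labels are all assumptions made for illustration; the abstract does not specify how many predictions count as "top" or how matches were judged.

```python
from typing import Dict, List


def top_k_hit_rate(
    predictions: Dict[str, List[str]],
    truth: Dict[str, str],
    k: int = 3,
) -> float:
    """Fraction of gene sets whose reference label is in the first k ranked predictions.

    Matching here is a case-insensitive exact string comparison, which is an
    assumption of this sketch rather than the paper's evaluation criterion.
    """
    if not truth:
        raise ValueError("truth must contain at least one gene set")
    hits = 0
    for gene_set_id, reference in truth.items():
        # Gene sets with no predictions contribute a miss.
        ranked = predictions.get(gene_set_id, [])[:k]
        if reference.lower() in (label.lower() for label in ranked):
            hits += 1
    return hits / len(truth)


# Hypothetical toy data: ranked annotation labels per gene set (best first).
toy_predictions = {
    "gs1": ["synaptic signaling", "axon guidance"],
    "gs2": ["myelination", "lipid metabolism"],
    "gs3": ["immune response"],
    "gs4": ["ion transport", "neuropeptide signaling"],
}
toy_truth = {
    "gs1": "axon guidance",
    "gs2": "Myelination",
    "gs3": "synaptic signaling",
    "gs4": "neuropeptide signaling",
}

if __name__ == "__main__":
    for cutoff in (1, 2):
        rate = top_k_hit_rate(toy_predictions, toy_truth, k=cutoff)
        print(f"top-{cutoff} hit rate: {rate:.2f}")
```

Reporting the rate at several cutoffs makes explicit how much of the headline number depends on the size of the "top predictions" window.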