A Brain Cell Type Resource Created by Large Language Models and a Multi-Agent AI System for Collaborative Community Annotation

📅 2025-10-19
🤖 AI Summary
Single-cell RNA sequencing (scRNA-seq) resolves cellular heterogeneity, but functionally interpreting the resulting gene sets remains difficult, particularly when they involve poorly characterized genes. Conventional enrichment methods (e.g., GSEA) depend on predefined, well-curated gene sets, while large language models (LLMs) struggle to represent biological knowledge within structured ontologies. To address this, we propose BRAINCELL-AID, a multi-agent system that integrates free-text descriptions with ontology labels and uses retrieval-augmented generation (RAG) over PubMed literature to refine its predictions. On mouse gene sets, the correct annotation appears among BRAINCELL-AID's top predictions in 77% of cases, and the system annotates 5,322 brain cell clusters from the BRAIN Initiative Cell Census Network mouse brain atlas. It identifies basal ganglia-related cell types and region-specific gene co-expression patterns, providing a resource for interpretable, community-driven cell type annotation.

📝 Abstract
Single-cell RNA sequencing has transformed our ability to identify diverse cell types and their transcriptomic signatures. However, annotating these signatures, especially those involving poorly characterized genes, remains a major challenge. Traditional methods, such as Gene Set Enrichment Analysis (GSEA), depend on well-curated annotations and often perform poorly in these contexts. Large Language Models (LLMs) offer a promising alternative but struggle to represent complex biological knowledge within structured ontologies. To address this, we present BRAINCELL-AID (https://biodataai.uth.edu/BRAINCELL-AID), a novel multi-agent AI system that integrates free-text descriptions with ontology labels to enable more accurate and robust gene set annotation. By incorporating retrieval-augmented generation (RAG), we developed a robust agentic workflow that refines predictions using relevant PubMed literature, reducing hallucinations and enhancing interpretability. Using this workflow, we achieved correct annotations for 77% of mouse gene sets among their top predictions. Applying this approach, we annotated 5,322 brain cell clusters from the comprehensive mouse brain cell atlas generated by the BRAIN Initiative Cell Census Network, enabling novel insights into brain cell function by identifying region-specific gene co-expression patterns and inferring functional roles of gene ensembles. BRAINCELL-AID also identifies basal ganglia-related cell types with neurologically meaningful descriptions. Hence, we create a valuable resource to support community-driven cell type annotation.
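The abstract reports that the correct annotation was found among the top predictions for 77% of mouse gene sets. One common way to compute a figure of this kind is top-k accuracy. The sketch below is illustrative only: the function name, the exact-string matching rule, the value of k, and the toy labels are all assumptions, since the abstract does not state how a prediction was judged correct or how many top predictions were considered.

```python
from typing import Dict, List


def top_k_accuracy(
    predictions: Dict[str, List[str]],
    truth: Dict[str, str],
    k: int,
) -> float:
    """Fraction of gene sets whose true label is among the first k ranked predictions.

    Assumption: a prediction counts as correct only on an exact string match.
    """
    if not truth:
        raise ValueError("truth must not be empty")
    hits = 0
    for gene_set_id, true_label in truth.items():
        ranked = predictions.get(gene_set_id, [])
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(truth)


# Toy data (hypothetical gene set IDs and labels, not taken from the paper).
toy_predictions = {
    "set_a": ["synaptic signaling", "axon guidance", "myelination"],
    "set_b": ["immune response", "myelination", "ion transport"],
    "set_c": ["cell cycle", "ion transport", "axon guidance"],
    "set_d": ["lipid metabolism", "cell cycle", "immune response"],
}
toy_truth = {
    "set_a": "synaptic signaling",
    "set_b": "myelination",
    "set_c": "axon guidance",
    "set_d": "synaptic signaling",
}

for k in (1, 2, 3):
    print(k, top_k_accuracy(toy_predictions, toy_truth, k))
```

Because the metric can only rise as k grows, a reported percentage is interpretable only together with the k that was used.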
Problem

Research questions and friction points this paper is trying to address.

Automating annotation of poorly characterized gene signatures in single-cell RNA sequencing
Overcoming limitations of traditional methods and LLMs in biological knowledge representation
Creating accurate brain cell type annotations through multi-agent AI collaboration
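To make the limitation of curation-dependent methods concrete, here is a minimal sketch of a hypergeometric over-representation test. Note that this is a simpler relative of GSEA, not GSEA itself, and it is not the paper's method; the function name and the placeholder gene identifiers are hypothetical. It shows that a query made up entirely of genes absent from the curated set cannot produce any enrichment signal.

```python
from math import comb
from typing import Set


def overrepresentation_p(query: Set[str], curated: Set[str], background: Set[str]) -> float:
    """One-sided hypergeometric p-value P(overlap >= observed).

    Genes outside the background are ignored.
    """
    query = query & background
    curated = curated & background
    n_background = len(background)
    n_curated = len(curated)
    n_query = len(query)
    observed = len(query & curated)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_curated, i) * comb(n_background - n_curated, n_query - i)
        for i in range(observed, min(n_curated, n_query) + 1)
    )
    return tail / total


# Placeholder gene identifiers (hypothetical, for illustration only).
background = {f"g{i}" for i in range(20)}
curated_set = {"g0", "g1", "g2", "g3", "g4"}

well_characterized_query = {"g0", "g1", "g2", "g10"}
poorly_characterized_query = {"g15", "g16", "g17", "g18"}

print(overrepresentation_p(well_characterized_query, curated_set, background))
print(overrepresentation_p(poorly_characterized_query, curated_set, background))
```

With zero overlap the tail sum covers the whole distribution, so the second p-value is exactly 1: the test has nothing to say about genes that no curated set mentions.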
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent AI system integrates text and ontology labels
Retrieval-augmented generation refines predictions using literature
Workflow places the correct annotation among its top predictions for 77% of mouse gene sets
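The retrieve-then-refine idea behind the highlights above can be sketched in a few lines. This is a toy stand-in under loud assumptions: the real system uses LLM agents and PubMed retrieval, whereas the sketch below uses an in-memory list of invented passages and simple token overlap for both retrieval and re-ranking. All function names, gene placeholders, passages, and candidate labels are hypothetical.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query_terms: List[str], corpus: List[str], top_n: int = 2) -> List[str]:
    """Return up to top_n passages sharing at least one token with the query."""
    query_tokens = tokenize(" ".join(query_terms))
    ranked = sorted(corpus, key=lambda doc: len(query_tokens & tokenize(doc)), reverse=True)
    return [doc for doc in ranked[:top_n] if query_tokens & tokenize(doc)]


def rerank(candidates: List[str], evidence: List[str]) -> List[Tuple[str, int]]:
    """Order candidate annotations by token overlap with the retrieved evidence."""
    evidence_tokens: Set[str] = set()
    for doc in evidence:
        evidence_tokens |= tokenize(doc)
    scored = [(label, len(tokenize(label) & evidence_tokens)) for label in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy passages with placeholder gene names (not real PubMed records).
toy_corpus = [
    "geneA and geneB are reported in striatal projection neurons with dopamine signaling",
    "geneX and geneY are reported in oligodendrocytes during myelination",
    "geneC is reported in striatal neurons",
]
gene_set = ["geneA", "geneB", "geneC"]

# Candidate annotations as an upstream model might propose them (hypothetical).
candidates = ["oligodendrocyte myelination", "striatal dopamine signaling", "immune response"]

evidence = retrieve(gene_set, toy_corpus)
refined = rerank(candidates, evidence)
print(refined)
```

The design point is that candidates are scored only against retrieved text, so an annotation with no literature support sinks in the ranking rather than being accepted on the model's word alone.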
Authors

Rongbin Li
McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Wenbo Chen
McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Zhao Li
McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Rodrigo Munoz-Castaneda
Appel Alzheimer’s Disease Research Institute, Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, 10021, USA.
Jinbo Li
McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Neha S. Maurya
McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Arnav Solanki
McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Huan He
Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, 06510, USA.
Hanwen Xing
University of Oxford
Bayesian statistics, Computational statistics
Meaghan Ramlakhan
McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Zachary Wise
McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Zhuhao Wu
Appel Alzheimer’s Disease Research Institute, Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, New York, 10021, USA.
Hua Xu
Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, 06510, USA.
Michael Hawrylycz
Allen Institute for Brain Science, Seattle, Washington, USA.
W. Jim Zheng
McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA.