A Brain Cell Type Resource Created by Large Language Models and a Multi-Agent AI System for Collaborative Community Annotation

📅 2025-10-19

🤖 AI Summary

Single-cell RNA sequencing (scRNA-seq) resolves cellular heterogeneity, but functionally interpreting the resulting gene sets remains difficult, particularly when they contain poorly characterized genes. Conventional enrichment methods such as GSEA generalize poorly because they rely on predefined, curated gene sets, while large language models (LLMs) struggle to represent biological knowledge within structured ontologies. To address this, we propose BRAINCELL-AID, a multi-agent system that combines retrieval-augmented generation (RAG) over PubMed literature with collaborating LLM agents to produce both free-text descriptions and ontology-based labels. Evaluated on mouse gene sets, the workflow places a correct annotation among its top predictions for 77% of them, and it is used to annotate 5,322 brain cell clusters from the BRAIN Initiative Cell Census Network mouse brain atlas. The resulting annotations reveal region-specific gene co-expression patterns and identify basal ganglia-related cell types, providing a resource for collaborative, community-driven cell type annotation.
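
The summary pairs free-text descriptions with ontology-based labels. The summary doesn't say how the two are linked, so the sketch below shows one deliberately simple option: rank candidate ontology terms by Jaccard token overlap with a generated description. The function names and the three-term ontology are hypothetical (the IDs are placeholders, not real accessions), and a production system would more likely use embeddings or an LLM judge rather than raw token overlap.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens longer than two characters."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if len(t) > 2}


def rank_ontology_labels(
    description: str, ontology: dict[str, str], top_k: int = 3
) -> list[tuple[str, float]]:
    """Rank ontology terms by Jaccard token overlap with a free-text description."""
    desc = tokens(description)
    scored = []
    for term_id, label in ontology.items():
        lab = tokens(label)
        union = desc | lab
        score = len(desc & lab) / len(union) if union else 0.0
        scored.append((term_id, score))
    # Highest overlap first; ties keep the ontology's original order (stable sort).
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical mini-ontology: IDs are placeholders, not real accessions.
ONTOLOGY = {
    "ONT:0001": "chemical synaptic transmission",
    "ONT:0002": "myelination of axons",
    "ONT:0003": "inflammatory response",
}

if __name__ == "__main__":
    desc = "Genes involved in synaptic transmission and neurotransmitter release"
    print(rank_ontology_labels(desc, ONTOLOGY, top_k=2))
    # prints [('ONT:0001', 0.25), ('ONT:0002', 0.0)]
```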

📝 Abstract

Single-cell RNA sequencing has transformed our ability to identify diverse cell types and their transcriptomic signatures. However, annotating these signatures, especially those involving poorly characterized genes, remains a major challenge. Traditional methods, such as Gene Set Enrichment Analysis (GSEA), depend on well-curated annotations and often perform poorly in these contexts. Large Language Models (LLMs) offer a promising alternative but struggle to represent complex biological knowledge within structured ontologies. To address this, we present BRAINCELL-AID (https://biodataai.uth.edu/BRAINCELL-AID), a novel multi-agent AI system that integrates free-text descriptions with ontology labels to enable more accurate and robust gene set annotation. By incorporating retrieval-augmented generation (RAG), we developed an agentic workflow that refines predictions using relevant PubMed literature, reducing hallucinations and enhancing interpretability. Using this workflow, we achieved correct annotations for 77% of mouse gene sets among their top predictions. Applying this approach, we annotated 5,322 brain cell clusters from the comprehensive mouse brain cell atlas generated by the BRAIN Initiative Cell Census Network, enabling novel insights into brain cell function by identifying region-specific gene co-expression patterns and inferring functional roles of gene ensembles. BRAINCELL-AID also identifies basal ganglia-related cell types with neurologically meaningful descriptions. Hence, we have created a valuable resource to support community-driven cell type annotation.
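
As a rough illustration of the retrieval step the abstract describes (refining predictions with relevant PubMed literature), the sketch ranks a small in-memory corpus by how many genes from the query set each document mentions, then packs the top hits into a grounded prompt. The corpus, document IDs, and function names are hypothetical; the real system queries PubMed, and since its retriever isn't described here, simple keyword overlap stands in for it. The LLM call itself is omitted.

```python
import re

# Hypothetical corpus standing in for PubMed abstracts; IDs are placeholders.
CORPUS = {
    "DOC1": "Drd1 and Tac1 mark direct-pathway striatal projection neurons.",
    "DOC2": "Penk and Drd2 are expressed in indirect-pathway striatal neurons.",
    "DOC3": "Mbp is a structural component of myelin in oligodendrocytes.",
}


def gene_hits(gene_set: list[str], text: str) -> int:
    """Count how many genes from gene_set appear as whole words in text."""
    words = set(re.findall(r"[A-Z0-9]+", text.upper()))
    return len({g.upper() for g in gene_set} & words)


def retrieve(gene_set: list[str], corpus: dict[str, str], k: int = 2) -> list[str]:
    """Return IDs of up to k documents mentioning the most genes (ties keep corpus order)."""
    ranked = sorted(corpus, key=lambda d: gene_hits(gene_set, corpus[d]), reverse=True)
    return [d for d in ranked if gene_hits(gene_set, corpus[d]) > 0][:k]


def build_prompt(gene_set: list[str], corpus: dict[str, str], k: int = 2) -> str:
    """Assemble an LLM prompt grounded in the retrieved documents."""
    context = "\n".join(f"[{d}] {corpus[d]}" for d in retrieve(gene_set, corpus, k))
    return (
        f"Gene set: {', '.join(gene_set)}\n"
        f"Literature:\n{context}\n"
        "Describe the shared biological function and cite the sources above by ID."
    )


if __name__ == "__main__":
    print(build_prompt(["Drd1", "Tac1", "Penk"], CORPUS))
```

Asking the model to cite retrieved sources by ID is what makes its answer checkable against the literature, which is the mechanism the abstract credits for reducing hallucinations and improving interpretability.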

Problem

Automating annotation of poorly characterized gene signatures in single-cell RNA sequencing

Overcoming limitations of traditional methods and LLMs in biological knowledge representation

Creating accurate brain cell type annotations through multi-agent AI collaboration
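
To make the limitation of traditional methods concrete: GSEA itself is rank-based, so the sketch below swaps in over-representation analysis (ORA), a simpler hypergeometric relative, to show why curated-annotation methods struggle with poorly characterized genes. A gene that belongs to no curated term cannot contribute to any overlap, so a gene set made of such genes can never look enriched. The term, gene names, and universe size are hypothetical.

```python
from math import comb


def ora_pvalue(query: set[str], term_genes: set[str], universe_size: int) -> float:
    """One-sided hypergeometric p-value, P(overlap >= observed).

    query: a cluster's gene set; term_genes: genes curated under one annotation
    term; universe_size: total number of genes that could have been drawn.
    """
    n = len(query)
    term_size = len(term_genes)
    observed = len(query & term_genes)
    # Sum the hypergeometric probability mass from the observed overlap upward.
    tail = sum(
        comb(term_size, i) * comb(universe_size - term_size, n - i)
        for i in range(observed, min(n, term_size) + 1)
    )
    return tail / comb(universe_size, n)


# Hypothetical data: a 50-gene curated term in a 20,000-gene universe.
TERM = {f"G{i}" for i in range(50)}
# Five genes annotated under the term plus five "X" genes with no annotations.
characterized = {f"G{i}" for i in range(5)} | {f"X{i}" for i in range(5)}
# Ten genes with no curated annotations at all.
uncharacterized = {f"X{i}" for i in range(10)}

if __name__ == "__main__":
    print(ora_pvalue(characterized, TERM, 20_000))    # very small p-value
    print(ora_pvalue(uncharacterized, TERM, 20_000))  # 1.0: nothing to enrich
```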

Innovation

Multi-agent AI system integrates text and ontology labels

Retrieval-augmented generation refines predictions using literature

Workflow places a correct annotation among its top predictions for 77% of mouse gene sets
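
The 77% figure counts a gene set as correct when the right annotation appears "among their top predictions", i.e. a top-k accuracy; the value of k is not stated here. A minimal sketch of that metric, assuming exact string matching between predicted and reference labels (the paper may judge correctness differently, for example by expert review or semantic similarity). The function name and example labels are hypothetical.

```python
def top_k_accuracy(predictions: list[list[str]], truths: list[str], k: int) -> float:
    """Fraction of gene sets whose true label is among the first k ranked predictions."""
    if len(predictions) != len(truths) or not truths:
        raise ValueError("predictions and truths must be non-empty and the same length")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(predictions, truths))
    return hits / len(truths)


if __name__ == "__main__":
    # Hypothetical ranked predictions for three gene sets.
    preds = [
        ["synaptic transmission", "myelination"],
        ["immune response", "myelination"],
        ["angiogenesis", "synaptic transmission"],
    ]
    truths = ["synaptic transmission", "myelination", "myelination"]
    print(top_k_accuracy(preds, truths, 1))  # 1 of 3 sets correct at k=1
    print(top_k_accuracy(preds, truths, 2))  # 2 of 3 sets correct at k=2
```

Reporting top-k rather than top-1 accuracy fits a curation resource, where a community annotator reviews a short ranked list instead of accepting a single label.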

🔎 Similar Papers

Toward a Team of AI-made Scientists for Scientific Discovery from Gene Expression Data