🤖 AI Summary
The rapid growth of single-cell omics has created major challenges for cell type annotation and cross-dataset integration, calling for a standardized, FAIR-compliant, species-agnostic ontology framework. The Cell Ontology (CL) fills this role and is already a core component of a wide range of single-cell platforms and tools. We describe how CL is used across these platforms and detail ongoing work to improve and extend it: (1) adding transcriptomically defined cell types and harmonizing them with classical definitions; (2) integrating marker information; (3) working closely with major atlasing efforts, including the Human Cell Atlas and the BRAIN Initiative Cell Atlas Network (BICAN), to support their needs; and (4) using large language models (LLMs) to improve CL content and the efficiency of curation workflows. We also discuss the remaining challenges and future plans in each of these areas.
📝 Abstract
Single-cell omics technologies have transformed our understanding of cellular diversity by enabling high-resolution profiling of individual cells. However, the unprecedented scale and heterogeneity of these datasets demand robust frameworks for data integration and annotation. The Cell Ontology (CL) has emerged as a pivotal resource for achieving the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles by providing standardized, species-agnostic terms for canonical cell types, and it forms a core component of a wide range of platforms and tools. In this paper, we describe the many ways CL is used in these platforms and tools, and we detail ongoing work to improve and extend CL content, including the addition of transcriptomically defined cell types. This work is carried out in close collaboration with major atlasing efforts, including the Human Cell Atlas and the BRAIN Initiative Cell Atlas Network (BICAN), to support their needs. We also cover the challenges and future plans for harmonising classical and transcriptomic cell type definitions, integrating markers, and using Large Language Models (LLMs) to improve the content of CL and the efficiency of its workflows.
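As a rough illustration of why shared CL identifiers help with cross-dataset integration, the Python sketch below maps free-text labels from two hypothetical datasets to CL IDs and then rolls each term up the is_a hierarchy to a common level of granularity. Everything here is an assumption made for illustration: the label-to-ID synonym table and the helper functions (`to_cl`, `ancestors`, `roll_up`) are hypothetical, and the parent links are a hand-simplified, single-parent fragment in which intermediate classes are omitted (the released CL is a much larger graph in which terms can have several parents). The CL IDs and labels should be checked against a current CL release before any reuse.

```python
from typing import Dict, List, Optional, Set

# Labels for the few CL terms used in this sketch (verify against a CL release).
CL_LABELS: Dict[str, str] = {
    "CL:0000738": "leukocyte",
    "CL:0000542": "lymphocyte",
    "CL:0000084": "T cell",
    "CL:0000236": "B cell",
    "CL:0000624": "CD4-positive, alpha-beta T cell",
    "CL:0000625": "CD8-positive, alpha-beta T cell",
    "CL:0000576": "monocyte",
}

# ASSUMPTION: simplified child -> parent links. Each edge stands for a
# (possibly multi-step) subclass path in CL; intermediate classes are
# omitted and every term is given a single parent.
PARENTS: Dict[str, str] = {
    "CL:0000542": "CL:0000738",  # lymphocyte -> leukocyte
    "CL:0000576": "CL:0000738",  # monocyte -> leukocyte
    "CL:0000084": "CL:0000542",  # T cell -> lymphocyte
    "CL:0000236": "CL:0000542",  # B cell -> lymphocyte
    "CL:0000624": "CL:0000084",  # CD4+ alpha-beta T cell -> T cell
    "CL:0000625": "CL:0000084",  # CD8+ alpha-beta T cell -> T cell
}

# HYPOTHETICAL synonym table: author-supplied labels (lower-cased) -> CL IDs.
LABEL_TO_CL: Dict[str, str] = {
    "cd4+ t cells": "CL:0000624",
    "cd8 t": "CL:0000625",
    "t cells": "CL:0000084",
    "b cells": "CL:0000236",
    "b": "CL:0000236",
    "mono": "CL:0000576",
}


def to_cl(label: str) -> str:
    """Map a free-text label to a CL ID (raises KeyError if unmapped)."""
    return LABEL_TO_CL[label.strip().lower()]


def ancestors(term: str) -> List[str]:
    """Return `term` followed by its ancestors, nearest first."""
    chain = [term]
    while chain[-1] in PARENTS:
        chain.append(PARENTS[chain[-1]])
    return chain


def roll_up(term: str, targets: Set[str]) -> Optional[str]:
    """Return the most specific member of `targets` that is `term` or an ancestor of it."""
    for candidate in ancestors(term):
        if candidate in targets:
            return candidate
    return None


if __name__ == "__main__":
    dataset_a = ["CD4+ T cells", "B cells", "Mono"]
    dataset_b = ["cd8 t", "T cells", "b"]
    # Common annotation level chosen for integration: T cell, B cell, monocyte.
    common = {"CL:0000084", "CL:0000236", "CL:0000576"}
    for labels in (dataset_a, dataset_b):
        print([CL_LABELS[roll_up(to_cl(x), common)] for x in labels])
    # prints ['T cell', 'B cell', 'monocyte']
    #        ['T cell', 'T cell', 'B cell']
```

Both datasets end up annotated at the same level even though their original labels differed in spelling and granularity. A real pipeline would load the ontology from the released OWL or OBO files rather than a hard-coded dictionary, and would need to handle terms with multiple parents.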