🤖 AI Summary
Current methods for interpreting RNA-seq clusters suffer from two major limitations: (1) enrichment-based approaches yield overly broad, mechanistically shallow results; and (2) LLM-only methods frequently generate unsupported assertions and hallucinated citations, undermining reproducibility and biological plausibility. To address these issues, we propose the first literature-anchored multi-agent framework, comprising retrieval, explanation, and critique agents, that integrates large language models with PubMed/UniProt knowledge retrieval and rigorous evidence validation. Our framework enables evidence-driven hypothesis generation and quantifies uncertainty in functional interpretations. It substantially suppresses spurious associations and fabricated citations, producing auditable, reproducible, and literature-supported functional annotations for RNA-seq clusters. Evaluated on *Salmonella* RNA-seq data, it advances clustering interpretation from statistical description toward mechanistically traceable, biologically grounded hypothesis generation.
📝 Abstract
We propose CITE V.1, an agentic, evidence-grounded framework that leverages Large Language Models (LLMs) to provide transparent and reproducible interpretations of RNA-seq clusters. Existing enrichment-based approaches reduce results to broad statistical associations, and LLM-only models risk unsupported claims or fabricated citations; CITE V.1 instead produces biologically coherent explanations explicitly anchored in the biomedical literature. The framework orchestrates three specialized agent roles: a Retriever that gathers domain knowledge from PubMed and UniProt, an Interpreter that formulates functional hypotheses, and Critics that evaluate claims, enforce evidence grounding, and qualify uncertainty through confidence and reliability indicators. Applied to *Salmonella enterica* RNA-seq data, CITE V.1 generated biologically meaningful insights supported by the literature, while an LLM-only Gemini baseline frequently produced speculative results with false citations. By moving RNA-seq analysis from surface-level enrichment to auditable, interpretable, and evidence-based hypothesis generation, CITE V.1 advances the transparency and reliability of AI in biomedicine.
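The Retriever, Interpreter, and Critic roles described above can be illustrated with a minimal Python sketch. This is not the CITE V.1 implementation: the class names, the `run_pipeline` function, the in-memory `CORPUS`, the gene names, and the `DOC-…` identifiers are all hypothetical placeholders. The retrieval step is a dictionary-style lookup standing in for PubMed/UniProt queries, the Interpreter is a stub standing in for an LLM call, and the confidence score (fraction of citations that resolve to retrieved evidence about the same gene) is an assumed, simplified indicator.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    doc_id: str  # placeholder identifier, not a real PubMed or UniProt accession
    gene: str
    text: str


@dataclass
class Claim:
    gene: str
    statement: str
    cited_ids: list[str] = field(default_factory=list)


@dataclass
class Verdict:
    claim: Claim
    supported: bool
    confidence: float


# Toy corpus standing in for literature lookups (invented placeholder text).
CORPUS = [
    Evidence("DOC-1", "geneA", "geneA is described as part of pathway X."),
    Evidence("DOC-2", "geneB", "geneB is described as part of pathway X."),
]


def retrieve(genes):
    """Retriever: return corpus entries about any gene in the cluster."""
    wanted = set(genes)
    return [ev for ev in CORPUS if ev.gene in wanted]


def interpret(genes, evidence):
    """Interpreter stub: emit one claim per gene.

    A real system would call an LLM here. Genes without retrieved evidence
    get a citation to a non-existent document, mimicking a hallucination.
    """
    ids_by_gene = {}
    for ev in evidence:
        ids_by_gene.setdefault(ev.gene, []).append(ev.doc_id)
    return [
        Claim(gene, f"{gene} contributes to pathway X",
              ids_by_gene.get(gene, ["DOC-404"]))
        for gene in genes
    ]


def critique(claims, evidence):
    """Critic: accept a claim only if every citation resolves to evidence on its gene."""
    known = {ev.doc_id: ev for ev in evidence}
    verdicts = []
    for claim in claims:
        valid = [i for i in claim.cited_ids
                 if i in known and known[i].gene == claim.gene]
        confidence = len(valid) / len(claim.cited_ids) if claim.cited_ids else 0.0
        verdicts.append(Verdict(claim, supported=(confidence == 1.0),
                                confidence=confidence))
    return verdicts


def run_pipeline(genes):
    """Chain the three roles: retrieve, interpret, then critique."""
    evidence = retrieve(genes)
    claims = interpret(genes, evidence)
    return critique(claims, evidence)
```

Calling `run_pipeline(["geneA", "geneB", "geneC"])` in this toy setup yields supported verdicts for the two genes with corpus entries and an unsupported verdict for `geneC`, whose only citation does not resolve. Keeping the critic separate from the interpreter is what makes the citation check auditable in this sketch: the check depends only on the retrieved evidence, not on the model that wrote the claim.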