Interpretable RNA-Seq Clustering with an LLM-Based Agentic Evidence-Grounded Framework

📅 2025-10-17
🤖 AI Summary
Current methods for interpreting RNA-seq clusters have two main limitations: (1) enrichment-based approaches yield broad, mechanistically shallow results; and (2) LLM-only methods frequently generate unsupported assertions and hallucinated citations, undermining reproducibility and biological plausibility. To address these issues, we propose CITE V.1, a literature-anchored multi-agent framework (comprising retrieval, interpretation, and critique agents) that combines large language models with PubMed/UniProt knowledge retrieval and evidence validation. The framework generates evidence-driven hypotheses and attaches confidence and reliability indicators to its functional interpretations, producing auditable, reproducible, literature-supported annotations for RNA-seq clusters. On *Salmonella enterica* RNA-seq data, it produced literature-supported insights where an LLM-only Gemini baseline frequently produced speculative results with false citations, moving cluster interpretation from statistical description toward traceable, biologically grounded hypothesis generation.

📝 Abstract
We propose CITE V.1, an agentic, evidence-grounded framework that leverages Large Language Models (LLMs) to provide transparent and reproducible interpretations of RNA-seq clusters. Unlike existing enrichment-based approaches that reduce results to broad statistical associations and LLM-only models that risk unsupported claims or fabricated citations, CITE V.1 transforms cluster interpretation by producing biologically coherent explanations explicitly anchored in the biomedical literature. The framework orchestrates three specialized agents: a Retriever that gathers domain knowledge from PubMed and UniProt, an Interpreter that formulates functional hypotheses, and Critics that evaluate claims, enforce evidence grounding, and qualify uncertainty through confidence and reliability indicators. Applied to Salmonella enterica RNA-seq data, CITE V.1 generated biologically meaningful insights supported by the literature, while an LLM-only Gemini baseline frequently produced speculative results with false citations. By moving RNA-seq analysis from surface-level enrichment to auditable, interpretable, and evidence-based hypothesis generation, CITE V.1 advances the transparency and reliability of AI in biomedicine.
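
The Retriever, Interpreter, and Critic roles described above can be pictured as a small pipeline. The following is a minimal sketch, not the CITE V.1 implementation: the class names, the in-memory `TOY_CORPUS` (standing in for PubMed/UniProt lookups), the canned interpreter output (standing in for an LLM call), and the confidence thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source_id: str  # placeholder identifier; a real system would hold PubMed/UniProt IDs
    text: str


@dataclass
class Claim:
    statement: str
    cited_ids: list
    confidence: str = "unreviewed"


# Hypothetical in-memory corpus standing in for PubMed/UniProt retrieval.
TOY_CORPUS = {
    "geneA": [Evidence("DOC-1", "geneA is described as part of an invasion locus.")],
    "geneB": [Evidence("DOC-2", "geneB is described as a regulator of geneA.")],
}


def retrieve(genes):
    """Retriever: collect evidence records for each gene in the cluster."""
    found = []
    for gene in genes:
        found.extend(TOY_CORPUS.get(gene, []))
    return found


def interpret(genes, evidence):
    """Interpreter: stand-in for an LLM call; returns canned hypotheses."""
    return [
        Claim("Cluster is linked to invasion.", ["DOC-1", "DOC-2"]),
        # This claim cites an ID that was never retrieved (a fabricated citation).
        Claim("Cluster controls flagellar assembly.", ["DOC-9"]),
    ]


def critique(claims, evidence):
    """Critic: keep only citations found in the retrieved evidence and label confidence."""
    known = {e.source_id for e in evidence}
    reviewed = []
    for claim in claims:
        supported = [i for i in claim.cited_ids if i in known]
        if not supported:
            label = "rejected"  # no citation could be traced to retrieved evidence
        elif len(supported) == len(claim.cited_ids) and len(supported) >= 2:
            label = "high"      # illustrative threshold, not taken from the paper
        else:
            label = "low"
        reviewed.append(Claim(claim.statement, supported, label))
    return reviewed


if __name__ == "__main__":
    cluster = ["geneA", "geneB"]
    evidence = retrieve(cluster)
    for claim in critique(interpret(cluster, evidence), evidence):
        print(claim.confidence, claim.statement, claim.cited_ids)
```

The point of the design is that the Critic only trusts identifiers the Retriever actually returned, so a citation invented by the Interpreter cannot survive review.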

Problem

Research questions and friction points this paper is trying to address.

Providing transparent RNA-seq cluster interpretations using LLM agents
Generating evidence-grounded biological explanations from literature sources
Replacing statistical associations with auditable hypothesis generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic framework using LLMs for RNA-seq interpretation
Specialized agents retrieve and evaluate biomedical evidence
Generates biologically coherent explanations with uncertainty indicators
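
The evidence-retrieval step listed above could begin with a PubMed search. The sketch below assumes the retrieval agent uses the NCBI E-utilities `esearch` endpoint; the summary does not say how CITE V.1 queries PubMed, and the function name `build_pubmed_query`, the query shape, and the example gene names are illustrative choices rather than details from the paper. It only builds the request URL and performs no network call.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (public API).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query(genes, organism, max_results=20):
    """Build an esearch URL that looks for any cluster gene alongside the organism name.

    Hypothetical helper: the query shape is an assumption, not the paper's method.
    """
    gene_terms = " OR ".join(f"{gene}[Title/Abstract]" for gene in genes)
    term = f'({gene_terms}) AND "{organism}"[Title/Abstract]'
    params = {
        "db": "pubmed",         # search the PubMed database
        "term": term,           # the boolean query built above
        "retmode": "json",      # ask for a JSON response
        "retmax": max_results,  # cap the number of returned IDs
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative gene names; they are not taken from the paper.
    print(build_pubmed_query(["hilA", "invF"], "Salmonella enterica", max_results=5))
```

The IDs returned by such a search would then form the set of citable sources that later critique steps can check claims against.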