🤖 AI Summary
Clinical research is often hindered by cumbersome workflows, reliance on programming expertise, and restricted access to sensitive data, which limits its accessibility to non-technical investigators. This work proposes CARIS (Clinical Agentic Research Intelligence System), a framework that integrates large language models with modular tools via the Model Context Protocol (MCP) to enable end-to-end, code-free, and privacy-preserving automated clinical research, from study design and analysis to report generation, with human-in-the-loop iterative refinement. CARIS covers research planning, literature search, cohort construction, Institutional Review Board (IRB) documentation, Vibe Machine Learning (ML), and report generation. Evaluated on three heterogeneous clinical datasets, the system finalized research plans and IRB documents within 3–4 iterative rounds. Its final reports, scored against a checklist derived from the TRIPOD+AI framework, reached 96% coverage in LLM-based evaluation and 82% in human evaluation. The system lowers technical barriers while keeping raw data inside the MCP server.
📝 Abstract
Clinical research involves labor-intensive processes such as study design, cohort construction, model development, and documentation, requiring domain expertise, programming skills, and access to sensitive patient data. These demands create barriers for clinicians and external researchers conducting data-driven studies. To overcome these limitations, we developed a Clinical Agentic Research Intelligence System (CARIS) that automates the clinical research workflow while preserving data privacy, enabling comprehensive studies without direct access to raw data. CARIS integrates Large Language Models (LLMs) with modular tools via the Model Context Protocol (MCP), enabling natural language-driven orchestration of appropriate tools. Databases remain securely within the MCP server, and users access only the outputs and final research reports. Based on user intent, CARIS automatically executes the full pipeline: research planning, literature search, cohort construction, Institutional Review Board (IRB) documentation, Vibe Machine Learning (ML), and report generation, with iterative human-in-the-loop refinement. We evaluated CARIS on three heterogeneous datasets with distinct clinical tasks. Research plans and IRB documents were finalized within three to four iterations, using evidence from literature and data. The system supported Vibe ML by exploring feature-model combinations, ranking the top ten models, and generating performance visualizations. Final reports showed high completeness based on a checklist derived from the TRIPOD+AI framework, achieving 96% coverage in LLM evaluation and 82% in human evaluation. CARIS demonstrates that agentic AI can transform clinical hypotheses into executable research workflows across heterogeneous datasets. By eliminating the need for coding and direct data access, the system lowers barriers and bridges public and private clinical data environments.
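The two mechanisms described above, keeping data behind a tool boundary and ranking feature-model combinations, can be illustrated with a minimal sketch. This is not the CARIS implementation: the records, the feature names, the two toy "model families", and the function `rank_models` are all hypothetical, and no real MCP library is used. The sketch only shows the shape of the idea: the search runs where the rows live, and the caller receives aggregate scores for the top-ranked combinations.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy records standing in for a private clinical table.
# In this sketch they exist only in the "server-side" module scope.
_PRIVATE_ROWS = [
    {"age": 71, "lactate": 4.1, "heart_rate": 118, "outcome": 1},
    {"age": 45, "lactate": 1.2, "heart_rate": 80, "outcome": 0},
    {"age": 66, "lactate": 3.5, "heart_rate": 95, "outcome": 1},
    {"age": 52, "lactate": 1.0, "heart_rate": 102, "outcome": 0},
    {"age": 80, "lactate": 2.9, "heart_rate": 110, "outcome": 1},
    {"age": 38, "lactate": 0.9, "heart_rate": 72, "outcome": 0},
]
_FEATURES = ("age", "lactate", "heart_rate")

# Two toy "model families" (assumptions for illustration): flag a patient
# when ANY or ALL of the selected features exceed the cohort mean.
# A real system would fit actual ML models here.
_MODELS = {"any_above_mean": any, "all_above_mean": all}


def _accuracy(features, combine):
    """In-sample accuracy of one feature-model combination (no train/test split)."""
    means = {f: mean(row[f] for row in _PRIVATE_ROWS) for f in features}
    hits = 0
    for row in _PRIVATE_ROWS:
        pred = int(combine(row[f] > means[f] for f in features))
        hits += int(pred == row["outcome"])
    return hits / len(_PRIVATE_ROWS)


def rank_models(top_k=10):
    """Hypothetical tool entry point: returns aggregate scores only, never rows."""
    results = []
    for size in range(1, len(_FEATURES) + 1):
        for features in combinations(_FEATURES, size):
            for name, combine in _MODELS.items():
                results.append({
                    "model": name,
                    "features": list(features),
                    "accuracy": round(_accuracy(features, combine), 3),
                })
    # Highest accuracy first; sort is stable, so ties keep enumeration order.
    results.sort(key=lambda r: r["accuracy"], reverse=True)
    return results[:top_k]


if __name__ == "__main__":
    for entry in rank_models():
        print(entry)
```

In this sketch the caller sees only model names, feature lists, and scores. In an MCP-style deployment, a function like `rank_models` would be exposed as a server tool and invoked by the LLM on the user's behalf, but that wiring is omitted here.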