RiskAgent: Autonomous Medical AI Copilot for Generalist Risk Prediction

📅 2025-03-05

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Problem: Clinical risk prediction demands scalable, generalizable, and evidence-based AI support across diverse specialties and rare conditions. Method: We introduce RiskAgent, a large language model (LLM)-driven clinical AI assistant for multi-disease, cross-specialty risk assessment covering over 387 risk scenarios. Our approach comprises: (1) constructing MedRisk, a domain-specific benchmark for medical risk prediction; (2) releasing an open-source, scalable family of medical risk prediction models (1B–70B parameters); and (3) integrating LLMs with hundreds of evidence-based clinical calculators and scoring systems through tool calling, evidence-aware retrieval, risk logic verification, and domain-adaptive fine-tuning. Results: The 8B-parameter RiskAgent achieves 76.33% accuracy on MedRisk, outperforming commercial models including o1, o3-mini, and GPT-4.5. On rare diseases such as idiopathic pulmonary fibrosis (IPF), it outperforms o1 and GPT-4.5 by 27.27 and 45.46 percentage points, respectively, and it achieves the best results on an external evidence-based diagnosis benchmark.
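The tool-calling idea above can be illustrated with a minimal sketch. It assumes the LLM emits a structured call naming a calculator and its arguments, and that a dispatcher runs the calculator and checks that the score lies in the tool's valid range. The registry, the `run_tool_call` helper, and the call format are hypothetical and not taken from the RiskAgent code base. CHA2DS2-VASc is used only as a familiar example of an evidence-based scoring system; the summary does not say which calculators RiskAgent uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict


def cha2ds2_vasc(age: int, female: bool, chf: bool, hypertension: bool,
                 diabetes: bool, stroke_or_tia: bool,
                 vascular_disease: bool) -> int:
    """CHA2DS2-VASc stroke-risk score for atrial fibrillation (range 0-9)."""
    score = 0
    score += 1 if chf else 0                # C: congestive heart failure
    score += 1 if hypertension else 0       # H: hypertension
    if age >= 75:                           # A2: age 75 or older
        score += 2
    elif age >= 65:                         # A: age 65-74
        score += 1
    score += 1 if diabetes else 0           # D: diabetes mellitus
    score += 2 if stroke_or_tia else 0      # S2: prior stroke/TIA
    score += 1 if vascular_disease else 0   # V: vascular disease
    score += 1 if female else 0             # Sc: sex category (female)
    return score


@dataclass
class Tool:
    """Hypothetical registry entry: a calculator plus its valid score range."""
    func: Callable[..., int]
    min_score: int
    max_score: int


# Hypothetical tool registry; a real system would hold hundreds of entries.
TOOLS: Dict[str, Tool] = {"cha2ds2_vasc": Tool(cha2ds2_vasc, 0, 9)}


def run_tool_call(call: dict) -> int:
    """Dispatch a structured tool call and sanity-check the returned score."""
    name = call["tool"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    tool = TOOLS[name]
    score = tool.func(**call["arguments"])
    # Simple verification step: reject scores outside the tool's range.
    if not tool.min_score <= score <= tool.max_score:
        raise ValueError(f"{name} returned out-of-range score {score}")
    return score


# Assumed shape of a call the LLM might emit for a 72-year-old woman
# with hypertension and diabetes.
example_call = {
    "tool": "cha2ds2_vasc",
    "arguments": {
        "age": 72, "female": True, "chf": False, "hypertension": True,
        "diabetes": True, "stroke_or_tia": False, "vascular_disease": False,
    },
}

print(run_tool_call(example_call))  # prints 4
```

Keeping the arithmetic in a deterministic function means the model only has to choose the tool and extract its inputs, which is one plausible reason to pair an LLM with calculators rather than have it compute scores directly.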


📝 Abstract

The application of Large Language Models (LLMs) to various clinical applications has attracted growing research attention. However, real-world clinical decision-making differs significantly from the standardized, exam-style scenarios commonly used in current efforts. In this paper, we present the RiskAgent system to perform a broad range of medical risk predictions, covering over 387 risk scenarios across diverse complex diseases, e.g., cardiovascular disease and cancer. RiskAgent is designed to collaborate with hundreds of clinical decision tools, i.e., risk calculators and scoring systems that are supported by evidence-based medicine. To evaluate our method, we have built MedRisk, the first benchmark specialized for risk prediction, comprising 12,352 questions spanning 154 diseases, 86 symptoms, 50 specialties, and 24 organ systems. The results show that our RiskAgent, with 8 billion model parameters, achieves 76.33% accuracy, outperforming the most recent commercial LLMs, o1, o3-mini, and GPT-4.5, and nearly doubling the 38.39% accuracy of GPT-4o. On rare diseases, e.g., Idiopathic Pulmonary Fibrosis (IPF), RiskAgent outperforms o1 and GPT-4.5 by 27.27 and 45.46 percentage points of accuracy, respectively. Finally, we conduct a generalization evaluation on an external evidence-based diagnosis benchmark and show that our RiskAgent achieves the best results. These encouraging results demonstrate the great potential of our solution for diverse diagnosis domains. To improve the adaptability of our model in different scenarios, we have built and open-sourced a family of models ranging from 1 billion to 70 billion parameters. Our code, data, and models are all available at https://github.com/AI-in-Health/RiskAgent.
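The headline comparison can be checked with a few lines of arithmetic. The sketch below assumes accuracy is the fraction of questions whose predicted answer matches a single gold label; the abstract does not spell out MedRisk's scoring, so the `accuracy` helper and the toy labels are illustrative only. The two percentages are the figures reported above.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Percentage of predictions that exactly match the gold answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot score an empty benchmark")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Toy data (not from MedRisk): three of four answers match.
toy_accuracy = accuracy(["A", "C", "B", "D"], ["A", "C", "B", "A"])

# Accuracies reported in the abstract, in percent.
riskagent_8b = 76.33
gpt_4o = 38.39

ratio = riskagent_8b / gpt_4o       # relative improvement factor
gap_points = riskagent_8b - gpt_4o  # absolute gap in percentage points

print(f"toy accuracy: {toy_accuracy:.2f}%")
print(f"ratio: {ratio:.2f}x, gap: {gap_points:.2f} percentage points")
```

Separating the ratio from the absolute gap matters when reading the rare-disease numbers, which are stated as differences in accuracy rather than relative gains.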

Problem

Research questions and friction points this paper is trying to address.

Develops RiskAgent for broad medical risk prediction.

Creates MedRisk benchmark for evaluating risk prediction.

Outperforms commercial LLMs in accuracy and rare disease prediction.

Innovation

Methods, ideas, or system contributions that make the work stand out.

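The evaluated RiskAgent model has 8 billion parameters; the open-sourced family ranges from 1 billion to 70 billion.

Collaborates with hundreds of evidence-based clinical decision tools, covering over 387 risk scenarios across diverse diseases.

Achieves 76.33% accuracy on MedRisk, outperforming o1, o3-mini, and GPT-4.5.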
