Agentic AI in Healthcare and Medicine: A Seven-Dimensional Taxonomy for Empirical Evaluation of LLM-Based Agents

📅 2026-02-04

🏛️ IEEE Access

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the absence of a systematic evaluation framework for large language model (LLM) agents in healthcare. The authors propose a seven-dimensional assessment framework tailored to medical AI agents, encompassing cognition, knowledge management, interaction, adaptive learning, safety and ethics, agent architecture, and core clinical tasks. The framework is operationalized into 29 measurable sub-dimensions and applied through a systematic literature review of 49 studies, using a three-tier annotation scheme (fully/partially/not implemented) for quantitative mapping and co-occurrence analysis. Findings reveal that external knowledge integration is widely implemented (~76% fully), whereas event-triggered activation (~92% not implemented) and drift detection (~98% not implemented) are critically underdeveloped. Multi-agent architectures dominate (~82% fully), yet action-oriented tasks such as treatment planning remain underexplored (~59% not implemented).

📝 Abstract

Large Language Model (LLM)-based agents that plan, use tools, and act have begun to shape healthcare and medicine. Reported studies demonstrate competence on tasks ranging from EHR analysis and differential diagnosis to treatment planning and research workflows. Yet the literature largely consists of overviews that are either broad surveys or narrow dives into a single capability (e.g., memory, planning, reasoning), leaving healthcare work without a common frame. We address this by reviewing 49 studies using a seven-dimensional taxonomy (Cognitive Capabilities, Knowledge Management, Interaction Patterns, Adaptation & Learning, Safety & Ethics, Framework Typology, and Core Tasks & Subtasks) with 29 operational sub-dimensions. Using explicit inclusion and exclusion criteria and a labeling rubric (Fully Implemented ✓, Partially Implemented Δ, Not Implemented ✗), we map each study to the taxonomy and report quantitative summaries of capability prevalence and co-occurrence patterns. Our empirical analysis surfaces clear asymmetries. For instance, the External Knowledge Integration sub-dimension under Knowledge Management is commonly realized (~76% ✓), whereas the Event-Triggered Activation sub-dimension under Interaction Patterns is largely absent (~92% ✗) and the Drift Detection & Mitigation sub-dimension under Adaptation & Learning is rare (~98% ✗). Architecturally, the Multi-Agent Design sub-dimension under Framework Typology is the dominant pattern (~82% ✓), while orchestration layers remain mostly partial. Across Core Tasks & Subtasks, information-centric capabilities lead (e.g., Medical Question Answering & Decision Support and Benchmarking & Simulation), while action- and discovery-oriented areas such as Treatment Planning & Prescription still show substantial gaps (~59% ✗).
Together, these findings provide an empirical baseline indicating that current agents excel at retrieval-grounded advising but require stronger adaptation and compliance platforms to move from early-stage systems to dependable systems.
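
The prevalence and co-occurrence summaries described in the abstract can be illustrated with a minimal sketch. It assumes each study is annotated with one rubric label per sub-dimension; the study identifiers, sub-dimension keys, label strings, and function names below are hypothetical, and the toy values are not data from the 49 reviewed studies. It is not the authors' analysis code.

```python
from collections import Counter
from itertools import combinations

# Rubric labels from the abstract; the string encoding is an assumption.
FULL, PARTIAL, NONE = "full", "partial", "none"

# Hypothetical toy annotations: study id -> {sub-dimension: label}.
# Illustrative values only, not taken from the reviewed studies.
annotations = {
    "study_a": {"external_knowledge": FULL, "multi_agent": FULL, "drift_detection": NONE},
    "study_b": {"external_knowledge": FULL, "multi_agent": PARTIAL, "drift_detection": NONE},
    "study_c": {"external_knowledge": PARTIAL, "multi_agent": FULL, "drift_detection": NONE},
    "study_d": {"external_knowledge": FULL, "multi_agent": FULL, "drift_detection": FULL},
}


def prevalence(annotations, sub_dimension):
    """Return the share of studies carrying each label for one sub-dimension."""
    counts = Counter(labels[sub_dimension] for labels in annotations.values())
    total = len(annotations)
    return {label: counts.get(label, 0) / total for label in (FULL, PARTIAL, NONE)}


def co_occurrence(annotations, label=FULL):
    """Count studies in which each pair of sub-dimensions both carry `label`."""
    pairs = Counter()
    for labels in annotations.values():
        # Sort names so each pair has one canonical ordering.
        present = sorted(name for name, value in labels.items() if value == label)
        pairs.update(combinations(present, 2))
    return pairs


if __name__ == "__main__":
    print(prevalence(annotations, "external_knowledge"))
    print(co_occurrence(annotations).most_common())
```

Counting only pairs that share the same label keeps the co-occurrence table easy to read; whether partial implementations should count toward co-occurrence is a design choice the abstract does not specify.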

Problem

Research questions and friction points this paper is trying to address.

Agentic AI

Healthcare

Large Language Models

Evaluation Framework

Medical AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic AI

LLM-based agents

seven-dimensional taxonomy