Agentic Automation of BT-RADS Scoring: End-to-End Multi-Agent System for Standardized Brain Tumor Follow-up Assessment

📅 2026-03-22

🤖 AI Summary

This study addresses a practical limitation of BT-RADS scoring: manual assessment must integrate imaging trends with clinical context such as medication effects and radiation timing, which makes it slow and prone to inconsistency between raters. The authors propose an end-to-end multi-agent system that pairs a large language model (LLM) with a convolutional neural network (CNN). The LLM agents extract clinical variables from unstructured notes and apply BT-RADS decision logic, while the CNN segments tumor volumes, yielding a fully automated, context-aware classification. Evaluated on 492 post-treatment MRI examinations of glioma patients, the system reached 76.0% accuracy versus 57.5% for initial clinical assessments, a statistically significant gain of 18.5 percentage points (P<.001). Its positive predictive value for BT-RADS category 4 was 92.9%.
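To make the per-category figures easier to read, here is a minimal sketch of how positive predictive value and sensitivity can be computed from paired reference and predicted labels. The function name `per_category_metrics` and the toy labels are assumptions made for illustration only; they are not the study's code or data.

```python
from collections import Counter


def per_category_metrics(reference, predicted):
    """Return {category: (sensitivity, ppv)} for two equal-length label lists.

    sensitivity = correct predictions of a category / reference cases of it
    ppv         = correct predictions of a category / all predictions of it
    A metric is None when its denominator is zero.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    ref_total = Counter(reference)
    pred_total = Counter(predicted)
    true_pos = Counter(r for r, p in zip(reference, predicted) if r == p)

    metrics = {}
    for cat in sorted(set(reference) | set(predicted)):
        sens = true_pos[cat] / ref_total[cat] if ref_total[cat] else None
        ppv = true_pos[cat] / pred_total[cat] if pred_total[cat] else None
        metrics[cat] = (sens, ppv)
    return metrics


# Toy labels invented purely for illustration (NOT study data).
reference = ["4", "4", "4", "2", "2", "1a"]
predicted = ["4", "4", "2", "2", "2", "1a"]
metrics = per_category_metrics(reference, predicted)
```

In the toy data, every case predicted as category 4 is truly category 4 (high PPV), yet one true category 4 case is missed (lower sensitivity). This is the same pattern the study reports for BT-4: a high PPV can coexist with moderate sensitivity.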

📝 Abstract

The Brain Tumor Reporting and Data System (BT-RADS) standardizes post-treatment MRI response assessment in patients with diffuse gliomas but requires complex integration of imaging trends, medication effects, and radiation timing. This study evaluates an end-to-end system that combines multi-agent large language models (LLMs) with a convolutional neural network (CNN) for automated BT-RADS classification. The multi-agent LLM system, combined with automated CNN-based tumor segmentation, was retrospectively evaluated on 509 consecutive post-treatment glioma MRI examinations from a single high-volume center. An extractor agent identified clinical variables (steroid status, bevacizumab status, radiation date) from unstructured clinical notes, while a scorer agent applied BT-RADS decision logic that integrated the extracted variables with volumetric measurements. Expert reference standard classifications were established by an independent board-certified neuroradiologist. Of the 509 examinations, 492 met inclusion criteria. The system achieved an accuracy of 374/492 (76.0%; 95% CI, 72.1%-79.6%) versus 283/492 (57.5%; 95% CI, 53.1%-61.8%) for initial clinical assessments (+18.5 percentage points; P<.001). Context-dependent categories showed high sensitivity (BT-1b 100%, BT-1a 92.7%, BT-3a 87.5%), while threshold-dependent categories showed moderate sensitivity (BT-3c 74.8%, BT-2 69.2%, BT-4 69.3%, BT-3b 57.1%). For BT-4, positive predictive value was 92.9%. The multi-agent LLM system achieved higher BT-RADS classification agreement with the expert reference standard than initial clinical scoring did, with high accuracy for context-dependent scores and high positive predictive value for BT-4 detection.
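The accuracy figures above can be checked from the raw counts. The sketch below computes a Wilson score interval for a binomial proportion; the abstract does not state which interval method the authors used, so Wilson is an assumption here, as is the helper name `wilson_interval`.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    Returns (lower, upper) as fractions between 0 and 1.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Counts taken from the abstract: correct classifications out of 492 exams.
system = wilson_interval(374, 492)    # automated multi-agent system
clinical = wilson_interval(283, 492)  # initial clinical assessments

# Difference in accuracy, expressed in percentage points.
diff_points = (374 - 283) / 492 * 100

print(f"system:   {system[0]:.3f} to {system[1]:.3f}")
print(f"clinical: {clinical[0]:.3f} to {clinical[1]:.3f}")
print(f"difference: {diff_points:.1f} percentage points")
```

Under this assumption the computed bounds land within about 0.1 percentage point of the intervals reported in the abstract, and the difference matches the stated +18.5 percentage points. The P value is not reproduced here because a paired test would need the discordant case counts, which the abstract does not give.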

Problem

Research questions and friction points this paper is trying to address.

BT-RADS

brain tumor follow-up

automated scoring

diffuse gliomas

MRI response assessment

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent LLM

BT-RADS automation

CNN tumor segmentation