How Far Has AI Come in Liver Fibrosis Staging? A Large-Scale Real-World Dataset and Benchmark

📅 2026-05-25
🤖 AI Summary
This study addresses the lack of systematic evaluation of AI performance in liver fibrosis staging under real-world, multicenter, heterogeneous clinical conditions. It introduces LiFS, described as the first large-scale multicenter benchmark that pairs complete gadoxetic acid-enhanced multi-sequence MRI with histopathology-confirmed annotations (610 patients), and uses the MICCAI 2025 CARE-Liver Challenge to benchmark nine independently developed AI methods selected from 96 registered teams. The best methods were broadly comparable to the senior radiologist and significantly exceeded the junior radiologist in selected settings, while median AI performance generally approached junior-radiologist levels. Design choices such as spatial registration, input dimensionality, multimodal fusion strategy, and backbone architecture appear to modulate cross-center robustness, although no single choice closes the gap. Cross-center heterogeneity, label imbalance, and contrast-enhanced sequence variability emerge as the dominant challenges for clinically reliable deployment.
📝 Abstract
Despite years of methodological progress, how far AI has come in liver fibrosis staging has never been systematically evaluated under the heterogeneous, multi-center conditions that define clinical practice. To address this gap, we introduce LiFS, a large-scale dataset and benchmark derived from the MICCAI 2025 CARE-Liver challenge, comprising 610 patients across multiple centers and scanners with multi-sequence MRI. To the best of our knowledge, LiFS is the first benchmark providing complete gadoxetic acid-enhanced sequences with histopathology-confirmed annotations from diverse real-world scanners. Through systematic evaluation of 9 independently developed methods selected from 96 registered teams against in-cohort radiologist reference results, our findings address how far current AI has progressed toward clinical-level liver fibrosis staging from three complementary perspectives. First, against radiologists, the best AI methods were broadly comparable to the senior radiologist and significantly exceeded the junior radiologist in selected settings, while median AI performance generally approached junior-radiologist levels. Second, from a data perspective, cross-center heterogeneity, label imbalance, and contrast-enhanced sequence variability emerge as the dominant challenges for AI methods. Third, from a technical perspective, methodological design choices, including spatial registration, input dimensionality, multi-modal fusion strategy, and backbone architecture, appear to modulate cross-center robustness, although no single choice alone closes the gap. Overall, LiFS provides a rigorous real-world benchmark for positioning the current state of AI in liver fibrosis staging and for enabling future research on the key challenges that limit clinically reliable deployment.
Problem

Research questions and friction points this paper is trying to address.

liver fibrosis staging
AI evaluation
multi-center heterogeneity
real-world benchmark
clinical deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

liver fibrosis staging
multi-center benchmark
gadoxetic acid-enhanced MRI
real-world AI evaluation
cross-center robustness
Yuanye Liu
Fudan University
Computer Vision, Medical Image Analysis
Nannan Shi
Department of Radiology, Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
Zhejia Zhang
Department of Electrical and Computer Engineering, Northwestern University, Evanston, USA
Hanxiao Zhang
Nanjing University
Boya Wang
HHWF Postdoctoral Fellow, California Institute of Technology
Molecular programming, DNA computing, Thermodynamics, DNA nanotechnology
Derong Yu
Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
Nao Wang
College of Computer Science and Technology, Huaqiao University, Xiamen, China
Yuxin Jin
School of Electronic Information (School of Artificial Intelligence), Northwest University, Xi’an, China
Yang Zhou
Department of Mechanical Engineering, University College London, London, UK
Kunhao Yuan
Institute of Neuroscience and Cardiovascular Research, University of Edinburgh, Edinburgh, UK
Siqi Wang
CAS Center for Excellence in Nanoscience, National Center for Nanoscience and Technology, Beijing, China
Lida Yang
School of Control Science and Engineering, Shandong University, Jinan, China
Xu Qiao
School of Control Science and Engineering, Shandong University, Jinan, China
Wentao Liu
School of Artificial Intelligence, Beijing University of Posts and Telecommunications,
Medical image analysis, Surgical navigation
Xuelei He
School of Electronic Information (School of Artificial Intelligence), Northwest University, Xi’an, China
Xin Hong
University of Technology Sydney
Quantum computing
Guoyan Zheng
Institute of Medical Robotics, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
Xin Chen
Associate Professor, University of Nottingham
Medical Image Analysis, Computer Vision, Machine Learning
Guang-Zhong Yang
Shanghai Key Laboratory of Flexible Medical Robotics, Tongren Hospital, Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China
Le Zhang
Assistant Professor, University of Birmingham
Medical Image Computing, Generative AI, Medical LLMs, Digital Healthcare
Lei Li
Digital Heart Lab, NUS
AI for Healthcare, Digital Twins, Medical Imaging, Multimodal AI, Computational Cardiology
Yuxin Shi
Department of Radiology, Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
Xiahai Zhuang
Professor, School of Data Science, Fudan University
medical image analysis, AI in Medicine, Interpretability, Explainable AI