Transplant-Ready? Evaluating AI Lung Segmentation Models in Candidates with Severe Lung Disease

📅 2025-09-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study systematically evaluates the generalizability of three publicly available deep learning models (Unet-R231, TotalSegmentator, and MedSAM) for lung segmentation in lung transplant candidates, with emphasis on disease severity, pathological subtypes (e.g., COPD, ILD, CF), and differences between the left and right lungs. Performance is assessed using the Dice similarity coefficient, volumetric similarity, Hausdorff distance, and a clinician-rated acceptability scale. Unet-R231 achieves the best overall performance; however, all models degrade significantly in moderate-to-severe cases, particularly in volumetric similarity (>15% reduction), which limits their reliability for preoperative quantitative assessment. The key contribution is empirical evidence that current general-purpose models lack robustness to severe structural deformation of the lungs. This underscores the need for pathology- and severity-aware model optimization using high-quality, multi-pathology, pre-transplant CT data, and provides evidence relevant to deploying AI clinically in thoracic transplantation.
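
The summary and abstract rely on overlap and volume metrics. As a minimal sketch, assuming binary NumPy masks of equal shape, the Dice coefficient and volumetric similarity can be computed as below. The function names `dice_coefficient` and `volumetric_similarity` are hypothetical (not from the study's code), and volumetric similarity follows the common definition 1 - |V_pred - V_ref| / (V_pred + V_ref), which may differ from the exact variant the authors used.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, ref: np.ndarray) -> float:
    """Dice similarity coefficient 2|A and B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(pred, dtype=bool)
    b = np.asarray(ref, dtype=bool)
    total = int(a.sum()) + int(b.sum())
    if total == 0:
        # Both masks empty: scored as perfect agreement here (a convention, not a standard).
        return 1.0
    return 2.0 * int(np.logical_and(a, b).sum()) / total


def volumetric_similarity(pred: np.ndarray, ref: np.ndarray) -> float:
    """Volumetric similarity 1 - |V_pred - V_ref| / (V_pred + V_ref), from voxel counts."""
    v_pred = int(np.asarray(pred, dtype=bool).sum())
    v_ref = int(np.asarray(ref, dtype=bool).sum())
    if v_pred + v_ref == 0:
        return 1.0
    return 1.0 - abs(v_pred - v_ref) / (v_pred + v_ref)


# Toy 2D masks (not study data): the reference is a 2x2 block.
ref = np.zeros((4, 4), dtype=bool)
ref[1:3, 1:3] = True

# Overlapping prediction that spills one column to the right (6 voxels, 4 shared).
pred_overlap = np.zeros((4, 4), dtype=bool)
pred_overlap[1:3, 1:4] = True

# Prediction with the same voxel count as the reference but no overlap at all.
pred_disjoint = np.zeros((4, 4), dtype=bool)
pred_disjoint[0, :] = True

print(dice_coefficient(pred_overlap, ref), volumetric_similarity(pred_overlap, ref))
print(dice_coefficient(pred_disjoint, ref), volumetric_similarity(pred_disjoint, ref))
```

The disjoint toy prediction scores a Dice of 0.0 but a volumetric similarity of 1.0, because volumetric similarity compares only how much tissue is segmented, not where it lies. Voxel counts suffice for the ratio because the physical voxel volume cancels out.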

📝 Abstract

This study evaluates publicly available deep-learning-based lung segmentation models in transplant-eligible patients to determine their performance across disease severity levels, pathology categories, and lung sides, and to identify limitations affecting their use in preoperative planning for lung transplantation. This retrospective study included 32 patients who underwent chest CT scans at Duke University Health System between 2017 and 2019 (a total of 3,645 2D axial slices). Patients with standard axial CT scans were selected based on the presence of two or more lung pathologies of varying severity. Lung segmentation was performed using three previously developed deep learning models: Unet-R231, TotalSegmentator, and MedSAM. Performance was assessed using quantitative metrics (volumetric similarity, Dice similarity coefficient, and Hausdorff distance) and a qualitative measure (a four-point clinical acceptability scale). Unet-R231 consistently outperformed TotalSegmentator and MedSAM overall, across severity levels, and across pathology categories (p<0.05). All models showed significant performance declines from mild to moderate-to-severe cases, particularly in volumetric similarity (p<0.05), with no significant differences between lung sides or among pathology types. Unet-R231 provided the most accurate automated lung segmentation among the evaluated models, with TotalSegmentator a close second, although the performance of both declined significantly in moderate-to-severe cases, emphasizing the need for specialized model fine-tuning in severe pathology contexts.
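
Hausdorff distance captures the worst boundary error rather than average overlap. The sketch below is a brute-force NumPy illustration; the function name `hausdorff_distance` and its `spacing` parameter are hypothetical, and the abstract does not state which implementation was used or whether the maximum distance or a percentile variant (such as HD95) was reported.

```python
import numpy as np


def hausdorff_distance(pred, ref, spacing=(1.0, 1.0)) -> float:
    """Symmetric Hausdorff distance between the foreground voxels of two binary masks.

    `spacing` is the physical voxel size along each axis (e.g., mm), so the result is
    in physical units. Brute force over all voxel pairs: memory grows with
    (#pred voxels x #ref voxels), so this only suits small masks or extracted contours.
    """
    scale = np.asarray(spacing, dtype=float)
    a = np.argwhere(np.asarray(pred, dtype=bool)) * scale
    b = np.argwhere(np.asarray(ref, dtype=bool)) * scale
    if len(a) == 0 or len(b) == 0:
        # Undefined when either mask is empty; infinity is used as a sentinel here.
        return float("inf")
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Directed distances: how far any point lies from its nearest neighbour in the other set.
    h_pred_to_ref = d.min(axis=1).max()
    h_ref_to_pred = d.min(axis=0).max()
    return float(max(h_pred_to_ref, h_ref_to_pred))


# Toy example (not study data): prediction equals the reference plus one stray voxel.
ref = np.zeros((5, 5), dtype=bool)
ref[1:3, 1:3] = True
pred = ref.copy()
pred[4, 4] = True

hd = hausdorff_distance(pred, ref)                          # stray voxel vs. nearest (2, 2)
hd_mm = hausdorff_distance(pred, ref, spacing=(0.5, 0.5))  # same masks at 0.5 mm pixels
print(hd, hd_mm)
```

Here a single stray voxel raises the Hausdorff distance to sqrt(8), about 2.83 pixels, while the Dice coefficient for the same pair stays at 8/9, about 0.89, which is why the two metrics are reported together. For full CT volumes, a distance-transform-based implementation is far more practical than this pairwise approach.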

Problem

Research questions and friction points this paper is trying to address.

Evaluating AI lung segmentation for transplant candidates

Assessing model performance across disease severity levels

Identifying limitations for use in preoperative planning
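
The abstract reports significant (p<0.05) declines from mild to moderate-to-severe cases without naming the statistical test. Purely as an assumption, a two-sided permutation test on the difference in mean per-case scores is one way to make such a comparison; the function `permutation_test` is hypothetical and is not claimed to be the authors' method.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed mean(group_a) - mean(group_b), estimated p-value).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign cases to the two groups under the null of no severity effect.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance so floating-point noise does not hide exact ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Add-one correction: the observed labelling counts as one permutation, so p > 0.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value
```

For instance, `permutation_test(mild_scores, severe_scores)` would compare hypothetical lists of per-patient volumetric similarity values for the two severity groups. The test does not assume normally distributed scores, which suits small cohorts such as the 32 patients here; if scores are computed per slice, slices from one patient are not independent, so aggregating to one value per patient first is the safer choice.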

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated three deep learning segmentation models

Assessed performance across disease severity levels

Identified Unet-R231 as most accurate model

🔎 Similar Papers

Developing a Dual-Stage Vision Transformer Model for Lung Disease Classification