Universal CT Representations from Anatomy to Disease Phenotype through Agglomerative Pretraining

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes FlexiCT, a family of general-purpose foundation models for computed tomography (CT) imaging that addresses the fragmentation of CT-based AI across task-specific models. FlexiCT uses a three-stage agglomerative pretraining strategy on 266,227 CT volumes from 56 public datasets, progressing from 2D axial pretraining through 3D anatomical pretraining to report-guided vision-language semantic alignment, to learn a unified representation spanning anatomy to disease phenotype. Evaluated across five downstream task families (segmentation, classification, registration, vision-language understanding, and clinical retrieval), FlexiCT matches or exceeds prior task-specific approaches on multiple benchmarks. Its embeddings organize CT scans along gradients associated with tumor stage, and the training collection forms a large-scale public resource for CT representation learning.

📝 Abstract

Computed tomography (CT) is central to three-dimensional medical imaging, yet CT-based artificial intelligence remains fragmented across task-specific models for segmentation, classification, registration, and report analysis. Here we present FlexiCT, a family of CT foundation models trained by agglomerative continual pretraining on 266,227 CT volumes from 56 publicly available datasets, forming a large-scale public resource for CT representation learning. FlexiCT uses agglomerative pretraining across three stages: two-dimensional axial pretraining, three-dimensional anatomical pretraining and report-guided semantic alignment. This training strategy supports slice-level, volume-level and vision-language analysis. Across five downstream task families (segmentation, classification, registration, vision-language understanding and clinical retrieval), FlexiCT matches or exceeds prior task-specific approaches on multiple benchmarks. Its embeddings further organize CT scans along gradients associated with various tumor stages, suggesting that CT foundation models can capture imaging features relevant to disease phenotype characterization. Code is available at https://github.com/ricklisz/FlexiCT
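
To make the clinical retrieval task concrete, here is a minimal sketch of embedding-based retrieval: given a query scan's embedding, rank stored scans by cosine similarity. This is an illustration under stated assumptions, not FlexiCT's actual pipeline or API. The scan identifiers, the three-dimensional toy vectors, and the helper names `cosine_similarity` and `retrieve` are all hypothetical. In practice the embeddings would come from a pretrained encoder and be much higher-dimensional.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def retrieve(query, index, top_k=3):
    """Return the top_k (scan_id, score) pairs most similar to the query."""
    scored = [(scan_id, cosine_similarity(query, emb))
              for scan_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for encoder outputs.
index = {
    "scan_a": [0.9, 0.1, 0.0],
    "scan_b": [0.0, 1.0, 0.0],
    "scan_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

results = retrieve(query, index, top_k=2)
for scan_id, score in results:
    print(f"{scan_id}: {score:.3f}")
```

Cosine similarity is used here because it ignores embedding magnitude. Whether the paper's retrieval evaluation uses this metric is an assumption.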

Problem

Research questions and friction points this paper is trying to address.

CT foundation models

medical imaging

representation learning

disease phenotype

universal representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

agglomerative pretraining

CT foundation model

vision-language alignment