OWT: A Foundational Organ-Wise Tokenization Framework for Medical Imaging

📅 2025-05-08
🤖 AI Summary
Medical imaging models often rely on opaque, holistic embeddings, which entangle semantic components, limit interpretability, and restrict generalization. To address this, we propose the Organ-Wise Tokenization (OWT) framework, an organ-level tokenization approach that explicitly disentangles an input image into independent token groups, each corresponding to an anatomical organ. OWT introduces an organ-perceptive token grouping mechanism and a Token Group-based Reconstruction (TGR) training paradigm that integrates organ-aware masked reconstruction, contrastive semantic alignment, and joint CT/MRI multimodal optimization. Across multiple benchmarks, OWT significantly improves image reconstruction fidelity and organ segmentation accuracy. It also enables organ-controllable semantic generation and cross-modal semantic retrieval, applications that are out of reach for conventional holistic embeddings. Compared to those methods, OWT offers greater clinical interpretability and stronger generalization to downstream tasks, establishing a foundation for anatomy-aware representation learning in medical imaging.
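The summary names organ-aware masked reconstruction without defining it. A minimal NumPy sketch of one plausible reading, assuming every token already carries an organ label: hide all tokens of one organ from the model, then score the reconstruction only on those hidden tokens. The function names, the zero-valued stand-in for a mask token, and the MSE objective are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def mask_organ_tokens(tokens, token_organ_ids, masked_organ, mask_value=0.0):
    """Return a copy of `tokens` (N, D) with every token of `masked_organ`
    replaced by `mask_value` (a stand-in for a learned mask token)."""
    masked = tokens.copy()
    masked[token_organ_ids == masked_organ] = mask_value
    return masked


def organ_masked_mse(pred_tokens, target_tokens, token_organ_ids, masked_organ):
    """Mean squared error over the hidden organ's tokens only; visible tokens
    do not contribute. Returns 0.0 if the organ is absent from the image."""
    hidden = token_organ_ids == masked_organ
    if not hidden.any():
        return 0.0
    diff = pred_tokens[hidden] - target_tokens[hidden]
    return float(np.mean(diff ** 2))


# Toy example: 4 tokens of dimension 3, labelled with organs 0, 1, 1, 2.
tokens = np.arange(12, dtype=float).reshape(4, 3)
organ_ids = np.array([0, 1, 1, 2])
encoder_input = mask_organ_tokens(tokens, organ_ids, masked_organ=1)
# A real model would reconstruct `tokens` from `encoder_input`; here we fake
# a prediction that is off by 1.0 everywhere.
prediction = tokens + 1.0
loss = organ_masked_mse(prediction, tokens, organ_ids, masked_organ=1)
```

Restricting the loss to the hidden organ means the model is rewarded only for inferring that organ from the remaining anatomy, which is one way to push organ-specific information into separate token groups.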

📝 Abstract
Recent advances in representation learning often rely on holistic, black-box embeddings that entangle multiple semantic components, limiting interpretability and generalization. These issues are especially critical in medical imaging. To address these limitations, we propose an Organ-Wise Tokenization (OWT) framework with a Token Group-based Reconstruction (TGR) training paradigm. Unlike conventional approaches that produce holistic features, OWT explicitly disentangles an image into separable token groups, each corresponding to a distinct organ or semantic entity. Our design ensures each token group encapsulates organ-specific information, boosting interpretability, generalization, and efficiency while allowing fine-grained control in downstream tasks. Experiments on CT and MRI datasets demonstrate the effectiveness of OWT in not only achieving strong image reconstruction and segmentation performance, but also enabling novel semantic-level generation and retrieval applications that are out of reach for standard holistic embedding methods. These findings underscore the potential of OWT as a foundational framework for semantically disentangled representation learning, offering broad scalability and applicability to real-world medical imaging scenarios and beyond.
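The abstract does not say how image regions are assigned to token groups. One simple way to picture the idea, assuming a per-pixel organ label map is available and the image is split into non-overlapping square patches, is to give each patch token the organ label that covers most of its pixels. The hypothetical `group_tokens_by_organ` below is a NumPy illustration of that assumption, not the OWT grouping mechanism.

```python
import numpy as np


def group_tokens_by_organ(label_map, patch_size):
    """Split a 2-D organ label map into non-overlapping patch_size x patch_size
    patches (row-major order) and group patch indices by majority label.

    Returns {label: np.ndarray of patch indices}. Background (e.g. label 0)
    is kept as its own group; ties go to the smallest label because
    np.unique returns labels sorted and np.argmax picks the first maximum.
    """
    h, w = label_map.shape
    if h % patch_size or w % patch_size:
        raise ValueError("label_map size must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh*p, gw*p) -> (gh, p, gw, p) -> (gh, gw, p, p) -> (gh*gw, p*p)
    patches = (
        label_map.reshape(gh, patch_size, gw, patch_size)
        .transpose(0, 2, 1, 3)
        .reshape(gh * gw, patch_size * patch_size)
    )
    groups = {}
    for idx, pixels in enumerate(patches):
        labels, counts = np.unique(pixels, return_counts=True)
        organ = int(labels[np.argmax(counts)])
        groups.setdefault(organ, []).append(idx)
    return {organ: np.array(idx_list) for organ, idx_list in groups.items()}


label_map = np.array([
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [0, 0, 3, 3],
    [0, 0, 3, 1],
])
groups = group_tokens_by_organ(label_map, patch_size=2)
```

For this 4x4 map the four patches fall into groups 1, 2, 0, and 3. The majority vote forces a patch that straddles an organ boundary (such as the bottom-right patch here) into a single group, which is the main limitation of this simplification.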
Problem

Research questions and friction points this paper is trying to address.

Disentangling medical images into organ-specific tokens for interpretability
Improving generalization and efficiency in medical imaging tasks
Enabling fine-grained control in downstream medical applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Organ-Wise Tokenization for semantic disentanglement
Token Group-based Reconstruction training paradigm
Organ-specific token groups boost interpretability
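Because each token group is meant to carry one organ's information, a group can in principle be compared in isolation, which is what makes the organ-level retrieval mentioned in the abstract possible. A minimal sketch, assuming token embeddings and group indices are already available: mean-pool each group, L2-normalize it, and rank cases by cosine similarity for one chosen organ. The mean pooling, the function names `pool_groups` and `rank_by_organ`, and the database layout are all assumptions for illustration.

```python
import numpy as np


def pool_groups(tokens, groups):
    """Mean-pool each token group of `tokens` (N, D) into one L2-normalized
    vector. `groups` maps organ label -> array of token indices."""
    pooled = {}
    for organ, idx in groups.items():
        v = tokens[idx].mean(axis=0)
        n = np.linalg.norm(v)
        pooled[organ] = v / n if n > 0 else v
    return pooled


def rank_by_organ(query_vec, database, organ):
    """Rank cases by cosine similarity between `query_vec` and each case's
    pooled vector for `organ`. `database` is a list of (case_id, pooled_dict);
    cases in which the organ is absent are skipped."""
    norm = np.linalg.norm(query_vec)
    if norm == 0:
        raise ValueError("query vector must be non-zero")
    q = query_vec / norm
    scores = [
        (case_id, float(pooled[organ] @ q))
        for case_id, pooled in database
        if organ in pooled
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)


tokens_a = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
pooled_a = pool_groups(tokens_a, {1: np.array([0, 1]), 2: np.array([2])})
tokens_b = np.array([[0.0, 1.0], [1.0, 0.0]])
pooled_b = pool_groups(tokens_b, {1: np.array([0]), 2: np.array([1])})
database = [
    ("case_a", pooled_a),
    ("case_b", pooled_b),
    ("case_c", {2: np.array([1.0, 0.0])}),  # organ 1 absent: skipped
]
ranking = rank_by_organ(np.array([2.0, 0.0]), database, organ=1)
```

Here `ranking` orders `case_a` (similarity 1.0) ahead of `case_b` (0.0), while `case_c` is excluded. With a holistic embedding, the same query would also be influenced by every other organ in the image.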
Authors
Sifan Song
Post-Doc, Massachusetts General Hospital
Medical Image Analysis
Siyeop Yoon
Center for Advanced Medical Computing and Analysis (CAMCA), Massachusetts General Hospital and Harvard Medical School, Boston, USA
Pengfei Jin
Peking University
Image processing, Deep learning
Sekeun Kim
Massachusetts General Hospital / Harvard Medical School
Medical imaging computing, Cardiovascular AI, Foundation Model, Generative Model
Matthew Tivnan
Center for Advanced Medical Computing and Analysis (CAMCA), Massachusetts General Hospital and Harvard Medical School, Boston, USA
Yujin Oh
Harvard Medical School & Massachusetts General Hospital
Medical Image Analysis, Artificial Intelligence, Large Language Model, Multimodal AI
Runqi Meng
Center for Advanced Medical Computing and Analysis (CAMCA), Massachusetts General Hospital and Harvard Medical School, Boston, USA
Ling Chen
Center for Advanced Medical Computing and Analysis (CAMCA), Massachusetts General Hospital and Harvard Medical School, Boston, USA
Zhiliang Lyu
Center for Advanced Medical Computing and Analysis (CAMCA), Massachusetts General Hospital and Harvard Medical School, Boston, USA
Dufan Wu
Associate Professor of Radiology, The Ohio State University
Machine Learning, Photon Counting CT, Image Reconstruction, Medical Image Analysis
Ning Guo
Center for Advanced Medical Computing and Analysis (CAMCA), Massachusetts General Hospital and Harvard Medical School, Boston, USA
Xiang Li
Center for Advanced Medical Computing and Analysis (CAMCA), Massachusetts General Hospital and Harvard Medical School, Boston, USA
Quanzheng Li
Massachusetts General Hospital, Harvard Medical School
Image Reconstruction, Medical Image Analysis, Deep Learning in Medicine, Multimodality Medical Data Analysis