InvCoSS: Inversion-driven Continual Self-supervised Learning in Medical Multi-modal Image Pre-training

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the conflicting challenges of catastrophic forgetting and data privacy in continual self-supervised pretraining for medical multimodal imaging, this paper proposes a privacy-preserving continual learning framework that eliminates the need to replay historical real-world data. The method introduces three key innovations: (1) a model-inversion mechanism that synthesizes proxy images to replace real-data replay; (2) InvUNet, a multi-scale reconstruction network designed to enhance the fidelity of the generated images; and (3) class-agnostic contrastive learning with repulsion regularization to improve representation diversity and discriminability. Evaluated on nine downstream tasks, the approach matches or surpasses replay-based baselines while reducing storage overhead by 72%. Crucially, it entirely avoids the transmission and storage of raw patient data, thereby eliminating the privacy-leakage risks associated with data sharing or replay.
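The model-inversion idea described above can be sketched as direct optimization of synthetic inputs against a frozen pretrained encoder. The following is a minimal illustration, not the paper's implementation: the stand-in `encoder` and the stored statistics `feat_mean`/`feat_std` are assumptions, and the objective here is simple feature-moment matching rather than InvCoSS's actual inversion loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a pretrained self-supervised encoder; frozen during inversion.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(16 * 16, 32))
for p in encoder.parameters():
    p.requires_grad_(False)

# Feature statistics assumed to have been saved after the previous stage
# (placeholder values; no real data is stored or replayed).
feat_mean = torch.zeros(32)
feat_std = torch.ones(32)

# Synthetic "proxy images" start as noise and are optimized directly.
x_syn = torch.randn(8, 1, 16, 16, requires_grad=True)
opt = torch.optim.Adam([x_syn], lr=0.05)

losses = []
for step in range(100):
    opt.zero_grad()
    f = encoder(x_syn)
    # Moment matching: pull batch feature statistics toward the stored ones.
    loss = ((f.mean(0) - feat_mean) ** 2).mean() + ((f.std(0) - feat_std) ** 2).mean()
    loss.backward()
    opt.step()
    losses.append(float(loss))
```

After optimization, `x_syn` plays the role of the proxy images: it approximates the previous stage's feature distribution without any real patient data leaving its site.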

📝 Abstract
Continual self-supervised learning (CSSL) in medical imaging trains a foundation model sequentially, alleviating the need for collecting multi-modal images for joint training and offering promising improvements in downstream performance while preserving data privacy. However, most existing methods still rely on replaying data from previous stages to prevent catastrophic forgetting, which compromises privacy and limits their applicability in real-world scenarios where data transfer across sites is often restricted. In this work, we propose InvCoSS, an inversion-driven continual self-supervised learning framework for medical multi-modal image pre-training. Specifically, after training on a previous task, InvCoSS inverts the pre-trained self-supervised model to generate synthetic images that approximate the original training distribution. These synthetic images are then combined with data from the new task for joint optimization, which effectively mitigates catastrophic forgetting while strictly adhering to the constraint of no access to previous real data. Furthermore, to improve the fidelity of synthetic images, we introduce a novel InvUNet with a multi-scale fusion architecture to restore both high- and low-frequency components of the inverted images. To enhance diversity and prevent mode collapse, we design a repulsive representation-learning mechanism that encourages a diverse feature space for synthetic images without class guidance. Extensive experiments across nine downstream tasks validate the effectiveness of InvCoSS, achieving performance comparable to or even superior to prior data-replay methods while significantly reducing storage requirements and avoiding data-privacy risks.
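The abstract's repulsive representation-learning mechanism (diversity without class guidance) can be illustrated with a simple class-agnostic penalty on pairwise feature similarity. This is a hedged sketch of the general idea, not the paper's exact regularizer; the function name `repulsion_loss` is an assumption.

```python
import torch
import torch.nn.functional as F


def repulsion_loss(feats: torch.Tensor) -> torch.Tensor:
    """Class-agnostic repulsion: penalize pairwise cosine similarity so
    synthetic-image features spread apart, discouraging mode collapse.
    (Illustrative sketch only, not InvCoSS's actual formulation.)"""
    z = F.normalize(feats, dim=1)   # unit-norm embeddings
    sim = z @ z.t()                 # pairwise cosine similarities
    n = sim.size(0)
    off_diag = sim - torch.eye(n)   # zero out self-similarity
    return (off_diag ** 2).sum() / (n * (n - 1))


feats = torch.randn(16, 32)         # placeholder synthetic-image features
loss = repulsion_loss(feats)
```

Collapsed features (all rows identical) drive this loss to its maximum of 1, while a spread-out feature batch keeps it near 0, so minimizing it during inversion pushes the synthetic images toward a diverse feature space without any class labels.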
Problem

Research questions and friction points this paper is trying to address.

Preventing catastrophic forgetting without replaying real patient data
Generating synthetic images that preserve the previous training distribution
Maintaining the fidelity and diversity of synthetic images under privacy constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates synthetic images via model inversion
Uses multi-scale fusion architecture for fidelity
Implements repulsive representation learning for diversity
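The continual stage that ties these pieces together mixes the inverted proxy images with the new task's data for joint optimization. A minimal sketch of that data plumbing, with placeholder tensors standing in for real medical images:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Replay-free continual stage: synthetic images from model inversion stand in
# for previous-stage data and are mixed with the new task's images. All
# tensors here are random placeholders, not real medical data.
x_synthetic = torch.randn(32, 1, 16, 16)   # inverted proxy images (prev. stage)
x_new_task = torch.randn(64, 1, 16, 16)    # current-stage modality

mixed = ConcatDataset([TensorDataset(x_synthetic), TensorDataset(x_new_task)])
loader = DataLoader(mixed, batch_size=16, shuffle=True)

n_batches = sum(1 for _ in loader)          # 96 samples / 16 per batch
```

Each batch drawn from `loader` would then feed the self-supervised objective, so the model sees both distributions at once while no previous-stage real data is ever stored or transmitted.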