🤖 AI Summary
This work addresses the catastrophic forgetting caused by sequential fine-tuning in continual learning for pathology report generation. It proposes the first exemplar-free continual learning framework for this task, operating without storing original whole-slide images or sample reports. The method constructs compact “domain footprints” in a frozen patch-embedding space by integrating a morphological token codebook, co-occurrence statistics, and patch-count priors, and uses them to synthesize pseudo-whole-slide-image representations. Pseudo-reports generated by teacher snapshots supervise training on these synthetic representations, while language style descriptors are introduced to accommodate evolving reporting conventions. The framework further enables adaptive inference without explicit domain labels. Experiments demonstrate that the approach significantly outperforms existing exemplar-free and limited-buffer replay baselines across multiple public continual learning benchmarks, validating the effectiveness of footprint-guided generative replay for dynamic clinical deployment.
📝 Abstract
Rapid progress in vision-language modeling has enabled pathology report generation from gigapixel whole-slide images (WSIs), but most approaches assume static training with simultaneous access to all data. In clinical deployment, however, new organs, institutions, and reporting conventions emerge over time, and sequential fine-tuning can cause catastrophic forgetting. We introduce an exemplar-free continual learning framework for WSI-to-report generation that avoids storing raw slides or patch exemplars. The core idea is a compact domain footprint built in a frozen patch-embedding space: a small codebook of representative morphology tokens together with slide-level co-occurrence summaries and lightweight patch-count priors. These footprints support generative replay by synthesizing pseudo-WSI representations that reflect domain-specific morphological mixtures, while a teacher snapshot provides pseudo-reports to supervise the updated model without retaining past data. To address shifting reporting conventions, we distill domain-specific linguistic characteristics into a compact style descriptor and use it to steer generation. At inference, the model identifies the most compatible descriptor directly from the slide signal, enabling a domain-agnostic setup without requiring explicit domain identifiers. Evaluated across multiple public continual learning benchmarks, our approach outperforms exemplar-free and limited-buffer rehearsal baselines, highlighting footprint-based generative replay as a practical solution for deployment in evolving clinical settings.
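The footprint-and-replay idea can be illustrated with a minimal sketch. The code below is not the paper's implementation; it assumes a frozen patch embedder has already produced per-slide embedding matrices, approximates the morphology-token codebook with plain k-means, summarizes each domain by its mean token mixture plus a Gaussian patch-count prior, and synthesizes a pseudo-WSI as a jittered bag of codebook tokens. All function names (`build_footprint`, `synthesize_pseudo_wsi`) and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_footprint(slide_embeddings, k=4, iters=20):
    """Hypothetical domain footprint: a k-means codebook over frozen
    patch embeddings, a mean token-mixture summary, and a patch-count prior."""
    X = np.concatenate(slide_embeddings)                  # all patches, all slides
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):                                # plain Lloyd's k-means
        dist = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = dist.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    mixtures, counts = [], []
    for E in slide_embeddings:                            # per-slide token histogram
        dist = np.linalg.norm(E[:, None] - centers[None], axis=-1)
        hist = np.bincount(dist.argmin(1), minlength=k).astype(float)
        mixtures.append(hist / hist.sum())
        counts.append(len(E))
    return {
        "codebook": centers,
        "mixture_mean": np.mean(mixtures, axis=0),        # domain morphology mixture
        "count_prior": (np.mean(counts), np.std(counts) + 1e-6),
    }

def synthesize_pseudo_wsi(fp, jitter=0.05):
    """Sample a pseudo-WSI: draw a patch count from the prior, sample token
    indices from the stored mixture, and add small Gaussian jitter."""
    mu, sd = fp["count_prior"]
    n = max(1, int(rng.normal(mu, sd)))
    idx = rng.choice(len(fp["codebook"]), size=n, p=fp["mixture_mean"])
    noise = jitter * rng.standard_normal((n, fp["codebook"].shape[1]))
    return fp["codebook"][idx] + noise

# Toy usage: five slides of 64-d frozen patch embeddings for one "domain".
slides = [rng.standard_normal((int(rng.integers(50, 100)), 64)) for _ in range(5)]
fp = build_footprint(slides)
pseudo = synthesize_pseudo_wsi(fp)                        # feed to teacher for pseudo-reports
```

In the full method, these pseudo-WSI bags would be passed through the frozen teacher snapshot to obtain pseudo-reports that supervise the updated model; only the small footprint dictionary, not any slide or report, would need to be retained between tasks.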