🤖 AI Summary
Medical imaging data sharing is severely constrained by privacy regulations and institutional barriers, while existing dataset distillation methods largely neglect higher-order information in intermediate optimization states. To address this, we propose a trajectory-aware medical image dataset distillation framework. First, we design a shape-aware potential function to explicitly model the geometric structure of parameter optimization trajectories. Second, we introduce a progressive, difficulty-aware higher-order trajectory matching strategy that jointly leverages gradient, curvature, and other higher-order dynamical information. Third, we integrate latent-space distillation with shape regularization to produce compact, high-fidelity, and privacy-preserving synthetic datasets. Evaluated across multiple medical image classification tasks, models trained on our distilled data achieve accuracy within 0.8% (on average) of models trained on the full-scale real data, significantly outperforming state-of-the-art distillation methods. Our approach establishes a new paradigm for efficient, privacy-compliant model training in sensitive clinical settings.
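For context, standard trajectory matching (e.g. MTT-style dataset distillation) compares only terminal parameter states: the student network, trained on the synthetic data, is penalized by its normalized parameter distance to an expert checkpoint. The sketch below illustrates that baseline objective, which is the limitation the summary above targets; the function name and array shapes are illustrative, not taken from the paper's code.

```python
import numpy as np

def terminal_matching_distance(student_end, expert_start, expert_end):
    """Normalized squared distance between the student's final parameters
    and the expert's target checkpoint (terminal states only).

    Intermediate points of the expert trajectory never enter this loss,
    which is the gap higher-order trajectory matching aims to close.
    """
    num = np.sum((student_end - expert_end) ** 2)
    den = np.sum((expert_start - expert_end) ** 2)
    return num / den
```

A student that lands exactly on the expert's target checkpoint scores 0, while one that never moves from the expert's starting point scores 1, so the normalization makes distances comparable across trajectory segments of different lengths.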
📝 Abstract
Medical image analysis faces significant challenges in data sharing due to privacy regulations and complex institutional protocols. Dataset distillation offers a solution by synthesizing compact datasets that capture the essential information of large real medical datasets. Trajectory matching has emerged as a promising methodology for dataset distillation; however, existing methods focus primarily on terminal states, overlooking crucial information in intermediate optimization states. We address this limitation by proposing a shape-wise potential that captures the geometric structure of parameter trajectories, and an easy-to-complex matching strategy that progressively matches parameters according to their complexity. Experiments on medical image classification tasks demonstrate that our method improves distillation performance while preserving privacy and maintaining model accuracy comparable to training on the original datasets. Our code is available at https://github.com/Bian-jh/HoP-TM.
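To make the idea of higher-order trajectory information concrete, the sketch below extracts first- and second-order finite differences (velocity- and curvature-like terms) from a sequence of saved parameter checkpoints and combines them into a matching loss. This is a minimal illustration of the general idea, assuming trajectories stored as arrays of flattened parameter vectors; the weights, names, and loss form are hypothetical and are not the paper's shape-wise potential.

```python
import numpy as np

def trajectory_features(checkpoints):
    """First- and second-order finite differences along a trajectory.

    `checkpoints` has shape (T, D): T saved parameter vectors of size D.
    Returns (velocity, curvature) with shapes (T-1, D) and (T-2, D).
    """
    velocity = np.diff(checkpoints, n=1, axis=0)   # first-order (gradient-like)
    curvature = np.diff(checkpoints, n=2, axis=0)  # second-order (shape of the path)
    return velocity, curvature

def higher_order_matching_loss(student, expert, w1=1.0, w2=0.5):
    """Illustrative loss combining endpoint, velocity, and curvature terms,
    so intermediate states contribute alongside the terminal one."""
    sv, sc = trajectory_features(student)
    ev, ec = trajectory_features(expert)
    endpoint = np.sum((student[-1] - expert[-1]) ** 2)
    return endpoint + w1 * np.sum((sv - ev) ** 2) + w2 * np.sum((sc - ec) ** 2)
```

Note that a constant offset between two trajectories leaves the velocity and curvature terms at zero and is penalized only through the endpoint term, which is why the higher-order terms capture the geometry of the path rather than its absolute position.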