Democratising Clinical AI through Dataset Condensation for Classical Clinical Models

📅 2026-03-10
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
This work addresses privacy-preserving sharing of medical data by proposing a differentially private dataset condensation framework for non-differentiable clinical models, such as decision trees and Cox regression, which are widely used in practice but incompatible with existing gradient-based condensation methods. Using zeroth-order optimization, the approach learns synthetic data from model outputs alone, without gradient information. Across six datasets spanning classification and survival analysis tasks, the method preserves model performance while providing differential privacy guarantees. The result is model-agnostic, utility-preserving condensation of medical data that can be shared for clinical prediction tasks without exposing patient records, advancing data democratization in real-world clinical modeling.
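
To make the zeroth-order idea concrete, here is a minimal sketch (an illustration, not the authors' code) of a standard two-point estimator. It probes an objective along random Gaussian directions and turns the change in output into a gradient estimate, so the objective can be any black box, for instance "fit a decision tree on the synthetic data and score it on real data". The names `zo_gradient` and `zo_minimise` and the defaults for `mu`, `num_dirs`, `lr`, and `steps` are hypothetical choices.

```python
import random


def zo_gradient(f, x, mu=1e-2, num_dirs=20, rng=None):
    """Two-point zeroth-order gradient estimate of f at x (hypothetical helper).

    Uses only evaluations of f: for each random Gaussian direction u, the
    scalar (f(x + mu*u) - f(x - mu*u)) / (2*mu) scales u, and the results
    are averaged over num_dirs directions.
    """
    rng = rng or random.Random()
    grad = [0.0] * len(x)
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in x]
        f_plus = f([xi + mu * ui for xi, ui in zip(x, u)])
        f_minus = f([xi - mu * ui for xi, ui in zip(x, u)])
        scale = (f_plus - f_minus) / (2 * mu)
        for i, ui in enumerate(u):
            grad[i] += scale * ui / num_dirs
    return grad


def zo_minimise(f, x0, lr=0.05, steps=200, mu=1e-2, num_dirs=20, seed=0):
    """Plain zeroth-order gradient descent on a black-box objective f."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        g = zo_gradient(f, x, mu, num_dirs, rng)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x
```

For example, `zo_minimise(lambda v: sum((vi - 1.0) ** 2 for vi in v), [0.0, 0.0, 0.0])` moves towards `[1, 1, 1]` using only function values. The smoothing radius `mu` matters for piecewise-constant objectives such as tree accuracy: if it is too small, both probes land on the same plateau and the estimate is zero.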


📝 Abstract
Dataset condensation (DC) learns a compact synthetic dataset on which models can be trained to match the performance of full-data training, prioritising utility over distributional fidelity. While typically explored for computational efficiency, DC also holds promise for healthcare data democratisation, especially when paired with differential privacy, allowing synthetic data to serve as a safe alternative to real records. However, existing DC methods rely on differentiable neural networks, limiting their compatibility with widely used clinical models such as decision trees and Cox regression. We address this gap with a differentially private, zero-order optimisation framework that extends DC to non-differentiable models using only function evaluations. Empirical results across six datasets, covering both classification and survival tasks, show that the proposed method produces condensed datasets that preserve model utility while providing effective differential privacy guarantees, enabling model-agnostic data sharing for clinical prediction tasks without exposing sensitive patient information.
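
The abstract does not detail how differential privacy is enforced inside the zero-order loop. The sketch below assumes one common pattern: each real record influences an update only through a single scalar (its change in loss between two perturbed copies of the synthetic set), so that scalar is clipped per record and Gaussian noise is added to the sum. A one-dimensional decision stump with 0-1 loss stands in for the non-differentiable clinical model. `fit_stump`, `predict`, `dp_zo_step`, and all parameters are hypothetical, and privacy accounting across steps is not included.

```python
import math
import random

# Hypothetical sketch, not the paper's method: DP zeroth-order dataset
# condensation for a non-differentiable model (a 1-D decision stump).


def fit_stump(xs, ys):
    """Fit a threshold classifier by exhaustive search over midpoints.

    The fitted threshold jumps discretely as xs move, so the training
    pipeline has no useful gradient with respect to the synthetic data.
    """
    pts = sorted(xs)
    thresholds = [pts[0] - 1.0]
    thresholds += [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    thresholds.append(pts[-1] + 1.0)
    best_t, best_pol, best_correct = thresholds[0], 1, -1
    for t in thresholds:
        for pol in (1, -1):
            correct = sum(predict((t, pol), x) == y for x, y in zip(xs, ys))
            if correct > best_correct:
                best_t, best_pol, best_correct = t, pol, correct
    return best_t, best_pol


def predict(stump, x):
    """Predict label 1 when x lies on the positive side of the threshold."""
    t, pol = stump
    return 1 if pol * (x - t) > 0 else 0


def dp_zo_step(syn_x, syn_y, real, lr, mu, clip, sigma, rng):
    """One DP zeroth-order update of the synthetic features syn_x.

    syn_y stays fixed; syn_x should be initialised without looking at the
    real data. Directions u are data-independent, so real records enter
    only through the clipped scalar sum below.
    """
    u = [rng.gauss(0.0, 1.0) for _ in syn_x]
    plus = fit_stump([x + mu * d for x, d in zip(syn_x, u)], syn_y)
    minus = fit_stump([x - mu * d for x, d in zip(syn_x, u)], syn_y)
    total = 0.0
    for x, y in real:
        # Per-record 0-1 loss difference; clipping to [-clip, clip] bounds one
        # record's influence (a no-op here for clip >= 1, but needed for
        # unbounded losses).
        diff = int(predict(plus, x) != y) - int(predict(minus, x) != y)
        total += max(-clip, min(clip, diff))
    # Gaussian mechanism: under add/remove-one adjacency the clipped sum has
    # sensitivity `clip`; len(real) is treated as public.
    noisy_mean = (total + rng.gauss(0.0, sigma * clip)) / len(real)
    step = noisy_mean / (2 * mu)
    # Move against the estimated gradient of the real-data loss.
    return [x - lr * step * d for x, d in zip(syn_x, u)]
```

A full run would call `dp_zo_step` repeatedly, starting from data-independent features such as `[rng.uniform(-1, 1) for _ in syn_y]`, and then fit the model on the final synthetic set. The total privacy cost grows with the number of steps and has to be tracked with a composition accountant.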

Problem

Research questions and friction points this paper is trying to address.

dataset condensation
clinical AI
non-differentiable models
differential privacy
data democratisation

Innovation

Methods, ideas, or system contributions that make the work stand out.

dataset condensation
differential privacy
zero-order optimization
non-differentiable models
clinical AI democratization