🤖 AI Summary
To address the challenges of strong training-data dependency and the privacy–utility trade-off in synthetic data generation (SDG) for few-shot biomedical applications, this paper pioneers a reinforcement learning (RL) formulation of SDG. We propose a stable training paradigm that couples policy-gradient optimization with discriminator-based reward feedback, using proximal policy optimization (PPO) to directly optimize stochastic generation policies, without large-scale pretraining or intricate loss design. On the AI-READI few-shot benchmark, our method significantly outperforms both GANs and diffusion models; on MIMIC-IV, it achieves utility comparable to diffusion models and superior to GANs, while striking the best overall balance among privacy preservation, data utility, and fidelity. Our core contribution is establishing RL as a principled new paradigm for SDG and empirically demonstrating its effectiveness and robustness in resource-constrained clinical settings.
📝 Abstract
Synthetic data generation (SDG) is a promising approach for enabling data sharing in biomedical studies while preserving patient privacy. Yet state-of-the-art generative models often require large datasets and complex training procedures, limiting their applicability in small-sample settings. In this work, we reframe SDG as a reinforcement learning (RL) problem and introduce RLSyn, a novel framework that models the data generator as a stochastic policy over patient records and optimizes it with Proximal Policy Optimization using discriminator-derived rewards, yielding more stable and data-efficient training. We evaluate RLSyn on two biomedical datasets, AI-READI and MIMIC-IV, and benchmark it against state-of-the-art generative adversarial networks (GANs) and diffusion-based methods across extensive privacy, utility, and fidelity evaluations. RLSyn performs comparably to diffusion models and outperforms GANs on MIMIC-IV, while outperforming both on the smaller AI-READI dataset. These results demonstrate that reinforcement learning provides a principled and effective alternative for synthetic biomedical data generation, particularly in data-scarce regimes.
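The core idea, a generation policy optimized with PPO against discriminator-derived rewards, can be sketched in a toy form. This is not the paper's RLSyn implementation: the "record" is a single categorical feature, the discriminator is a hypothetical fixed table of per-value "looks real" probabilities, the reward is log D(x), and all hyperparameters (clip range, learning rate, batch size) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: a "patient record" is one categorical feature with 4 possible
# values; the generator policy is a categorical distribution over them,
# parameterized by logits. The discriminator is a hypothetical stand-in:
# a fixed table of per-value probabilities of being "real".
N_VALUES = 4
disc_real_prob = np.array([0.10, 0.70, 0.15, 0.05])  # hypothetical discriminator

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ppo_round(logits, n_samples=512, clip_eps=0.2, lr=0.5, n_epochs=4):
    """One PPO round: sample records under the frozen 'old' policy, then take
    several gradient steps on the clipped surrogate objective
    L = E[min(r * A, clip(r, 1-eps, 1+eps) * A)], with r = pi_new(x)/pi_old(x)."""
    old_probs = softmax(logits)
    samples = rng.choice(N_VALUES, size=n_samples, p=old_probs)
    rewards = np.log(disc_real_prob[samples])      # reward = log D(x)
    advantages = rewards - rewards.mean()          # mean-reward baseline
    for _ in range(n_epochs):
        new_probs = softmax(logits)
        ratio = new_probs[samples] / old_probs[samples]
        clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        # Gradient flows only through samples where min() picks the unclipped
        # term; d ratio / d logits = ratio * (onehot(x) - pi_new).
        active = ratio * advantages <= clipped * advantages
        onehot = np.eye(N_VALUES)[samples]
        per_sample_grad = (advantages * ratio * active)[:, None] * (onehot - new_probs)
        logits = logits + lr * per_sample_grad.mean(axis=0)
    return logits

logits = np.zeros(N_VALUES)
for _ in range(30):
    logits = ppo_round(logits)
# The policy concentrates on value 1, which the discriminator scores
# as most realistic.
```

In the full method the policy would emit complete multi-feature records and the discriminator would be trained jointly; the sketch only shows the PPO mechanics of the reward feedback loop.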