🤖 AI Summary
Existing diffusion-based methods for solving imaging inverse problems rely heavily on heuristic strategies to schedule data consistency guidance, classifier-free guidance, and stochasticity, which limits reconstruction performance. This work formulates posterior sampling as a time-varying control problem and introduces a triadic co-scheduling framework that adjusts all three components jointly over time. Based on an analysis of the temporal interactions among these components, the framework adopts a time-adaptive scheduling policy in which the data consistency and stochasticity scales decrease over the course of sampling while classifier-free guidance is progressively strengthened. Combined with template-based search over functional priors and reinforcement learning via Group Relative Policy Optimization (GRPO), the method learns time-dependent scheduling curves automatically and outperforms state-of-the-art baselines in both data fidelity and perceptual quality.
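GRPO scores a group of candidates sampled for the same input against each other, normalizing each reward by the group's mean and standard deviation instead of relying on a learned value function. The sketch below illustrates that idea under loudly stated assumptions: the policy is a Gaussian over a single scalar schedule parameter `theta`, and the reward is a toy quadratic. `toy_reward`, `train_schedule_policy`, and all hyperparameters are hypothetical stand-ins, not the paper's actual policy, reward, or training setup. The KL penalty to a reference policy that GRPO normally includes is omitted for brevity.

```python
from typing import Sequence

import numpy as np


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantages: normalize each reward against its own group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate objective (to be maximized); KL term omitted."""
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))


def toy_reward(theta: float) -> float:
    """HYPOTHETICAL stand-in for a reconstruction-quality reward; peaks at theta = 2."""
    return -(theta - 2.0) ** 2


def train_schedule_policy(steps: int = 100, group_size: int = 16,
                          sigma: float = 0.5, lr: float = 0.05, seed: int = 0) -> float:
    """On-policy updates of the mean of a Gaussian policy over one schedule parameter."""
    rng = np.random.default_rng(seed)
    mu = 0.0
    for _ in range(steps):
        # Sample a group of candidate parameters for the same (implicit) input.
        samples = mu + sigma * rng.standard_normal(group_size)
        adv = group_relative_advantages([toy_reward(x) for x in samples])
        # d/dmu log N(x; mu, sigma^2) = (x - mu) / sigma^2 (policy-gradient estimate).
        grad_mu = np.mean(adv * (samples - mu) / sigma**2)
        mu += lr * grad_mu
    return float(mu)
```

Because `train_schedule_policy` takes a single gradient step per fresh group, the probability ratio is always 1 and clipping never activates. `clipped_surrogate` matters only when several optimization steps reuse one group of samples, which is the usual GRPO setting.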
📝 Abstract
Generative posterior sampling with diffusion models has emerged as a dominant paradigm for solving inverse problems in imaging. It typically comprises three main components: data consistency (DC) guidance, classifier-free guidance (CFG), and stochasticity. While prior work has focused on developing each or all of these components, less attention has been given to how to schedule them, which has led to suboptimal schedules that are either heuristically fixed or only partially adjusted. In this work, we argue that the scheduling interactions among all three components are crucial for significantly improving performance on imaging inverse problems. Our analysis shows that aggressive CFG early in sampling conflicts with DC guidance, while stochasticity brings the trajectory back to higher-probability regions. Based on these findings, we propose Triadic Dynamics Aware Posterior Sampling (TriPS), which reformulates posterior sampling as a time-varying control problem and optimizes schedules that follow a triadic trend: decreasing DC and stochasticity scales alongside an increasing CFG scale. TriPS achieves this through two strategies: template-based search over functional priors to obtain reliable baseline schedules, and Group Relative Policy Optimization (GRPO)-based reinforcement learning to obtain more flexible temporal curves. Experiments demonstrate that TriPS outperforms state-of-the-art baselines in both data fidelity and perceptual realism.
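The triadic trend and the template-based search can be illustrated with a small sketch. It assumes two hypothetical ramp families (`power` and `cosine`), illustrative endpoint values for the DC weight, the stochasticity level `eta`, and the CFG scale, and a user-supplied `score` function standing in for a validation metric. None of these names or values come from the paper. A single shared ramp drives all three scales here for simplicity. A real schedule could use a separate template per component.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# HYPOTHETICAL ramp templates r(s; p): map sampling progress s in [0, 1]
# (0 = first reverse step, 1 = last) to an increasing weight in [0, 1].
TEMPLATES: Dict[str, Callable[[float, float], float]] = {
    "power": lambda s, p: s ** p,  # p = 1 gives a linear ramp
    "cosine": lambda s, p: (0.5 * (1.0 - math.cos(math.pi * s))) ** p,
}

Step = Tuple[float, float, float]  # (dc_scale, eta, cfg_scale)


def triadic_schedule(num_steps: int, template: str, p: float,
                     dc=(1.0, 0.1), eta=(1.0, 0.0), cfg=(1.0, 7.5)) -> List[Step]:
    """Per-step scales following the triadic trend: DC and stochasticity decrease
    while CFG increases. Endpoint values are illustrative assumptions."""
    ramp = TEMPLATES[template]
    schedule: List[Step] = []
    for i in range(num_steps):
        s = i / (num_steps - 1) if num_steps > 1 else 1.0
        r = ramp(s, p)
        schedule.append((
            dc[0] + (dc[1] - dc[0]) * r,     # DC guidance weight: decays
            eta[0] + (eta[1] - eta[0]) * r,  # stochasticity level: decays
            cfg[0] + (cfg[1] - cfg[0]) * r,  # CFG scale: grows
        ))
    return schedule


def template_search(score: Callable[[List[Step]], float], num_steps: int = 50,
                    exponents=(0.5, 1.0, 2.0)) -> Tuple[float, str, float]:
    """Grid search over template families and exponents. `score` is a
    user-supplied (hypothetical) validation metric where higher is better."""
    best: Optional[Tuple[float, str, float]] = None
    for name, p in product(TEMPLATES, exponents):
        value = score(triadic_schedule(num_steps, name, p))
        if best is None or value > best[0]:
            best = (value, name, p)
    assert best is not None
    return best
```

In practice, `score` would run the full sampler with the candidate schedule on held-out measurements and return a fidelity or perceptual metric. The best template found this way is the kind of reliable baseline that a learned policy could then refine.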