🤖 AI Summary
Diffusion-based detectors such as DiffusionDet suffer from high latency because of their multi-step stochastic denoising, which makes them unsuitable for time-sensitive clinical applications such as Crohn's disease detection in magnetic resonance enterography (MRE). This paper proposes DeFloMat, an object detection framework that integrates conditional flow matching (CFM) and approximates rectified flow. Its core innovation is a deterministic, few-step generative localization mechanism grounded in conditional optimal transport theory, which eases the trade-off between accuracy and speed. Using an ordinary differential equation (ODE) solver for inference, DeFloMat achieves 43.32% AP₁₀:₅₀ on the MRE dataset in only three solver steps, about 1.4× the best converged DiffusionDet result (31.03% AP₁₀:₅₀ at 4 steps), while also improving recall and localization stability in the few-step regime.
📝 Abstract
We propose DeFloMat (Detection with Flow Matching), a novel generative object detection framework that addresses the critical latency bottleneck of diffusion-based detectors, such as DiffusionDet, by integrating Conditional Flow Matching (CFM). Diffusion models achieve high accuracy by formulating detection as a multi-step stochastic denoising process, but their reliance on numerous sampling steps ($T \gg 60$) makes them impractical for time-sensitive clinical applications like Crohn's Disease detection in Magnetic Resonance Enterography (MRE). DeFloMat replaces this slow stochastic path with a highly direct, deterministic flow field derived from Conditional Optimal Transport (OT) theory, specifically approximating the Rectified Flow. This shift enables fast inference via a simple Ordinary Differential Equation (ODE) solver. We demonstrate the superiority of DeFloMat on a challenging MRE clinical dataset. Crucially, DeFloMat achieves state-of-the-art accuracy ($43.32\%\ \text{AP}_{10:50}$) in only $3$ inference steps, which represents a $1.4\times$ performance improvement over DiffusionDet's maximum converged performance ($31.03\%\ \text{AP}_{10:50}$ at $4$ steps). Furthermore, our deterministic flow significantly enhances localization characteristics, yielding superior Recall and stability in the few-step regime. DeFloMat resolves the trade-off between generative accuracy and clinical efficiency, setting a new standard for stable and rapid object localization.
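To illustrate why a straight, deterministic flow can be integrated in very few steps, the following is a minimal NumPy sketch, not the authors' implementation. It assumes boxes in a normalized `(cx, cy, w, h)` format, uses the standard straight-line interpolation from conditional flow matching, and replaces the trained network with a hypothetical `oracle_velocity` that knows the ground-truth boxes. All function names and toy values are illustrative assumptions.

```python
import numpy as np

def cfm_training_pair(x0, x1, t):
    """Straight-line interpolation between noise x0 and target x1,
    together with the constant velocity a network would regress."""
    xt = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return xt, v_target

def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1, so (1 - t) stays positive
        x = x + dt * velocity_fn(x, t)
    return x

# Toy data (assumed): two ground-truth boxes and Gaussian noise boxes.
rng = np.random.default_rng(0)
gt_boxes = np.array([[0.5, 0.5, 0.2, 0.3],
                     [0.3, 0.7, 0.1, 0.1]])
noise_boxes = rng.standard_normal(gt_boxes.shape)

def oracle_velocity(x, t):
    """Hypothetical stand-in for a trained velocity network: on the
    straight path, (x1 - x_t) / (1 - t) equals x1 - x0 exactly."""
    return (gt_boxes - x) / (1.0 - t)

# Three Euler steps, matching the step count quoted in the abstract.
pred_boxes = euler_sample(oracle_velocity, noise_boxes, num_steps=3)
```

Because the oracle velocity is constant along each straight path, Euler integration incurs no discretization error here, so three steps (or even one) recover the targets up to floating-point precision. A learned velocity field is only approximately straight, which is presumably why a few steps rather than one are reported.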