RF-HiT: Rectified Flow Hierarchical Transformer for General Medical Image Segmentation

📅 2026-04-21
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work targets two competing demands in medical image segmentation: modeling long-range contextual dependencies and preserving fine boundary detail, without the high computational cost and slow inference of existing Transformer- and diffusion-based approaches. The authors propose RF-HiT, an architecture that combines a hierarchical hourglass Transformer with Rectified Flow. Using multi-scale feature encoding, learnable interpolation-based fusion of conditioning features, and inference with only a few discretization steps, the model achieves accurate segmentation at linear computational complexity. It requires 10.14 GFLOPs and 13.6M parameters, attains mean Dice scores of 91.27% on ACDC and 87.40% on BraTS 2021, and can run inference in as few as three steps.
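For readers unfamiliar with the metric, the Dice score quoted above measures overlap between a predicted mask and a reference mask: twice the intersection divided by the sum of the two mask sizes. The sketch below is a minimal illustration on flat binary masks. The function name `dice_score`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made here, not the paper's evaluation code, which may average over classes and volumes differently.

```python
def dice_score(pred, truth):
    """Dice overlap between two flat binary masks (lists of 0/1).

    Hypothetical helper for illustration only.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy example: the prediction marks two pixels, the reference marks one of them.
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 0, 0]
score = dice_score(pred_mask, true_mask)  # 2 * 1 / (2 + 1)
```

A "mean Dice" figure is typically the average of such per-class scores; which classes and cases enter the average here is not stated on this page.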

📝 Abstract
Accurate medical image segmentation requires both long-range contextual reasoning and precise boundary delineation, a task where existing transformer- and diffusion-based paradigms are frequently bottlenecked by quadratic computational complexity and prohibitive inference latency. We propose RF-HiT, a Rectified Flow Hierarchical Transformer that integrates an hourglass transformer backbone with a multi-scale hierarchical encoder for anatomically guided feature conditioning. Unlike prior diffusion-based approaches, RF-HiT leverages rectified flow with efficient transformer blocks to achieve linear complexity while requiring only a few discretization steps. The model further fuses conditioning features across resolutions via learnable interpolation, enabling effective multi-scale representation with minimal computational overhead. As a result, RF-HiT achieves a strong efficiency-performance trade-off, requiring only 10.14 GFLOPs, 13.6M parameters, and inference in as few as three steps. Despite its compact design, RF-HiT attains 91.27% mean Dice on ACDC and 87.40% on BraTS 2021, achieving performance comparable to or exceeding that of significantly more intensive architectures. This demonstrates its strong potential as a robust, computationally efficient foundation for real-time clinical segmentation.
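The few-step inference mentioned in the abstract can be illustrated with a generic rectified-flow sampler. Rectified flow learns a velocity field whose trajectories from noise to data are close to straight lines, so a coarse Euler integration from t = 0 to t = 1 can already land near the target. The sketch below is not RF-HiT itself: `make_oracle_velocity` is a hypothetical stand-in for the trained, image-conditioned network, and it returns the exact straight-line velocity toward a known toy mask so the behavior can be checked by hand. All names, the toy mask, and the 0.5 threshold are assumptions for illustration.

```python
import random


def euler_sample(velocity, x0, num_steps=3):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # current time, always strictly below 1
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_oracle_velocity(target):
    """Hypothetical stand-in for a trained velocity network.

    For a straight path x_t = (1 - t) * x0 + t * target, the velocity at
    (x_t, t) is (target - x_t) / (1 - t), which equals target - x0.
    """
    def velocity(x, t):
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


random.seed(0)
target_mask = [0.0, 1.0, 1.0, 0.0]                      # toy "ground truth"
noise = [random.gauss(0.0, 1.0) for _ in target_mask]   # starting point at t=0
sample = euler_sample(make_oracle_velocity(target_mask), noise, num_steps=3)
predicted_mask = [1 if s > 0.5 else 0 for s in sample]  # assumed threshold
```

Because the oracle path is exactly straight, three Euler steps recover the toy mask up to floating-point error. A learned velocity field is only approximately straight, so in practice the number of steps trades accuracy against latency.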
Problem

Research questions and friction points this paper is trying to address.

medical image segmentation
long-range contextual reasoning
precise boundary delineation
computational complexity
inference latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rectified Flow
Hierarchical Transformer
Medical Image Segmentation
Linear Complexity
Multi-scale Feature Fusion