🤖 AI Summary
This work addresses three challenges in segmenting slender structures such as cracks and blood vessels: topological sensitivity, high annotation cost, and poor cross-domain generalization. The authors propose FMS², the first framework to bring flow matching to this task. It has two components. SegFlow is a segmentation model that uses ODE-driven continuous flow matching from image to mask, which gives it trajectory-level supervision. SynFlow is a mask-conditioned generator that uses multi-scale mask injection and edge-aware gating to produce pixel-aligned image-mask pairs with controllable structural properties (sparsity, width, and branching), which alleviates annotation scarcity and domain shift. Across five benchmarks, SegFlow improves mIoU by 17.2% (relative) and reduces Betti matching error by 37.3%. Combined with SynFlow, it reaches near-full-supervision performance using only 25% of the annotations and raises average cross-domain IoU by 0.11.
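The 17.2% and 37.3% figures are relative changes, not absolute metric differences. They can be recomputed from the raw scores reported in the abstract: mean IoU rises from 0.511 to 0.599, and Betti matching error falls from 82.145 to 51.524. `relative_change` is an illustrative helper, not part of the paper. By contrast, the 0.11 cross-domain gain reads as an absolute IoU difference.

```python
# Check that the summary's percentages are relative changes of the abstract's raw scores.


def relative_change(before, after):
    """Signed relative change (after - before) / before."""
    return (after - before) / before


miou_gain = relative_change(0.511, 0.599)       # mean IoU: higher is better
betti_change = relative_change(82.145, 51.524)  # Betti matching error: lower is better

print(f"mIoU {miou_gain:+.1%}, Betti error {betti_change:+.1%}")
# mIoU +17.2%, Betti error -37.3%
```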
📝 Abstract
Segmenting thin structures such as infrastructure cracks and anatomical vessels is hampered by topology-sensitive geometry, high annotation costs, and poor generalization across domains. Existing methods address these challenges in isolation. We propose FMS$^2$, a flow-matching framework with two modules. (1) SegFlow is a 2.96M-parameter segmentation model, built on a standard encoder-decoder backbone, that recasts prediction as continuous image $\rightarrow$ mask transport. Rather than supervising only end-state logits, it learns a time-indexed velocity field with a flow-matching regression loss and obtains the mask by ODE integration. Compared with tuned topology-aware loss baselines, this trajectory-level supervision improves thin-structure continuity and sharpness without auxiliary topology heads, post-processing, or multi-term loss engineering. (2) SynFlow is a mask-conditioned mask $\rightarrow$ image generator that produces pixel-aligned synthetic image-mask pairs. It injects mask geometry at multiple scales and emphasizes boundary bands via edge-aware gating, while a controllable mask generator broadens the range of sparsity, width, and branching regimes covered. On five crack and vessel benchmarks, SegFlow alone outperforms strong CNN, Transformer, Mamba, and generative baselines, improving the volumetric metric (mean IoU) from 0.511 to 0.599 (+17.2%) and reducing the topological metric (Betti matching error) from 82.145 to 51.524 (-37.3%). With limited labels, adding SynFlow-generated pairs to SegFlow's training data recovers near-full-supervision performance using only 25% of the real annotations and improves cross-domain IoU by 0.11 on average. Unlike classical data augmentation, which promotes invariance through label-preserving transforms, SynFlow provides pixel-aligned paired supervision with controllable structural shifts (e.g., sparsity, width, branching), which is particularly effective under domain shift.
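The abstract describes SegFlow's training and inference only at a high level. The sketch below illustrates the generic technique it names: a flow-matching regression loss on a time-indexed velocity field, followed by ODE integration. It rests on assumptions that are not taken from the paper. The start state x0 is the flattened grayscale image. The path is the straight line x_t = (1 - t) x0 + t x1 to the binary mask x1 (rectified-flow style). The ODE is solved with fixed-step forward Euler. The names `velocity_fn`, `flow_matching_loss`, and `integrate_mask` are hypothetical. SegFlow's velocity field is a network; here an oracle field that knows the toy mask stands in for it, so the loss and the integrator can be checked against each other.

```python
import random

# Hedged sketch of generic flow matching for image -> mask transport.
# Assumptions (not from SegFlow): x0 = flattened grayscale image in [0, 1],
# x1 = binary mask, straight path x_t = (1 - t) * x0 + t * x1, Euler ODE solver.
# velocity_fn(x_t, t, image) is a hypothetical stand-in for the learned network.


def interpolate(x0, x1, t):
    """Point at time t on the straight path from x0 (t = 0) to x1 (t = 1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def flow_matching_loss(velocity_fn, image, mask, num_t=8, seed=0):
    """Monte Carlo estimate of E_t ||v(x_t, t) - (x1 - x0)||^2, averaged over pixels."""
    rng = random.Random(seed)
    target = [m - p for p, m in zip(image, mask)]  # straight path: velocity is constant in t
    total = 0.0
    for _ in range(num_t):
        t = rng.random()  # t ~ U[0, 1)
        pred = velocity_fn(interpolate(image, mask, t), t, image)
        total += sum((v - u) ** 2 for v, u in zip(pred, target)) / len(target)
    return total / num_t


def integrate_mask(velocity_fn, image, steps=20):
    """Forward Euler on dx/dt = v(x, t, image), from t = 0 (image) to t = 1 (mask estimate)."""
    x, dt = list(image), 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt, image)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy 1-D "image": a bright two-pixel crack on a dark background.
image = [0.1, 0.2, 0.9, 0.8, 0.15]
mask = [0.0, 0.0, 1.0, 1.0, 0.0]


def oracle_velocity(x, t, img):
    # Exact target field for this toy pair; a trained network would only approximate it.
    return [m - p for p, m in zip(img, mask)]


def zero_velocity(x, t, img):
    # Untrained baseline that predicts no motion at all.
    return [0.0] * len(x)


oracle_loss = flow_matching_loss(oracle_velocity, image, mask)
zero_loss = flow_matching_loss(zero_velocity, image, mask)
estimate = integrate_mask(oracle_velocity, image)
binary = [1 if v > 0.5 else 0 for v in estimate]
print(oracle_loss, round(zero_loss, 4), binary)
# 0.0 0.0245 [0, 0, 1, 1, 0]
```

With the oracle field, the loss is exactly zero and Euler integration lands on the mask, because the straight-path target velocity x1 - x0 is constant in t. A learned field only approximates this, so the step count then trades accuracy for speed.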