🤖 AI Summary
This work addresses the challenge of producing crisp edge maps that need no post-processing in real-world scenarios where training samples are limited. To this end, it adapts an image-generation foundation model to edge detection and proposes an edge-specific fine-tuning strategy that combines an edge-oriented objective with an efficient pixel-space loss. At inference, guidance based on unconditional dynamics lets a single model adjust edge density through a guidance scale. By leveraging the pretrained priors and iterative refinement of generative models, the proposed method, EasyControlEdge, shows consistent gains over state-of-the-art methods on BSDS500, NYUDv2, BIPED, and CubiCasa, particularly with limited training data and when crispness is evaluated without post-processing.
📝 Abstract
We propose EasyControlEdge, which adapts an image-generation foundation model to edge detection. In real-world edge detection (e.g., floor-plan walls, satellite roads/buildings, and medical organ boundaries), crispness and data efficiency are crucial, yet producing crisp raw edge maps with limited training samples remains challenging. Although image-generation foundation models perform well on many downstream tasks, two of their strengths remain underexploited for edge detection: pretrained priors that support data-efficient transfer, and iterative refinement that preserves high-frequency detail. To exploit these capabilities for crisp, data-efficient edge detection, we introduce an edge-specialized adaptation of image-generation foundation models that incorporates an edge-oriented objective with an efficient pixel-space loss. At inference, we introduce guidance based on unconditional dynamics, enabling a single model to control the edge density through a guidance scale. Experiments on BSDS500, NYUDv2, BIPED, and CubiCasa compare against state-of-the-art methods and show consistent gains, particularly under no-post-processing crispness evaluation and with limited training data.
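The abstract does not spell out how the guidance scale enters the sampling step. As a rough illustration only, the sketch below assumes a form resembling classifier-free guidance, in which a conditional and an unconditional prediction are blended by a scalar. The function name `guided_update` and the toy values are hypothetical and are not taken from EasyControlEdge.

```python
from typing import List

Vector = List[float]


def guided_update(cond: Vector, uncond: Vector, scale: float) -> Vector:
    """Blend two model predictions with a guidance scale.

    Assumed form (classifier-free-guidance style, not confirmed by the paper):
        guided = uncond + scale * (cond - uncond)
    scale = 0 returns the unconditional prediction, scale = 1 returns the
    conditional one, and scale > 1 extrapolates past the conditional one.
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy per-pixel predictions (hypothetical numbers, chosen for illustration).
cond_pred = [1.0, 0.5]
uncond_pred = [0.5, 0.25]

for s in (0.0, 1.0, 2.0):
    print(s, guided_update(cond_pred, uncond_pred, s))
```

In a sampler of this kind, the blended prediction would replace the raw model output at each refinement step, so one trained model can be steered at inference time by changing only `scale`. How the scale maps to denser or sparser edge maps depends on the method and is not stated in this summary.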