🤖 AI Summary
This work addresses a common problem in learning-based edge detection: models trained with standard cross-entropy loss tend to produce thick predictions that do not match the crisp, single-pixel contours annotated by humans. The authors propose MEMO (Masked Edge Prediction MOdel), which uses only cross-entropy loss and reaches human-like edge crispness through its training and inference strategy. MEMO is pre-trained on a large-scale synthetic edge dataset and then fine-tuned on downstream datasets with a lightweight module that adds only 1.2% extra parameters. During training it learns to predict edges under varying input mask ratios; at inference it finalizes predictions progressively in order of confidence, which yields thinner, more precisely localized edges. The method requires no specialized loss functions, architectural modifications, or post-processing, and it outperforms prior methods on crispness-aware evaluations.
📝 Abstract
Learning-based edge detection models trained with cross-entropy loss often suffer from thick edge predictions, which deviate from the crisp, single-pixel annotations typically provided by humans. While previous approaches to achieving crisp edges have focused on designing specialized loss functions or modifying network architectures, we show that a carefully designed training and inference strategy alone is sufficient to achieve human-like edge quality. In this work, we introduce the Masked Edge Prediction MOdel (MEMO), which produces both accurate and crisp edges using only cross-entropy loss. We first construct a large-scale synthetic edge dataset to pre-train MEMO, enhancing its generalization ability. Subsequent fine-tuning on downstream datasets requires only a lightweight module comprising 1.2% additional parameters. During training, MEMO learns to predict edges under varying ratios of input masking. A key insight guiding our inference is that thick edge predictions typically exhibit a confidence gradient: high in the center and lower toward the boundaries. Leveraging this, we propose a novel progressive prediction strategy that sequentially finalizes edge predictions in order of prediction confidence, resulting in thinner and more precise contours. Our method achieves visually appealing, post-processing-free, human-like edge maps and outperforms prior methods on crispness-aware evaluations.
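The progressive prediction idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`progressive_decode`, `toy_predict`), the linear finalization schedule, the `MASKED` sentinel, and the toy rule that halves the edge probability next to an already finalized edge pixel are all assumptions made for illustration. In particular, `toy_predict` is a hypothetical stand-in for a trained network that conditions on partially finalized predictions.

```python
import numpy as np

MASKED = -1  # assumed sentinel: pixel label not finalized yet


def progressive_decode(predict_fn, image, num_steps=4):
    """Finalize pixel labels in order of confidence over several steps.

    predict_fn(image, state) must return per-pixel edge probabilities in
    [0, 1]; `state` holds MASKED, 0 (non-edge) or 1 (edge) for each pixel.
    """
    h, w = image.shape[:2]
    state = np.full((h, w), MASKED, dtype=np.int8)
    total = h * w
    for step in range(1, num_steps + 1):
        prob = predict_fn(image, state)
        # Confidence of the most likely label at each pixel.
        conf = np.maximum(prob, 1.0 - prob)
        conf[state != MASKED] = -np.inf  # finalized pixels are never revisited
        # Assumed linear schedule: step/num_steps of all pixels are done.
        target = int(np.ceil(total * step / num_steps))
        k = target - int((state != MASKED).sum())
        if k <= 0:
            continue
        # Indices of the k most confident pixels that are still masked.
        idx = np.argsort(-conf, axis=None)[:k]
        flat_state = state.reshape(-1)  # view into `state`
        flat_state[idx] = prob.reshape(-1)[idx] > 0.5
    return state


def toy_predict(image, state):
    """Hypothetical stand-in for a network conditioned on finalized pixels.

    Uses `image` as a raw edge response and halves the probability of any
    masked pixel whose left or right neighbor is already a finalized edge.
    """
    prob = image.astype(np.float64).copy()
    edge = state == 1
    near = np.zeros_like(edge)
    near[:, 1:] |= edge[:, :-1]
    near[:, :-1] |= edge[:, 1:]
    prob[near & (state == MASKED)] *= 0.5
    return prob


# A thick vertical ridge: confidence peaks in the center column.
ridge = np.tile(np.array([0.1, 0.2, 0.6, 0.9, 0.6, 0.2, 0.1]), (5, 1))
thick = (ridge > 0.5).astype(np.int8)            # one-shot thresholding
thin = progressive_decode(toy_predict, ridge)    # confidence-ordered decoding
print(thick.sum(axis=1), thin.sum(axis=1))
```

In this toy setting, one-shot thresholding marks three pixels per row as edge, while the confidence-ordered loop commits the high-confidence center column first and then resolves the lower-confidence flanks as non-edge, leaving a one-pixel-wide contour. How the actual model conditions on earlier decisions and how many pixels it finalizes per step are details the abstract does not specify.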