Diffusion-Driven Two-Stage Active Learning for Low-Budget Semantic Segmentation

📅 2025-10-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses active learning for semantic segmentation under extremely low annotation budgets. We propose a two-stage pixel selection framework: in the first stage, multi-scale features are extracted from a pre-trained diffusion model, and hierarchical diversity is enforced via MaxHerding; in the second stage, noise perturbation is introduced to enhance uncertainty modeling, and an entropy-weighted disagreement scoring metric (eDALD) is designed to quantify pixel informativeness. Our approach is the first to decouple and jointly optimize diversity and uncertainty, two complementary criteria long conflated in prior work. Extensive experiments on CamVid, ADE-Bed, Cityscapes, and Pascal-Context demonstrate substantial improvements over state-of-the-art methods: superior segmentation accuracy is achieved with drastically fewer annotated pixels, significantly reducing the cost of dense pixel-level labeling.

๐Ÿ“ Abstract
Semantic segmentation demands dense pixel-level annotations, which can be prohibitively expensive, especially under extremely constrained labeling budgets. In this paper, we address the problem of low-budget active learning for semantic segmentation by proposing a novel two-stage selection pipeline. Our approach leverages a pre-trained diffusion model to extract rich multi-scale features that capture both global structure and fine details. In the first stage, we perform a hierarchical, representation-based candidate selection by first choosing a small subset of representative pixels per image using MaxHerding, and then refining these into a diverse global pool. In the second stage, we compute an entropy-augmented disagreement score (eDALD) over noisy multi-scale diffusion features to capture both epistemic uncertainty and prediction confidence, selecting the most informative pixels for annotation. This decoupling of diversity and uncertainty lets us achieve high segmentation accuracy with only a tiny fraction of labeled pixels. Extensive experiments on four benchmarks (CamVid, ADE-Bed, Cityscapes, and Pascal-Context) demonstrate that our method significantly outperforms existing baselines under extreme pixel-budget regimes. Our code is available at https://github.com/jn-kim/two-stage-edald.
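The first-stage MaxHerding step can be illustrated with a small sketch. The abstract does not spell out the exact kernel or objective, so the RBF kernel, the greedy coverage-gain objective, and the function name `maxherding_select` below are assumptions; the sketch only shows the general kernel-herding idea of greedily picking pixels that best "cover" the remaining feature distribution.

```python
import numpy as np

def maxherding_select(feats, k, gamma=1.0):
    """Greedy diversity selection over pixel features (hypothetical sketch).

    feats: (n, d) array of per-pixel diffusion features.
    k: number of representative pixels to pick.
    Returns indices of the selected pixels.
    """
    # Pairwise RBF similarities between all candidate pixels (assumed kernel).
    d2 = np.square(feats[:, None, :] - feats[None, :, :]).sum(axis=-1)
    sims = np.exp(-gamma * d2)                      # (n, n)

    n = feats.shape[0]
    selected = []
    # coverage[i] = best similarity of pixel i to any already-selected pixel.
    coverage = np.zeros(n)
    for _ in range(k):
        # Marginal coverage gain of adding each candidate j:
        # sum_i max(coverage[i], sims[j, i]) - sum_i coverage[i]
        gains = np.maximum(sims, coverage[None, :]).sum(axis=1) - coverage.sum()
        j = int(np.argmax(gains))
        selected.append(j)
        coverage = np.maximum(coverage, sims[j])
    return selected
```

Already-selected pixels have zero marginal gain, so the greedy loop naturally spreads picks across distinct modes of the feature distribution, which is the diversity behavior the first stage relies on.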
Problem

Research questions and friction points this paper is trying to address.

Active learning for semantic segmentation with limited labeling budgets
Selecting informative pixels using diffusion features and uncertainty
Achieving high accuracy with minimal labeled pixel annotations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage active learning pipeline for segmentation
Diffusion model extracts multi-scale feature representations
Decouples diversity and uncertainty for pixel selection
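The second-stage scoring can be sketched in the same spirit. The exact eDALD formula is not given in this summary, so the sketch below is an assumption: it treats the predictions obtained under M noise perturbations as an ensemble, measures member-vs-consensus disagreement with a mean KL divergence, and weights it by the predictive entropy of the consensus. The function name `edald_score` and the tensor shapes are hypothetical.

```python
import numpy as np

def edald_score(probs):
    """Entropy-weighted disagreement score (hypothetical sketch of eDALD).

    probs: (M, n, C) softmax predictions for n pixels over C classes,
           one slice per noise perturbation of the diffusion features.
    Returns a (n,) array: higher means more informative to annotate.
    """
    eps = 1e-12
    mean_p = probs.mean(axis=0)                                   # (n, C) consensus
    # Predictive entropy of the consensus distribution.
    entropy = -(mean_p * np.log(mean_p + eps)).sum(axis=1)        # (n,)
    # Disagreement: average KL of each perturbed prediction from the consensus.
    kl = (probs * (np.log(probs + eps)
                   - np.log(mean_p + eps)[None])).sum(axis=2)     # (M, n)
    disagreement = kl.mean(axis=0)                                # (n,)
    return entropy * disagreement
```

Under this sketch, pixels where the perturbed predictions agree score near zero regardless of how uncertain the consensus is, so the entropy weight only amplifies pixels that are both ambiguous and unstable under noise.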