LawDIS: Language-Window-based Controllable Dichotomous Image Segmentation

📅 2025-08-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses controllable dichotomous image segmentation (DIS). The authors propose a language-window co-driven latent diffusion framework with a macro-micro dual-mode control mechanism: at the macro level, natural-language prompts guide coarse segmentation initialization; at the micro level, an adjustable spatial window refines the mask through localized optimization. The two modes can be deployed independently or jointly. Crucially, linguistic semantics and geometric window constraints are unified within the latent diffusion process, so semantic controllability and spatial precision reinforce each other. Evaluated on the DIS5K benchmark, the method outperforms 11 state-of-the-art approaches across all subsets; notably, on the DIS-TE test set it improves the $F_β^ω$ metric by 4.6% over the second-best method, MVANet. The framework thereby advances both segmentation accuracy and interactive flexibility for personalised applications.

📝 Abstract
We present LawDIS, a language-window-based controllable dichotomous image segmentation (DIS) framework that produces high-quality object masks. Our framework recasts DIS as an image-conditioned mask generation task within a latent diffusion model, enabling seamless integration of user controls. LawDIS is enhanced with macro-to-micro control modes. Specifically, in macro mode, we introduce a language-controlled segmentation strategy (LS) to generate an initial mask based on user-provided language prompts. In micro mode, a window-controlled refinement strategy (WR) allows flexible refinement of user-defined regions (i.e., size-adjustable windows) within the initial mask. Coordinated by a mode switcher, these modes can operate independently or jointly, making the framework well-suited for high-accuracy, personalised applications. Extensive experiments on the DIS5K benchmark reveal that our LawDIS significantly outperforms 11 cutting-edge methods across all metrics. Notably, compared to the second-best model MVANet, we achieve $F_β^ω$ gains of 4.6% with both the LS and WR strategies and 3.6% gains with only the LS strategy on DIS-TE. Codes will be made available at https://github.com/XinyuYanTJU/LawDIS.
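The macro-to-micro control flow described above can be sketched in a few lines. The sketch below is purely illustrative: the function names, window format, and mask logic are assumptions for exposition, and the latent diffusion model is mocked out; it is not the authors' API.

```python
# Hypothetical sketch of LawDIS-style macro/micro control flow.
# All names are illustrative; the diffusion backbone is mocked out.
import numpy as np


def language_segment(image, prompt):
    """Macro mode (LS): mock of prompt-conditioned initial mask generation."""
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=np.float32)
    # Placeholder for a diffusion sample conditioned on the language prompt.
    mask[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4] = 1.0
    return mask


def window_refine(image, mask, window):
    """Micro mode (WR): mock refinement restricted to a user-defined window."""
    y0, x0, y1, x1 = window
    refined = mask.copy()
    # Placeholder for re-running the generator on the cropped window only.
    refined[y0:y1, x0:x1] = np.clip(refined[y0:y1, x0:x1], 0.0, 1.0)
    return refined


def lawdis_pipeline(image, prompt=None, windows=(), init_mask=None):
    """Mode switcher: LS and WR can operate independently or jointly."""
    mask = language_segment(image, prompt) if prompt is not None else init_mask
    for win in windows:
        mask = window_refine(image, mask, win)
    return mask
```

Passing only `prompt` exercises the macro mode alone; passing `windows` with an existing `init_mask` exercises the micro mode alone; passing both chains them, mirroring the joint operation coordinated by the mode switcher.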
Problem

Research questions and friction points this paper is trying to address.

Enables precise object segmentation using language prompts
Integrates macro and micro control modes for refinement
Outperforms existing methods in image segmentation accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-controlled segmentation for initial masks
Window-controlled refinement for user-defined regions
Macro-to-micro control modes for flexible operation
Xinyu Yan — Tianjin University
Meijun Sun — Tianjin University
Ge-Peng Ji — Australian National University
Fahad Shahbaz Khan — MBZUAI; Linköping University, Sweden
Salman Khan — MBZUAI
Deng-Ping Fan — Nankai Institute of Advanced Research (SHENZHEN FUTIAN)