🤖 AI Summary
This work addresses the fine-grained, robust selection of spatially varying materials in images under complex illumination and reflectance variations, targeting texture- and subtexture-level segmentation. Methodologically, we propose a framework featuring a ViT-based multi-resolution feature fusion module and a dual-level decoding architecture. To enable end-to-end supervised learning, we introduce a large-scale synthetic dataset, the Dual-level Material Selection (DuMaS) dataset, with hierarchical annotations covering both the texture and subtexture levels. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art approaches in segmentation accuracy and boundary stability, especially in challenging scenarios involving subtle material transitions and strong illumination interference. The framework thus delivers high-fidelity, spatially coherent material priors essential for downstream image editing tasks.
📝 Abstract
Selection is the first step in many image editing processes, enabling faster and simpler modifications of all pixels sharing a common modality. In this work, we present a method for material selection in images, robust to lighting and reflectance variations, which can be used for downstream editing tasks. We rely on vision transformer (ViT) models and leverage their features for selection, proposing a multi-resolution processing strategy that yields finer and more stable selection results than prior methods. Furthermore, we enable selection at two levels: texture and subtexture, leveraging a new two-level material selection (DuMaS) dataset, which includes dense annotations for over 800,000 synthetic images on both the texture and subtexture levels.
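The abstract does not specify how the multi-resolution ViT features are combined, so the following is only an illustrative sketch of the general idea: extract patch-grid features at two input resolutions, upsample the coarser grid to match the finer one, and concatenate along the channel axis. All function names are hypothetical, mean-pooling over patches stands in for a real ViT encoder, and nearest-neighbor upsampling stands in for whatever interpolation the actual method uses.

```python
import numpy as np

def patch_features(image, patch=4):
    # Toy stand-in for a ViT encoder: mean-pool non-overlapping
    # patch x patch windows into a feature grid (a real ViT would
    # produce learned per-patch token embeddings instead).
    H, W, C = image.shape
    h, w = H // patch, W // patch
    crop = image[:h * patch, :w * patch]
    return crop.reshape(h, patch, w, patch, C).mean(axis=(1, 3))

def upsample_nearest(feat, factor):
    # Nearest-neighbor upsampling to align the coarse grid with the fine one.
    return feat.repeat(factor, axis=0).repeat(factor, axis=1)

def fuse_multires(image):
    # Hypothetical fusion: concatenate fine (local detail) and
    # upsampled coarse (larger context) feature grids per location.
    fine = patch_features(image, patch=4)     # e.g. 16x16 grid on a 64x64 image
    coarse = patch_features(image, patch=8)   # 8x8 grid, larger receptive field
    coarse_up = upsample_nearest(coarse, 2)   # align to the 16x16 grid
    return np.concatenate([fine, coarse_up], axis=-1)

img = np.random.rand(64, 64, 3)
fused = fuse_multires(img)
print(fused.shape)  # (16, 16, 6)
```

A dense per-pixel selection head would then operate on this fused grid; the point of the multi-resolution fusion is that each location carries both fine spatial detail and coarser contextual evidence, which is what makes the selection stable under lighting variation.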