🤖 AI Summary
This work addresses text-guided, region-controllable style editing: applying a target style (e.g., cyberpunk) to a designated object (e.g., a building) while preserving other regions (e.g., people, trees). To this end, we propose a novel method that integrates semantic segmentation with state-space modeling. We introduce fine-grained segmentation guidance into text-driven editing, designing region-conditioned text embeddings and a region-directed adversarial loss that jointly ensure semantic boundary consistency and precise alignment with user intent. Our framework builds on Mask2Former for segmentation and StyleMamba, a state-space model, for stylization. Evaluated on complex real-world scenes, our approach significantly improves edit controllability and visual fidelity: PSNR increases by 2.1 dB over global style transfer, and user satisfaction rises by 37%.
📝 Abstract
We present a novel approach for controllable, region-specific style editing driven by textual prompts. Building upon the state-space style alignment framework introduced by *StyleMamba*, our method integrates a semantic segmentation model into the style transfer pipeline. This allows users to selectively apply text-driven style changes to specific segments (e.g., "turn the building into a cyberpunk tower") while leaving other regions (e.g., "people" or "trees") unchanged. By incorporating region-wise condition vectors and a region-specific directional loss, our method achieves high-fidelity transformations that respect both semantic boundaries and user-provided style descriptions. Extensive experiments demonstrate that our approach flexibly handles complex scene stylizations in real-world scenarios, improving controllability and quality over purely global style transfer methods.
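To make the two region-specific ingredients concrete, the sketch below illustrates (a) confining stylized pixels to a segmentation mask and (b) a CLIP-style directional loss computed on region embeddings. This is a minimal NumPy illustration under our own assumptions, not the paper's implementation: the function names (`composite_region`, `region_directional_loss`) and the toy embeddings are hypothetical, and the real method operates on learned CLIP/StyleMamba features rather than raw vectors.

```python
import numpy as np

def composite_region(original, stylized, mask):
    """Blend stylized pixels into the original only where mask == 1.

    original, stylized: float arrays of shape (H, W, 3)
    mask: binary array of shape (H, W), 1 inside the edited region
    (illustrative assumption; the paper's compositing may differ).
    """
    m = mask[..., None].astype(original.dtype)  # broadcast over channels
    return m * stylized + (1.0 - m) * original

def region_directional_loss(src_emb, out_emb, txt_src, txt_tgt):
    """Toy region-directed directional loss (CLIP-style):
    push the image-embedding shift (output - source) to align with
    the text-embedding shift (target prompt - source prompt),
    measured as cosine distance. Embeddings are assumed to be
    extracted from the masked region only."""
    d_img = out_emb - src_emb
    d_txt = txt_tgt - txt_src
    cos = np.dot(d_img, d_txt) / (
        np.linalg.norm(d_img) * np.linalg.norm(d_txt) + 1e-8
    )
    return 1.0 - cos

# Tiny demo: stylize only the right half of a 4x4 image.
orig = np.zeros((4, 4, 3))
styl = np.ones((4, 4, 3))
mask = np.zeros((4, 4))
mask[:, 2:] = 1
out = composite_region(orig, styl, mask)
```

When the image-embedding shift points in the same direction as the text-embedding shift, the loss approaches zero; orthogonal shifts give a loss of one, which is what drives the edited region toward the prompt while the mask keeps the rest of the image untouched.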