🤖 AI Summary
This work addresses the challenges of modeling object interactions and the scarcity of high-quality layered data in real-world image layer decomposition. To this end, it proposes a high-fidelity layered decomposition framework that introduces an Agent-driven Data Decomposition (ADD) pipeline, enabling the automatic creation of LiWi-100k—the first large-scale real-image layered dataset. The method further incorporates shadow-guided learning and a degradation–restoration boundary refinement mechanism to jointly enhance photometric fidelity and alpha matte boundary accuracy. Experimental results demonstrate that the proposed approach significantly outperforms existing models in terms of RGB L1 error and Alpha IoU, thereby advancing the state of the art in real-image layer decomposition and high-precision editing.
📝 Abstract
Recent advances in generative models have enabled impressive layered image generation, yet their success is largely confined to graphic design domains. Layer decomposition of in-the-wild images remains underexplored, which limits fine-grained editing and other applications in real-world scenarios. Two challenges stand out: obtaining layered data at scale, and modeling object interactions in natural images, such as illumination effects and structural boundaries. To address these bottlenecks, we propose a novel framework for high-fidelity natural image decomposition. First, we introduce an Agent-driven Data Decomposition (ADD) pipeline that orchestrates agents and tools to synthesize layered data without manual intervention. Using this pipeline, we construct a large-scale dataset, named LiWi-100k, with over 100,000 high-quality layered in-the-wild images. Second, we present a decomposition model that jointly improves photometric fidelity and alpha boundary accuracy. Specifically, shadow-guided learning explicitly models illumination effects, and a degradation-restoration objective provides boundary-correction supervision by recovering a clean foreground image from a degraded one. Extensive experiments demonstrate that our framework achieves state-of-the-art (SoTA) performance in natural image decomposition, outperforming existing models on the RGB L1 and Alpha IoU metrics. We will soon release our code and dataset.
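The two evaluation metrics named above can be illustrated with a minimal sketch. The abstract does not define them precisely, so the following is an assumption rather than the authors' evaluation code: RGB L1 is taken as the mean absolute difference between predicted and ground-truth layer colors, and Alpha IoU as the intersection-over-union of the alpha mattes after binarizing at a hypothetical threshold of 0.5. The function names `rgb_l1` and `alpha_iou` are illustrative only.

```python
import numpy as np


def rgb_l1(pred_rgb, gt_rgb):
    """Mean absolute error between two RGB arrays of identical shape.

    Assumption: unweighted mean over all pixels and channels; the paper
    may restrict or weight the error by the alpha matte instead.
    """
    pred = np.asarray(pred_rgb, dtype=np.float64)
    gt = np.asarray(gt_rgb, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred_rgb and gt_rgb must have the same shape")
    return float(np.mean(np.abs(pred - gt)))


def alpha_iou(pred_alpha, gt_alpha, threshold=0.5):
    """Intersection-over-union of two alpha mattes after thresholding.

    Assumption: mattes hold values in [0, 1] and are binarized at
    `threshold` (a hypothetical choice, not taken from the paper).
    """
    pred = np.asarray(pred_alpha, dtype=np.float64) > threshold
    gt = np.asarray(gt_alpha, dtype=np.float64) > threshold
    if pred.shape != gt.shape:
        raise ValueError("pred_alpha and gt_alpha must have the same shape")
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both mattes are empty, so they agree perfectly.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt_layer = rng.random((64, 64, 3))
    pred_layer = np.clip(gt_layer + 0.05 * rng.standard_normal((64, 64, 3)), 0.0, 1.0)
    gt_matte = np.zeros((64, 64))
    gt_matte[16:48, 16:48] = 1.0
    pred_matte = np.zeros((64, 64))
    pred_matte[18:48, 16:46] = 1.0
    print("RGB L1:", rgb_l1(pred_layer, gt_layer))
    print("Alpha IoU:", alpha_iou(pred_matte, gt_matte))
```

Lower RGB L1 indicates closer color reconstruction of a layer, while higher Alpha IoU indicates better agreement of the predicted matte with the reference. A soft (non-thresholded) IoU would be an equally plausible reading of the metric name.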