Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning

📅 2026-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses image layer decomposition without paired supervision, where the main obstacle is that vision-language model (VLM) scores are too unreliable to drive policy optimization directly. The authors fine-tune the pretrained Qwen-Image-Layered model with reinforcement learning, using Flow-GRPO with LoRA adaptation. A two-stage VLM evaluation supplies the reward: each candidate is first scored on five edit-oriented criteria, then all candidates are placed in a grid and re-scored side by side for calibration, which widens the spread of scores within a group. On the Crello dataset, the method yields stronger layer separation, fewer blank or artifact-heavy layers, and lower per-layer reconstruction error than the base model.
📝 Abstract
We present Stable-Layers, a reinforcement learning framework that eliminates the need for paired supervision by fine-tuning a pretrained layer decomposition model using only feedback from a vision-language model (VLM). Starting from Qwen-Image-Layered, we apply Flow-GRPO with LoRA adaptation, sampling multiple candidate decompositions per image, scoring them with a VLM, and optimising the policy from group-relative advantages. The key challenge lies in designing a reliable reward signal: VLMs scoring samples in isolation tend to compress their judgements into a narrow band, leaving GRPO with little within-group variance to learn from. We address this with a two-stage evaluation pipeline that pairs structured per-sample scoring across five edit-centric criteria with a grid-based calibration step in which the VLM re-scores all candidates side-by-side. Stable-Layers produces decompositions with stronger layer separation, fewer blank or artifact-heavy layers, and lower per-layer reconstruction error on the Crello dataset compared to the base model.
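The point about narrow-band scores can be illustrated with a minimal sketch of group-relative advantages. It assumes the commonly used GRPO form, in which each reward is standardised by the mean and standard deviation of its own sampling group; the abstract does not state the exact normalisation used in Stable-Layers, and the function name, `eps` value, and score values below are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Standardise rewards within one sampling group (assumed GRPO form)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division defined when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical VLM scores for four candidate decompositions of one image.
compressed = [7.0, 7.0, 7.0, 7.0]  # isolated scoring collapses to one value
spread = [3.0, 5.0, 7.0, 9.0]      # side-by-side scoring separates candidates

adv_compressed = group_relative_advantages(compressed)
adv_spread = group_relative_advantages(spread)

print(adv_compressed)
print(adv_spread)
```

When every candidate in a group receives the same score, all advantages are zero and that group contributes no gradient. Once the scores are spread out, the better candidates receive positive advantages and the worse ones negative advantages, which is the signal the policy update relies on.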
Problem

Research questions and friction points this paper is trying to address.

image layer decomposition
reinforcement learning
vision-language model
unsupervised fine-tuning
reward signal design
Innovation

Methods, ideas, or system contributions that make the work stand out.

reinforcement learning
layer decomposition
vision-language model
unsupervised fine-tuning
Flow-GRPO