Say Cheese! Detail-Preserving Portrait Collection Generation via Natural Language Edits

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes the Portrait Collection Generation (PCG) task. PCG aims to generate diverse, high-quality portrait collections by using natural language instructions to edit multiple attributes of a reference portrait, such as pose, camera viewpoint, and spatial layout. It must also preserve identity and fine-grained details like clothing and accessories. To support this task, the authors introduce CHEESE, a large-scale PCG dataset of 24K collections and 573K samples, built with a Large Vision-Language Model-based annotation pipeline and inversion-based verification. They also introduce the SCheese framework, which combines an adaptive feature fusion mechanism for identity consistency with ConsistencyNet, a module that injects fine-grained features to keep details consistent. Experiments show that SCheese achieves state-of-the-art performance and outperforms existing methods in both detail fidelity and identity consistency, advancing controllable portrait editing.
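The sketch below makes the task formulation concrete. It shows one way a PCG collection might be represented: a single reference portrait plus a list of (modification text, target portrait) samples, flattened into (reference, instruction, target) training triples. The class names, field names, file paths and example instructions are hypothetical. The source does not describe CHEESE's actual storage format. At the reported scale of 24K collections and 573K samples, each collection holds roughly 24 samples on average.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class EditSample:
    # Natural-language modification text (e.g. a pose, viewpoint, or layout change).
    instruction: str
    # Hypothetical path to the target portrait that realizes the instruction.
    target_image: str


@dataclass
class PortraitCollection:
    # Reference portrait whose identity, clothing and accessories should be preserved.
    reference_image: str
    samples: List[EditSample] = field(default_factory=list)

    def training_triples(self) -> Iterator[Tuple[str, str, str]]:
        """Yield (reference, instruction, target) triples, i.e. PCG input/output pairs."""
        for s in self.samples:
            yield self.reference_image, s.instruction, s.target_image


def count_samples(collections: List[PortraitCollection]) -> int:
    """Total number of edit samples across collections."""
    return sum(len(c.samples) for c in collections)


# Illustrative usage with made-up instructions and file names.
col = PortraitCollection(
    reference_image="ref.jpg",
    samples=[
        EditSample("Turn to face the camera and raise the left hand.", "t1.jpg"),
        EditSample("Switch to a low-angle shot with the subject on the left.", "t2.jpg"),
    ],
)
triples = list(col.training_triples())
```

Each triple pairs the same reference image with a different instruction. This mirrors how one reference portrait is expanded into a whole collection.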

📝 Abstract
As social media platforms proliferate, users increasingly demand intuitive ways to create diverse, high-quality portrait collections. In this work, we introduce Portrait Collection Generation (PCG), a novel task that generates coherent portrait collections by editing a reference portrait image through natural language instructions. This task poses two unique challenges to existing methods: (1) complex multi-attribute modifications such as pose, spatial layout, and camera viewpoint; and (2) high-fidelity detail preservation, including identity, clothing, and accessories. To address these challenges, we propose CHEESE, the first large-scale PCG dataset. It contains 24K portrait collections and 573K samples with high-quality modification text annotations and was constructed through a Large Vision-Language Model (LVLM)-based pipeline with inversion-based verification. We further propose SCheese, a framework that combines text-guided generation with hierarchical identity and detail preservation. SCheese employs an adaptive feature fusion mechanism to maintain identity consistency and ConsistencyNet to inject fine-grained features for detail consistency. Comprehensive experiments validate the effectiveness of CHEESE in advancing PCG, with SCheese achieving state-of-the-art performance.

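The abstract does not specify how SCheese's adaptive feature fusion works. For reference only, the sketch below shows a generic gated-fusion pattern, a common way to adaptively blend two feature streams. A sigmoid gate, predicted from both inputs, mixes identity features into generation features per token and channel. Everything here is an assumption for illustration and not the paper's mechanism, including `gated_fusion`, the gate weights and the tensor shapes. NumPy stands in for a deep-learning framework.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(gen_feat, id_feat, w_gate, b_gate):
    """Blend identity features into generation features with a learned gate.

    gen_feat, id_feat: arrays of shape (tokens, channels).
    w_gate: (2 * channels, channels) weights; b_gate: (channels,) bias.
    Returns g * id_feat + (1 - g) * gen_feat, where g in (0, 1) is predicted
    from both inputs, so the mixing ratio adapts per token and channel.
    """
    joint = np.concatenate([gen_feat, id_feat], axis=-1)  # (tokens, 2 * channels)
    gate = sigmoid(joint @ w_gate + b_gate)                # (tokens, channels)
    return gate * id_feat + (1.0 - gate) * gen_feat


# Toy example with random features and small random gate weights.
rng = np.random.default_rng(0)
tokens, channels = 4, 8
gen = rng.standard_normal((tokens, channels))
ident = rng.standard_normal((tokens, channels))
w = rng.standard_normal((2 * channels, channels)) * 0.1
b = np.zeros(channels)
fused = gated_fusion(gen, ident, w, b)
```

Because the gate lies in (0, 1), each fused value is a convex combination of the two inputs. A strongly positive bias pushes the output toward the identity features, and a strongly negative one pushes it toward the generation features.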
Problem

Research questions and friction points this paper is trying to address.

Portrait Collection Generation
Natural Language Editing
Detail Preservation
Multi-attribute Modification
Identity Consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Portrait Collection Generation
Natural Language Editing
Detail Preservation
Identity Consistency
Large Vision-Language Model
Zelong Sun
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
Jiahui Wu
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
Ying Ba
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
Dong Jing
Renmin University of China
Computer Vision, Embodied AI
Zhiwu Lu
Professor, Renmin University of China
Machine Learning, Computer Vision, Large Multimodal Models, Video Generation