Say Cheese! Detail-Preserving Portrait Collection Generation via Natural Language Edits

📅 2026-01-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes the Portrait Collection Generation (PCG) task: generating diverse, high-quality portrait collections by editing multiple attributes of a reference portrait (such as pose, spatial layout, and camera viewpoint) through natural language instructions, while preserving identity and fine-grained details like clothing and accessories. To support this task, the authors introduce CHEESE, a large-scale PCG dataset comprising 24K collections and 573K samples, constructed with a large vision-language model-based pipeline and inversion-based verification. They also propose the SCheese framework, which uses an adaptive feature fusion mechanism for identity consistency and ConsistencyNet to inject fine-grained features for detail consistency. According to the authors, experiments show that SCheese achieves state-of-the-art performance on PCG.

📝 Abstract

As social media platforms proliferate, users increasingly demand intuitive ways to create diverse, high-quality portrait collections. In this work, we introduce Portrait Collection Generation (PCG), a novel task that generates coherent portrait collections by editing a reference portrait image through natural language instructions. This task poses two unique challenges to existing methods: (1) complex multi-attribute modifications such as pose, spatial layout, and camera viewpoint; and (2) high-fidelity detail preservation including identity, clothing, and accessories. To address these challenges, we propose CHEESE, the first large-scale PCG dataset containing 24K portrait collections and 573K samples with high-quality modification text annotations, constructed through a Large Vision-Language Model-based pipeline with inversion-based verification. We further propose SCheese, a framework that combines text-guided generation with hierarchical identity and detail preservation. SCheese employs an adaptive feature fusion mechanism to maintain identity consistency, and ConsistencyNet to inject fine-grained features for detail consistency. Comprehensive experiments validate the effectiveness of CHEESE in advancing PCG, with SCheese achieving state-of-the-art performance.

Problem

Research questions and friction points this paper is trying to address.

Portrait Collection Generation

Natural Language Editing

Detail Preservation

Multi-attribute Modification

Identity Consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Portrait Collection Generation

Natural Language Editing

Detail Preservation