CareCom: Generative Image Composition with Calibrated Reference Features

📅 2025-11-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing generative image synthesis methods struggle to simultaneously preserve foreground detail fidelity and enable controllable pose/view manipulation. To address this, we propose a multi-reference image synthesis framework centered on a cross-reference feature calibration mechanism: it explicitly aligns local detail features from multiple reference images with global background context, enabling consistent inter-reference modeling and background-aware fusion. Our method adopts a generative model architecture and jointly optimizes three modules—feature extraction, cross-reference calibration, and background adaptation—thereby preserving texture and structural details while supporting flexible pose and viewpoint editing. Experiments on MVImgNet and MureCom demonstrate that our approach significantly outperforms state-of-the-art methods in FID, LPIPS, and user study metrics, achieving substantial improvements in visual realism, geometric plausibility, and detail completeness of synthesized images.

Technology Category

Application Category

📝 Abstract
Image composition aims to seamlessly insert foreground object into background. Despite the huge progress in generative image composition, the existing methods are still struggling with simultaneous detail preservation and foreground pose/view adjustment. To address this issue, we extend the existing generative composition model to multi-reference version, which allows using arbitrary number of foreground reference images. Furthermore, we propose to calibrate the global and local features of foreground reference images to make them compatible with the background information. The calibrated reference features can supplement the original reference features with useful global and local information of proper pose/view. Extensive experiments on MVImgNet and MureCom demonstrate that the generative model can greatly benefit from the calibrated reference features.
Problem

Research questions and friction points this paper is trying to address.

Seamlessly inserting foreground objects into background images
Preserving details while adjusting foreground pose and view
Making reference features compatible with background information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-reference generative model for image composition
Calibrates global and local foreground features
Makes reference features compatible with background
🔎 Similar Papers
No similar papers found.
J
Jiaxuan Chen
MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University
B
Bo Zhang
MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University
Qingdong He
Qingdong He
Tencent Youtu Lab
Computer visionGenerative AI3D Vision
Jinlong Peng
Jinlong Peng
Tencent Youtu Lab
Computer VisionDeep Learning
Li Niu
Li Niu
Shanghai Jiao Tong University
computer visionmachine learningdeep learning