HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of faithfully preserving fine-grained product details in reference-based inpainting of human-product images, a task hindered by scarce training data, limited model focus on product details, and coarse supervision signals. It proposes HiFi-Inpaint, a framework that introduces Shared Enhancement Attention (SEA) to refine fine-grained product features and a Detail-Aware Loss (DAL) that supervises generation at the pixel level using high-frequency maps. The authors also construct HP-Image-40K, a dataset curated from self-synthesized samples with automatic filtering. According to the reported experiments, HiFi-Inpaint achieves state-of-the-art performance and reconstructs product details more faithfully than existing methods.

📝 Abstract
Human-product images, which showcase the integration of humans and products, play a vital role in advertising, e-commerce, and digital marketing. The essential challenge of generating such images lies in ensuring the high-fidelity preservation of product details. Among existing paradigms, reference-based inpainting offers a targeted solution by leveraging product reference images to guide the inpainting process. However, limitations remain in three key aspects: the lack of diverse large-scale training data, the struggle of current models to focus on product detail preservation, and the inability of coarse supervision to provide precise guidance. To address these issues, we propose HiFi-Inpaint, a novel high-fidelity reference-based inpainting framework tailored for generating human-product images. HiFi-Inpaint introduces Shared Enhancement Attention (SEA) to refine fine-grained product features and Detail-Aware Loss (DAL) to enforce precise pixel-level supervision using high-frequency maps. Additionally, we construct a new dataset, HP-Image-40K, with samples curated from self-synthesized data and processed with automatic filtering. Experimental results show that HiFi-Inpaint achieves state-of-the-art performance, delivering detail-preserving human-product images.
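The abstract does not spell out how the Detail-Aware Loss is computed, so the following is only a minimal sketch of one way pixel-level supervision on high-frequency maps could look. Everything in it is an assumption made for illustration: the 4-neighbour Laplacian as the high-frequency filter, the L1 comparison, the restriction to the inpainting mask, and the function names `high_frequency_map` and `detail_aware_loss`. It should not be read as the authors' formulation.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def high_frequency_map(img: Image) -> Image:
    """Absolute 4-neighbour Laplacian response (assumed high-frequency filter)."""
    h, w = len(img), len(img[0])

    def px(r: int, c: int) -> float:
        # Clamp indices so border pixels reuse their nearest neighbour.
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [
        [
            abs(px(r - 1, c) + px(r + 1, c) + px(r, c - 1) + px(r, c + 1)
                - 4.0 * px(r, c))
            for c in range(w)
        ]
        for r in range(h)
    ]


def detail_aware_loss(pred: Image, target: Image, mask: Image) -> float:
    """Hypothetical loss: mean L1 gap between high-frequency maps inside the mask."""
    hf_pred = high_frequency_map(pred)
    hf_target = high_frequency_map(target)
    total = 0.0
    area = 0.0
    for r in range(len(mask)):
        for c in range(len(mask[0])):
            # Only masked (inpainted) pixels contribute to the loss.
            total += mask[r][c] * abs(hf_pred[r][c] - hf_target[r][c])
            area += mask[r][c]
    return total / area if area > 0 else 0.0
```

Under these assumptions, a prediction that blurs away an edge present in the target is penalized in proportion to the missing edge response, while flat regions contribute nothing. A practical training setup would more likely express the same idea with tensor operations in a deep learning framework and combine it with the model's main generative objective.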
Problem

Research questions and friction points this paper is trying to address.

reference-based inpainting
human-product images
detail preservation
high-fidelity generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

reference-based inpainting
Shared Enhancement Attention
Detail-Aware Loss
high-fidelity image generation
human-product image synthesis
Yichen Liu
University of Chinese Academy of Sciences
Donghao Zhou
The Chinese University of Hong Kong
Machine Learning, Computer Vision
Jie Wang
ByteDance
Xin Gao
ByteDance
Guisheng Liu
ByteDance
Jiatong Li
PhD candidate, Hong Kong Polytechnic University
Natural Language Processing, Bioinformatics, Molecule Discovery
Quanwei Zhang
Zhejiang University
Qiang Lyu
University of Chinese Academy of Sciences
Lanqing Guo
UT Austin
Shilei Wen
bytedance.com
Computer Vision, Machine Learning
Weiqiang Wang
The University of Chinese Academy of Sciences, CAS
Pheng-Ann Heng
The Chinese University of Hong Kong