PHAC: Promptable Human Amodal Completion

📅 2026-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods for occluded human image completion struggle to simultaneously preserve the appearance of visible regions and support controllable generation under user-specified constraints, such as target poses or spatial regions. This work proposes the first framework that completes non-visible human body parts guided by diverse user prompts, including keypoints and bounding boxes, by injecting these cues into a pre-trained diffusion model via ControlNet. Only the cross-attention layers are fine-tuned, enabling high-fidelity and controllable synthesis while keeping training efficient. Additionally, a novel inpainting-based refinement module is introduced to ensure seamless blending at occlusion boundaries. Evaluated on the HAC and PGPIS benchmarks, the proposed method significantly outperforms existing approaches, with marked improvements in visual quality, physical plausibility, and alignment with user-provided prompts.
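The summary's "only the cross-attention layers are fine-tuned" strategy can be sketched in plain PyTorch. This is a minimal illustration, not the paper's implementation: `TinyBlock` is a toy stand-in for a diffusion UNet block, and the module name `cross_attn` is an assumption used to select the trainable parameters.

```python
import torch.nn as nn

class TinyBlock(nn.Module):
    """Toy stand-in for a diffusion UNet block with self- and cross-attention."""
    def __init__(self, dim=32):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.mlp = nn.Linear(dim, dim)

def freeze_all_but_cross_attention(model: nn.Module) -> int:
    """Freeze every parameter except those inside modules named 'cross_attn'.

    Returns the number of parameters left trainable, so the savings
    relative to full fine-tuning are easy to inspect.
    """
    for p in model.parameters():
        p.requires_grad = False
    for name, module in model.named_modules():
        if "cross_attn" in name:
            for p in module.parameters():
                p.requires_grad = True
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

model = nn.ModuleList([TinyBlock() for _ in range(3)])
trainable = freeze_all_but_cross_attention(model)
total = sum(p.numel() for p in model.parameters())
```

Freezing the backbone and prompt-encoding ControlNet branches this way is what lets the generative prior survive fine-tuning: the optimizer only ever sees the cross-attention weights.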

📝 Abstract
Conditional image generation methods are increasingly used in human-centric applications, yet existing human amodal completion (HAC) models offer users limited control over the completed content. Given an occluded person image, they hallucinate invisible regions while preserving visible ones, but cannot reliably incorporate user-specified constraints such as a desired pose or spatial extent. As a result, users often resort to repeatedly sampling the model until they obtain a satisfactory output. Pose-guided person image synthesis (PGPIS) methods allow explicit pose conditioning, but frequently fail to preserve the instance-specific visible appearance and tend to be biased toward the training distribution, even when built on strong diffusion model priors. To address these limitations, we introduce promptable human amodal completion (PHAC), a new task that completes occluded human images while satisfying both visible appearance constraints and multiple user prompts. Users provide simple point-based prompts, such as additional joints for the target pose or bounding boxes for desired regions; these prompts are encoded using ControlNet modules specialized for each prompt type. These modules inject the prompt signals into a pre-trained diffusion model, and we fine-tune only the cross-attention blocks to obtain strong prompt alignment without degrading the underlying generative prior. To further preserve visible content, we propose an inpainting-based refinement module that starts from a slightly noised coarse completion, faithfully preserves the visible regions, and ensures seamless blending at occlusion boundaries. Extensive experiments on the HAC and PGPIS benchmarks show that our approach yields more physically plausible and higher-quality completions, while significantly improving prompt alignment compared with existing amodal completion and pose-guided synthesis methods.
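The abstract's refinement module "starts from a slightly noised coarse completion" and "faithfully preserves the visible regions." A common way to realize this is RePaint-style compositing: at every denoising step, paste the appropriately noised visible pixels back over the sample. The sketch below assumes that schedule; the forward-noise and denoiser functions are toy placeholders, not the paper's diffusion model.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x, t, T=50):
    """Toy forward diffusion: interpolate toward Gaussian noise as t grows."""
    alpha = 1.0 - t / T
    return alpha * x + (1.0 - alpha) * rng.standard_normal(x.shape)

def denoise_step(x, t, T=50):
    """Placeholder reverse step (stand-in for a real denoising network)."""
    return x * (1.0 - 1.0 / (T - t + 1))

def refine(coarse, visible, mask, t_start=15, T=50):
    """Inpainting-based refinement sketch.

    Start from a *slightly* noised coarse completion (t_start << T), then at
    each reverse step composite the noised visible pixels back in, so the
    output keeps visible regions exactly while occluded regions are
    re-synthesized and blended at the boundary.
    """
    x = add_noise(coarse, t_start, T)
    for t in range(t_start, 0, -1):
        x = denoise_step(x, t, T)
        x_vis = add_noise(visible, t - 1, T) if t > 1 else visible
        x = mask * x_vis + (1.0 - mask) * x
    return x

H = W = 8
visible = rng.random((H, W))
coarse = rng.random((H, W))
mask = np.zeros((H, W))
mask[:, :4] = 1.0  # 1 = visible pixels, 0 = occluded pixels to refine
out = refine(coarse, visible, mask)
```

Starting from a small `t_start` rather than pure noise is what makes this a refinement: the coarse completion's structure is retained, and diffusion only has enough freedom to smooth the occlusion boundary.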
Problem

Research questions and friction points this paper is trying to address.

human amodal completion
pose-guided synthesis
user control
occluded human images
conditional image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Promptable Human Amodal Completion
ControlNet
Diffusion Model
Inpainting-based Refinement
Pose-guided Synthesis