🤖 AI Summary
Current amodal completion methods suffer from heavy data dependency, poor generalization, and error accumulation in progressive inpainting. To address these issues, this paper proposes a multi-agent collaborative reasoning framework that jointly models occlusion relationships and boundary expansion to achieve precise mask restoration and semantically consistent image synthesis. It introduces a fine-grained semantic-guidance mechanism coupled with a Diffusion Transformer-driven joint guidance scheme over attention maps and visible masks, effectively preventing erroneous redrawing of occluders. Moreover, the method directly outputs layered RGBA representations, eliminating the need for post-hoc segmentation. Extensive experiments demonstrate state-of-the-art visual quality across multiple benchmarks, with particularly notable gains in structural coherence and semantic fidelity for heavily occluded regions.
📝 Abstract
Amodal completion, the task of generating the invisible parts of occluded objects, is vital for applications such as image editing and AR. Prior methods struggle with heavy data requirements, poor generalization, or error accumulation in progressive pipelines. We propose a Collaborative Multi-Agent Reasoning Framework that performs collaborative reasoning upfront to overcome these issues. Multiple agents jointly analyze occlusion relationships and determine the necessary boundary expansion, yielding a precise mask for inpainting. Concurrently, an agent generates fine-grained textual descriptions, enabling Fine-Grained Semantic Guidance that ensures accurate object synthesis and prevents the regeneration of occluders or other unwanted elements, especially within large inpainting areas. Furthermore, our method directly produces layered RGBA outputs guided by visible masks and attention maps from a Diffusion Transformer, eliminating the need for a separate segmentation step. Extensive evaluations demonstrate that our framework achieves state-of-the-art visual quality.
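The agent pipeline described above can be sketched at a high level. This is a minimal, hypothetical illustration of the control flow only: all names (`Scene`, `analyze_occlusion`, `expand_boundary`, `describe_object`, `amodal_complete`) are our own placeholders, the masks are toy binary grids rather than real images, and the Diffusion Transformer stage is elided.

```python
from dataclasses import dataclass

@dataclass
class Scene:
    # Toy 2D binary grids standing in for real segmentation masks.
    visible_mask: list   # pixels of the target object that are visible
    occluder_mask: list  # pixels covered by the occluding object

def analyze_occlusion(scene):
    """Agent 1 (hypothetical): mark pixels hidden by the occluder."""
    return [[int(o and not v) for v, o in zip(v_row, o_row)]
            for v_row, o_row in zip(scene.visible_mask, scene.occluder_mask)]

def expand_boundary(mask, margin=1):
    """Agent 2 (hypothetical): dilate the occluded region so the
    inpainting mask safely covers object boundaries."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if any(mask[ny][nx]
                   for ny in range(max(0, y - margin), min(h, y + margin + 1))
                   for nx in range(max(0, x - margin), min(w, x + margin + 1))):
                out[y][x] = 1
    return out

def describe_object(scene):
    """Agent 3 (hypothetical): emit a fine-grained textual prompt that
    steers synthesis away from regenerating the occluder."""
    return "the complete target object, no occluder, consistent texture"

def amodal_complete(scene):
    """Upfront collaborative reasoning: agents produce the precise
    inpainting mask and prompt before any generation happens."""
    occluded = analyze_occlusion(scene)
    inpaint_mask = expand_boundary(occluded)
    prompt = describe_object(scene)
    # A Diffusion Transformer would consume (scene, inpaint_mask, prompt),
    # with attention maps and the visible mask guiding a layered RGBA
    # output; here we just return the inputs it would receive.
    return inpaint_mask, prompt
```

The point of the sketch is the ordering: occlusion analysis and boundary expansion happen once, before inpainting, rather than being interleaved with generation as in progressive pipelines where errors can accumulate.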