Multi-Agent Amodal Completion: Direct Synthesis with Fine-Grained Semantic Guidance

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current amodal completion methods suffer from strong data dependency, poor generalization, and error accumulation in progressive inpainting. To address these issues, this paper proposes a multi-agent collaborative reasoning framework that jointly models occlusion relationships and boundary expansion to achieve precise mask restoration and semantically consistent image synthesis. The authors introduce a fine-grained semantic guidance mechanism coupled with a Diffusion Transformer-driven joint guidance scheme over attention maps and visible masks, effectively preventing erroneous redrawing of occluders. Moreover, the method directly outputs layered RGBA representations, eliminating the need for post-hoc segmentation. Extensive experiments demonstrate state-of-the-art visual quality across multiple benchmarks, with particularly notable improvements in structural coherence and semantic fidelity for heavily occluded regions.

📝 Abstract
Amodal completion, generating invisible parts of occluded objects, is vital for applications like image editing and AR. Prior methods face challenges with data needs, generalization, or error accumulation in progressive pipelines. We propose a Collaborative Multi-Agent Reasoning Framework based on upfront collaborative reasoning to overcome these issues. Our framework uses multiple agents to collaboratively analyze occlusion relationships and determine necessary boundary expansion, yielding a precise mask for inpainting. Concurrently, an agent generates fine-grained textual descriptions, enabling Fine-Grained Semantic Guidance. This ensures accurate object synthesis and prevents the regeneration of occluders or other unwanted elements, especially within large inpainting areas. Furthermore, our method directly produces layered RGBA outputs guided by visible masks and attention maps from a Diffusion Transformer, eliminating extra segmentation. Extensive evaluations demonstrate our framework achieves state-of-the-art visual quality.
Problem

Research questions and friction points this paper is trying to address.

Overcoming data dependency and error accumulation in amodal completion
Preventing regeneration of occluders during object synthesis
Eliminating need for extra segmentation in layered output generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Collaborative multi-agent reasoning for occlusion analysis
Fine-grained semantic guidance via textual descriptions
Direct RGBA synthesis using Diffusion Transformer attention
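The last point, direct RGBA synthesis, can be illustrated with a minimal sketch: an alpha channel is formed by fusing the visible-region mask with a thresholded attention map from the generator, so the cutout covers both the visible object and its completed (previously occluded) part. The function name, the union rule, and the fixed threshold below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def compose_rgba(rgb, attention_map, visible_mask, threshold=0.5):
    """Fuse an attention map with the visible mask into an alpha channel.

    rgb           : (H, W, 3) uint8 image of the completed object
    attention_map : (H, W) float attention scores for the object token
    visible_mask  : (H, W) bool mask of the object's visible pixels
    """
    # Scale attention scores to [0, 1]
    lo, hi = attention_map.min(), attention_map.max()
    attn = (attention_map - lo) / (hi - lo + 1e-8)
    # A pixel is opaque if it is visibly the object OR the generator
    # attended to it strongly (the amodal, completed region)
    alpha = np.maximum(visible_mask.astype(np.float32),
                       (attn > threshold).astype(np.float32))
    # Stack alpha as a fourth channel -> layered RGBA output
    return np.dstack([rgb, (alpha * 255).astype(np.uint8)])
```

Because the alpha comes directly from generation-time signals, no separate segmentation pass over the synthesized image is needed.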
Hongxing Fan
School of Computer Science and Engineering, Beihang University, Beijing, China
Lipeng Wang
School of Software, Beihang University, Beijing, China
Haohua Chen
School of Software, Beihang University, Beijing, China
Zehuan Huang
Beihang University
Generative Model · Computer Vision
Jiangtao Wu
PhD Student of Solid Mechanics, Georgia Institute of Technology
Solid Mechanics · 3D Printing · Shape Memory Polymer · Molecular Dynamics · Density Functional Theory
Lu Sheng
School of Software, Beihang University
Embodied AI · 3D Vision · Machine Learning