Reasoning-Driven Amodal Completion: Collaborative Agents and Perceptual Evaluation

📅 2025-12-23
🤖 AI Summary
Amodal completion requires simultaneously ensuring semantic consistency and structural integrity in invisible regions, a challenge that existing methods struggle to address holistically. This paper proposes MAC, a multi-agent collaborative reasoning framework that decouples the task into two stages: semantic planning and visual synthesis. An LLM-driven hypothesis generator produces diverse, structured completion hypotheses, while a Chain-of-Thought-guided verification agent corrects the visible-region segmentation and identifies residual occluders during planning. The paper also introduces MAC-Score, a human-aligned evaluation metric that goes beyond pixel-level assessment by incorporating perceptual and semantic fidelity. Extensive experiments demonstrate that MAC significantly outperforms state-of-the-art methods across multiple benchmarks. MAC-Score is reported to correlate strongly with human judgments (Spearman’s ρ > 0.92), with improvements of over 27% in both structural integrity and semantic consistency.
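For readers unfamiliar with the statistic quoted above, the following is a minimal sketch of how Spearman's ρ between an automatic metric and human ratings can be computed: rank both score lists (averaging tied ranks) and take the Pearson correlation of the ranks. The function names and the toy scores are illustrative assumptions, not data or code from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-image metric scores and human ratings (made up for illustration).
metric_scores = [0.2, 0.9, 0.5, 0.7, 0.4]
human_ratings = [2, 5, 3, 4, 1]
rho = spearman_rho(metric_scores, human_ratings)
```

Because only ranks matter, the metric and the human ratings can live on different scales; a value near 1 means the metric orders completions almost the same way human raters do.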

📝 Abstract
Amodal completion, the task of inferring invisible object parts, faces significant challenges in maintaining semantic consistency and structural integrity. Prior progressive approaches are inherently limited by inference instability and error accumulation. To tackle these limitations, we present a Collaborative Multi-Agent Reasoning Framework that explicitly decouples Semantic Planning from Visual Synthesis. By employing specialized agents for upfront reasoning, our method generates a structured, explicit plan before pixel generation, enabling visually and semantically coherent single-pass synthesis. We integrate this framework with two critical mechanisms: (1) a self-correcting Verification Agent that employs Chain-of-Thought reasoning to rectify visible region segmentation and identify residual occluders strictly within the Semantic Planning phase, and (2) a Diverse Hypothesis Generator that addresses the ambiguity of invisible regions by offering diverse, plausible semantic interpretations, surpassing the limited pixel-level variations of standard random seed sampling. Furthermore, addressing the limitations of traditional metrics in assessing inferred invisible content, we introduce the MAC-Score (MLLM Amodal Completion Score), a novel human-aligned evaluation metric. Validated against human judgment and ground truth, these metrics establish a robust standard for assessing structural completeness and semantic consistency with visible context. Extensive experiments demonstrate that our framework significantly outperforms state-of-the-art methods across multiple datasets. Our project is available at: https://fanhongxing.github.io/remac-page.
Problem

Research questions and friction points this paper is trying to address.

Amodal completion struggles with semantic consistency and structural integrity
Prior methods suffer from inference instability and error accumulation
Traditional metrics inadequately assess inferred invisible content quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Collaborative Multi-Agent Reasoning Framework decouples planning from synthesis
Self-correcting Verification Agent uses Chain-of-Thought reasoning for corrections
Diverse Hypothesis Generator offers multiple plausible semantic interpretations
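The control flow implied by these three points can be sketched roughly as follows: a planning stage that loops a verification step until no residual occluders are found, then branches into one plan per semantic hypothesis, followed by a single synthesis call per plan. This is only an illustration under stated assumptions. Every name here (`CompletionPlan`, `plan_completion`, `synthesize`, the toy agents) is hypothetical, and the agents are plain callables standing in for the MLLM and image-generation components, whose actual interfaces the abstract does not describe.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class CompletionPlan:
    """Hypothetical explicit plan produced before any pixels are generated."""
    target: str
    occluders: Tuple[str, ...]
    hidden_part: str  # one semantic interpretation of the invisible region

    def to_prompt(self) -> str:
        removed = ", ".join(self.occluders)
        return f"Complete the {self.target} ({self.hidden_part}); remove: {removed}"


# Stand-in agent signatures (assumptions): each takes the target and the
# current occluder list and returns a list of strings.
Verifier = Callable[[str, List[str]], List[str]]
Hypothesizer = Callable[[str, List[str]], List[str]]


def plan_completion(target, occluders, verifier, hypothesizer, max_rounds=3):
    """Semantic planning: verify occluders first, then branch per hypothesis."""
    occluders = list(occluders)
    for _ in range(max_rounds):
        # The verifier reports occluders it believes were missed so far.
        missed = [o for o in verifier(target, occluders) if o not in occluders]
        if not missed:
            break
        occluders.extend(missed)
    return [
        CompletionPlan(target, tuple(occluders), h)
        for h in hypothesizer(target, occluders)
    ]


def synthesize(plan, generator):
    """Visual synthesis: a single generator call driven by the finished plan."""
    return generator(plan.to_prompt())


# Toy agents for demonstration only.
def toy_verifier(target, occluders):
    # Pretend the initial detection missed a fence in front of the object.
    return [] if "fence" in occluders else ["fence"]


def toy_hypothesizer(target, occluders):
    return ["sitting with tail curled", "standing with tail raised"]


plans = plan_completion("cat", ["chair"], toy_verifier, toy_hypothesizer)
outputs = [synthesize(p, lambda prompt: f"<image for: {prompt}>") for p in plans]
```

Keeping all corrections inside `plan_completion` mirrors the idea that errors are fixed before generation rather than accumulated across repeated synthesis passes, and producing one plan per hypothesis yields semantically different completions rather than seed-level variations.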
Hongxing Fan
School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Shuyu Zhao
School of Software, Beihang University, Beijing 100191, China
Jiayang Ao
School of Computing and Information Systems, The University of Melbourne
Object Reconstruction, Amodal Perception, Computer Vision
Lu Sheng
School of Software, Beihang University
Embodied AI, 3D Vision, Machine Learning