🤖 AI Summary
Problem: Implicit Attribute Value Extraction (Implicit AVE) in multimodal e-commerce is hampered by the complexity of cross-modal data and the gap between vision and language understanding, which leads to inaccurate and brittle inference.
Method: We propose a multi-agent debate framework for AVE in which multiple multimodal large language models (MLLMs) act as specialized agents. Over several structured, iterative debate rounds, the agents review one another's answers and refine their own, in order to better align cross-modal semantics and resolve uncertainty.
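The debate loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Agent` interface, the toy `make_stub` agents (which stand in for real MLLM calls and ignore images), the early stop on unanimous agreement, and the final majority vote are all hypothetical choices made for the example.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical agent interface: (question, peers' answers from the previous round) -> answer.
# A real agent would wrap an MLLM call that also receives the product image and text.
Agent = Callable[[str, List[str]], str]


def debate(agents: List[Agent], question: str, rounds: int = 3) -> Tuple[str, List[List[str]]]:
    """Run a multi-agent debate; return (final answer, answers recorded at each round)."""
    # Round 0: every agent answers independently, with no peer answers to consult.
    answers = [agent(question, []) for agent in agents]
    history = [answers]
    for _ in range(rounds):
        if len(set(answers)) == 1:  # assumed early stop once all agents agree
            break
        # Each agent sees the other agents' previous answers and may revise its own.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
        history.append(answers)
    # Assumed aggregation rule: majority vote over the final round's answers.
    final = Counter(answers).most_common(1)[0][0]
    return final, history


def make_stub(initial: str, follows_peers: bool) -> Agent:
    """Toy stand-in for an MLLM agent (hypothetical; ignores the question and image)."""
    def agent(question: str, peers: List[str]) -> str:
        if follows_peers and peers:
            # Adopt the most common peer answer, mimicking persuasion during debate.
            return Counter(peers).most_common(1)[0][0]
        return initial
    return agent


# Two confident agents and one that initially disagrees but listens to its peers.
agents = [make_stub("cotton", False), make_stub("cotton", False), make_stub("polyester", True)]
final, history = debate(agents, "What is the material of this shirt?")
print(final)    # cotton
print(history)  # [['cotton', 'cotton', 'polyester'], ['cotton', 'cotton', 'cotton']]
```

Replacing the stubs with wrappers around several copies of one MLLM, or around different MLLMs, would correspond to the identical-agent and different-agent debate configurations the paper evaluates.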
Contribution/Results: Compared with single-agent approaches, the framework substantially improves extraction accuracy, especially for attributes on which single agents perform poorly, while scaling well and converging stably. Experiments on the ImplicitAVE benchmark show that only 3–5 debate rounds yield substantial overall accuracy gains, with the most pronounced improvements on initially weak attributes. These results support the framework's effectiveness, robustness, and generalization capability in challenging implicit AVE scenarios.
📝 Abstract
Implicit Attribute Value Extraction (AVE) is essential for accurately representing products in e-commerce, as it infers latent attribute values from multimodal data. Despite advances in multimodal large language models (MLLMs), implicit AVE remains challenging due to the complexity of multidimensional data and gaps in vision-text understanding. In this work, we introduce modelname, a multi-agent debate framework in which multiple MLLM agents iteratively refine their inferences. Over a series of debate rounds, agents verify one another's responses and update their own, improving both inference performance and robustness. Experiments on the ImplicitAVE dataset show that even a few rounds of debate significantly boost accuracy, especially for attributes with low initial performance. We systematically evaluate various debate configurations, including debates among identical versus different MLLM agents, and analyze how the number of debate rounds affects convergence dynamics. Our findings highlight the potential of multi-agent debate strategies to overcome the limitations of single-agent approaches and offer a scalable solution for implicit AVE in multimodal e-commerce.
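One way to make "convergence dynamics" concrete is to record every agent's answer at each round and track how agreement and correctness evolve. The sketch below assumes such a per-round answer history and a known gold attribute value; the helper names (`agreement_rate`, `convergence_profile`, `first_consensus_round`) and the chosen metrics are illustrative assumptions, not the metrics reported in the paper.

```python
from collections import Counter
from typing import Dict, List, Optional


def agreement_rate(answers: List[str]) -> float:
    """Share of agents giving the round's most common answer (1.0 means consensus)."""
    return Counter(answers).most_common(1)[0][1] / len(answers)


def convergence_profile(history: List[List[str]], gold: str) -> List[Dict[str, float]]:
    """Agreement and per-agent accuracy at each debate round for a single item."""
    return [
        {
            "round": r,
            "agreement": agreement_rate(answers),
            # Fraction of agents whose answer matches the gold attribute value.
            "agent_accuracy": sum(a == gold for a in answers) / len(answers),
        }
        for r, answers in enumerate(history)
    ]


def first_consensus_round(history: List[List[str]]) -> Optional[int]:
    """Index of the first round in which all agents agree, or None if they never do."""
    for r, answers in enumerate(history):
        if len(set(answers)) == 1:
            return r
    return None


# Toy history (hypothetical): three agents answering a "color" query over two rounds.
history = [["red", "blue", "red"], ["red", "red", "red"]]
for row in convergence_profile(history, gold="red"):
    print(row)
print(first_consensus_round(history))  # 1
```

Averaging such profiles over all items that share an attribute would give a per-attribute view of how accuracy and agreement change as rounds are added.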