🤖 AI Summary
This study addresses a challenge in art history: reliably attributing implicit artistic influences, where reliance on visual similarity alone often leads to erroneous conclusions. To this end, the authors propose M-ArtAgent, an evidence-based multimodal agent framework that reframes influence discovery as a probabilistic adjudication process. The framework implements a four-phase protocol (investigation, corroboration, falsification, and verdict) that constructs verifiable evidence chains by integrating visual and biographical data while enforcing established art-historical axioms. Key innovations include an adversarial falsification mechanism and two theory-driven operators: StyleComparator, grounded in Wölfflinian formal analysis, and ConceptRetriever, grounded in ICONCLASS iconographic retrieval. Both are orchestrated by a ReAct-style controller built on multimodal large language models. Evaluated on the WIB-100 benchmark, the approach achieves 83.7% positive-class F1, a Matthews correlation coefficient of 0.666, and a ROC-AUC of 0.910, and the gains persist when explicit influence statements are masked.
📝 Abstract
Implicit artistic influence, although visually plausible, is often undocumented and thus poses a historically constrained attribution problem: resemblance is necessary but not sufficient evidence. Most prior systems reduce influence discovery to embedding similarity or label-driven graph completion, while recent multimodal large language models (LLMs) remain vulnerable to temporal inconsistency and unverified attributions. This paper introduces M-ArtAgent, an evidence-based multimodal agent that reframes implicit influence discovery as probabilistic adjudication. It follows a four-phase protocol (Investigation, Corroboration, Falsification, and Verdict) governed by a Reasoning and Acting (ReAct)-style controller that assembles verifiable evidence chains from images and biographies, enforces art-historical axioms, and subjects each hypothesis to adversarial falsification via a prompt-isolated critic. Two theory-grounded operators, StyleComparator for Wölfflinian formal analysis and ConceptRetriever for ICONCLASS-based iconographic grounding, ensure that intermediate claims are formally auditable. On the balanced WikiArt Influence Benchmark-100 (WIB-100) of 100 artists and 2,000 directed pairs, M-ArtAgent achieves 83.7% positive-class F1, a Matthews correlation coefficient (MCC) of 0.666, and an area under the receiver operating characteristic curve (ROC-AUC) of 0.910, with leakage-control and robustness checks confirming that the gains persist when explicit influence phrases are masked. By coupling multimodal perception with domain-constrained falsification, M-ArtAgent demonstrates that implicit influence analysis benefits from historically grounded adjudication rather than pattern matching alone.
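The four-phase protocol can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Artist`, `Evidence`, and `Verdict` types, the `adjudicate` function, the log-odds scoring, the two-source corroboration rule, and the specific temporal check are all hypothetical choices made for illustration. The abstract confirms only the phase names, the use of art-historical axioms, and the presence of an adversarial critic.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Artist:
    name: str
    active_from: int  # first active year (hypothetical field)
    active_to: int    # last active year (hypothetical field)


@dataclass
class Evidence:
    source: str    # e.g. "style", "iconography", "biography"
    claim: str
    weight: float  # assumed log-odds contribution toward "influenced"


@dataclass
class Verdict:
    influenced: bool
    probability: float
    chain: List[Evidence]


Investigator = Callable[[Artist, Artist], List[Evidence]]
Critic = Callable[[Evidence], bool]  # True if the evidence survives attack


def temporal_axiom_holds(influencer: Artist, influencee: Artist) -> bool:
    # Assumed axiom: influence cannot run backward in time, so the
    # influencer must be active before the influencee's career ends.
    return influencer.active_from <= influencee.active_to


def adjudicate(influencer: Artist, influencee: Artist,
               investigators: List[Investigator], critic: Critic,
               threshold: float = 0.5) -> Verdict:
    # Axiom check: reject historically impossible hypotheses outright.
    if not temporal_axiom_holds(influencer, influencee):
        return Verdict(False, 0.0, [])

    # Investigation: collect candidate evidence from every operator.
    evidence = [e for inv in investigators
                for e in inv(influencer, influencee)]

    # Corroboration (assumed rule): require at least two distinct
    # evidence sources, since resemblance alone is not sufficient.
    if len({e.source for e in evidence}) < 2:
        evidence = []

    # Falsification: keep only evidence the critic fails to refute.
    chain = [e for e in evidence if critic(e)]

    # Verdict: map summed log-odds to a probability with a sigmoid.
    probability = 1.0 / (1.0 + math.exp(-sum(e.weight for e in chain)))
    return Verdict(probability > threshold, probability, chain)
```

In this sketch an investigator might stand in for StyleComparator or ConceptRetriever, each returning `Evidence` items for a directed artist pair, while the critic plays the role of the prompt-isolated falsifier. Returning the surviving `chain` alongside the probability keeps the verdict auditable, which is the property the abstract emphasizes.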