Graph Neural Networks for Knowledge Enhanced Visual Representation of Paintings

📅 2021-05-17

🏛️ ACM Multimedia

📈 Citations: 11

✨ Influential: 1

🤖 AI Summary

To address the visual-semantic representation disjunction, severe semantic bottlenecks, and scarce labeled data in multi-task fine-grained analysis of artworks (i.e., style classification, artist attribution, dating estimation, and tag generation), this paper proposes ArtSAGENet—the first multimodal architecture integrating Graph Neural Networks (GNNs) into fine-grained art analysis. ArtSAGENet jointly extracts visual features via CNNs and models structured artist–artwork relationships using GNNs, while enforcing cross-modal alignment between vision and semantics through a knowledge-graph-guided mechanism. Trained end-to-end on multiple tasks, it significantly reduces data and computational requirements (10× faster training) yet consistently outperforms strong CNN baselines across all four tasks, achieving state-of-the-art performance. Moreover, the model exhibits superior generalization capability and inherent interpretability through its graph-based relational reasoning and knowledge-grounded alignment.

📝 Abstract

We propose ArtSAGENet, a novel multimodal architecture that integrates Graph Neural Networks (GNNs) and Convolutional Neural Networks (CNNs), to jointly learn visual and semantic-based artistic representations. First, we illustrate the significant advantages of multi-task learning for fine art analysis and argue that it is conceptually a much more appropriate setting in the fine art domain than the single-task alternatives. We further demonstrate that several GNN architectures can outperform strong CNN baselines in a range of fine art analysis tasks, such as style classification, artist attribution, creation period estimation, and tag prediction, while training them requires an order of magnitude less computational time and only a small amount of labeled data. Finally, through extensive experimentation we show that our proposed ArtSAGENet captures and encodes valuable relational dependencies between the artists and the artworks, surpassing the performance of traditional methods that rely solely on the analysis of visual content. Our findings underline a great potential of integrating visual content and semantics for fine art analysis and curation.

Problem

Research questions and friction points this paper is trying to address.

Integrating visual and semantic artistic representations using GNNs and CNNs

Advancing multi-task learning for fine art analysis over single-task approaches

Enhancing relational dependencies between artists and artworks for better analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates GNNs and CNNs for art analysis

Multi-task learning enhances fine art representation

Captures artist-artwork relational dependencies effectively

🔎 Similar Papers

No similar papers found.

Authors to Follow