🤖 AI Summary
This study investigates the mechanistic relationship between five-century evolution of Western painting and societal transformation. We propose a multimodal analytical framework based on latent-space disentanglement in Stable Diffusion, jointly modeling formal features (e.g., color, composition) and contextual semantics (e.g., subject matter, historical context) to enable cross-century style representation alignment and causal counterfactual inference. Our key contributions are threefold: (1) First empirical evidence that contextual semantic representations significantly outperform formal features in discriminating artistic movements, periods, and authors (p < 0.001); (2) A validated, interpretable causal model demonstrating “societal context drives visual generation”; (3) Precise identification of critical stylistic transition points—from Baroque to Modernism—faithfully reconstructing the 16th–20th century evolutionary trajectory and establishing statistically robust associations between sociocultural variables and visual expression patterns.
📝 Abstract
The rise of multimodal generative AI is transforming the intersection of technology and art, offering deeper insights into large-scale artwork. Although its creative capabilities have been widely explored, its potential to represent artwork in latent spaces remains underexamined. We use cutting-edge generative AI, specifically Stable Diffusion, to analyze 500 years of Western paintings by extracting two types of latent information with the model: formal aspects (e.g., colors) and contextual aspects (e.g., subject). Our findings reveal that contextual information differentiates between artistic periods, styles, and individual artists more successfully than formal elements. Additionally, using contextual keywords extracted from paintings, we show how artistic expression evolves alongside societal changes. Our generative experiment, infusing prospective contexts into historical artworks, successfully reproduces the evolutionary trajectory of artworks, highlighting the significance of mutual interaction between society and art. This study demonstrates how multimodal AI expands traditional formal analysis by integrating temporal, cultural, and historical contexts.