🤖 AI Summary
Cross-attention mechanisms in video diffusion models (e.g., Wan) remain poorly interpretable, hindering artistic understanding and control over text-to-video generation.
Method: We propose a novel explainability paradigm for artistic practice, systematically extracting and visualizing spatiotemporal cross-attention maps during generation, treating them both as analytical tools for probing model behavior and as raw creative material. Through targeted probe experiments and diverse artistic case studies, we validate their dual utility: revealing semantic alignment mechanisms and enabling creative intervention.
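To make the extraction step concrete, here is a minimal sketch of capturing cross-attention maps with PyTorch forward hooks. Everything below is an illustrative assumption rather than Wan's actual API: the ToyCrossAttention module, the block names, and the tensor shapes are stand-ins that would need to be adapted to the real model's cross-attention layers.

```python
import torch
import torch.nn as nn

class ToyCrossAttention(nn.Module):
    """Stand-in for one cross-attention block in a video diffusion transformer."""
    def __init__(self, dim=64):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)   # queries from video tokens
        self.to_k = nn.Linear(dim, dim)   # keys from text tokens
        self.to_v = nn.Linear(dim, dim)   # values from text tokens
        self.last_attn = None             # refreshed on every forward pass

    def forward(self, x, text):           # x: (B, n_video_tokens, D)
        q, k, v = self.to_q(x), self.to_k(text), self.to_v(text)
        attn = torch.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
        self.last_attn = attn.detach()    # (B, n_video_tokens, n_text_tokens)
        return attn @ v

captured = {}  # layer name -> attention map, filled in by the hooks

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = module.last_attn.cpu()
    return hook

blocks = nn.ModuleDict({"block_0": ToyCrossAttention(), "block_1": ToyCrossAttention()})
for name, block in blocks.items():
    block.register_forward_hook(make_hook(name))

# Hypothetical token layout: 16 latent frames x 32 spatial patches, 7 text tokens.
x, text = torch.randn(1, 16 * 32, 64), torch.randn(1, 7, 64)
for block in blocks.values():
    x = block(x, text)
print({name: attn.shape for name, attn in captured.items()})
```

Because the hooks only read `last_attn` after each forward pass, the generation loop itself is left untouched, which is what makes this kind of probe usable mid-denoising.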
Contribution/Results: (1) We contribute to Explainable AI for the Arts (XAIxArts) what is, to our knowledge, the first framework that integrates attention maps directly into artistic workflows; (2) we establish a substantive pathway for leveraging generative models' internal representations as expressive creative media; (3) we provide artists with a new technical interface that supports human-in-the-loop, interpretable, and controllable AI generation. This work bridges XAI and digital art practice, advancing both model transparency and creative agency.
📝 Abstract
This paper presents an artistic and technical investigation into the attention mechanisms of video diffusion transformers. Inspired by early video artists who manipulated analog video signals to create new visual aesthetics, this study proposes a method for extracting and visualizing cross-attention maps in generative video models. Built on the open-source Wan model, our tool provides an interpretable window into the temporal and spatial behavior of attention in text-to-video generation. Through exploratory probes and an artistic case study, we examine the potential of attention maps as both analytical tools and raw artistic material. This work contributes to the growing field of Explainable AI for the Arts (XAIxArts), inviting artists to reclaim the inner workings of AI as a creative medium.
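On the visualization side, the sketch below shows one plausible way to render a single text token's spatiotemporal attention as per-frame heatmaps, assuming maps captured as in the earlier sketch. The `attention_heatmaps` helper, the frame/patch grid, and the token index are all hypothetical placeholders, not the paper's actual tool.

```python
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

def attention_heatmaps(attn, token_idx, frames=16, h=4, w=8, out_size=(64, 128)):
    """attn: (B, frames*h*w, n_text_tokens) -> (frames, H, W) heatmaps for one token."""
    maps = attn[0, :, token_idx].reshape(frames, 1, h, w)           # one map per latent frame
    maps = F.interpolate(maps, size=out_size, mode="bilinear", align_corners=False)
    maps = (maps - maps.min()) / (maps.max() - maps.min() + 1e-8)   # normalize for display
    return maps[:, 0].numpy()

# Stand-in for a captured cross-attention map (512 video tokens, 7 text tokens).
attn = torch.softmax(torch.randn(1, 16 * 4 * 8, 7), dim=-1)
heat = attention_heatmaps(attn, token_idx=3)

fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, t in zip(axes, range(0, 16, 5)):    # sample four frames across time
    ax.imshow(heat[t], cmap="inferno")
    ax.set_title(f"frame {t}")
    ax.axis("off")
fig.savefig("attention_heatmaps.png", bbox_inches="tight")
```

Upsampling the coarse latent-grid maps and normalizing them per clip is one simple design choice; an artist-facing tool could equally keep the raw low-resolution maps as a deliberate aesthetic, much like the analog video signal manipulation the paper cites as inspiration.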