AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current video generation models lack interpretability, which limits artists' understanding of them and their creative control. This work proposes AttentionBender, the first approach to extend Network Bending to the cross-attention layers of video diffusion Transformers. It applies 2D geometric transformations (such as rotation, scaling, and translation) to cross-attention maps so that artists can intervene directly in the generative process. The method functions both as an interpretability probe and as a tool for aesthetic creation. Through visual analysis of over 4,500 generated video clips, the study finds that cross-attention is highly entangled and shows that such interventions produce distributed distortions and glitch aesthetics that go beyond the model's learned representational space.

📝 Abstract

We present AttentionBender, a tool that manipulates cross-attention in Video Diffusion Transformers to help artists probe the internal mechanics of black-box video generation. While generative outputs are increasingly realistic, prompt-only control limits artists' ability to build intuition for the model's material process or to work beyond its default tendencies. Using an autobiographical research-through-design approach, we built on Network Bending to design AttentionBender, which applies 2D transforms (rotation, scaling, translation, etc.) to cross-attention maps to modulate generation. We assess AttentionBender by visualizing 4,500+ video generations across prompts, operations, and layer targets. Our results suggest that cross-attention is highly entangled: targeted manipulations often resist clean, localized control, producing distributed distortions and glitch aesthetics rather than linear edits. AttentionBender contributes a tool that functions both as an Explainable AI-style probe of transformer attention mechanisms and as a creative technique for producing novel aesthetics beyond the model's learned representational space.
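
To make the idea of "applying 2D transforms to cross-attention maps" concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the attention weights for one text token in one frame have already been reshaped into an H x W grid; the function name `bend_attention_map`, its parameters, and the nearest-neighbour sampling are all hypothetical choices made for illustration. How the bent map is written back into the model, and whether it is renormalized, is not specified in the abstract and is left out here.

```python
import math


def bend_attention_map(attn, angle_deg=0.0, scale=1.0, shift=(0.0, 0.0)):
    """Rotate, scale, and translate one H x W attention map (hypothetical helper).

    attn      -- list of H rows, each a list of W floats (assumed layout)
    angle_deg -- rotation angle in degrees about the grid centre
    scale     -- uniform scale factor about the grid centre (must be non-zero)
    shift     -- (rows, cols) translation applied after rotation and scaling

    Uses inverse mapping with nearest-neighbour sampling; output cells whose
    source position falls outside the grid are set to 0.0.
    """
    h, w = len(attn), len(attn[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    dy, dx = shift

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Undo the translation, then the rotation and scaling about the centre.
            ty, tx = y - cy - dy, x - cx - dx
            src_x = (cos_t * tx + sin_t * ty) / scale + cx
            src_y = (-sin_t * tx + cos_t * ty) / scale + cy
            iy, ix = round(src_y), round(src_x)
            if 0 <= iy < h and 0 <= ix < w:
                out[y][x] = attn[iy][ix]
    return out


# Toy 3 x 3 map with all attention mass in the top-left cell.
toy_map = [
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
shifted = bend_attention_map(toy_map, shift=(0, 1))   # move one column right
rotated = bend_attention_map(toy_map, angle_deg=90)   # quarter turn about the centre
```

In this toy case the attention peak simply moves to a new cell. The abstract's finding is that, inside a real model, such targeted edits do not stay this localized and instead produce distributed distortions.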

Problem

Research questions and friction points this paper is trying to address.

video generation

prompt control

black-box models

artist intuition

generative limitations

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-attention manipulation

video diffusion transformers

creative probe