HilbertA: Hilbert Attention for Image Generation with Diffusion Models

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the difficulty of designing sparse attention for 2D image generation in diffusion models, where spatial locality and GPU memory access efficiency are hard to balance, this paper proposes HilbertA, a hardware-friendly 2D sparse attention mechanism. Its core innovation is reordering image tokens along a Hilbert curve, which preserves spatial locality while yielding a contiguous memory layout; long-range interactions and inter-block communication are supported through a sliding-window scheduling strategy and a shared central region. All components are implemented in Triton to ensure memory-contiguous access and compute-aligned GPU acceleration. Experiments show that HilbertA achieves 2.3× and 4.17× attention speedups at 1024×1024 and 2048×2048 resolutions, respectively, while maintaining image quality on par with or better than state-of-the-art sparse attention methods.
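The Hilbert-curve reordering described above can be sketched with the standard distance-to-coordinate conversion. This is an illustrative Python sketch of the idea (the paper's actual implementation is a Triton kernel, not this code); `hilbert_d2xy` and `hilbert_permutation` are hypothetical helper names.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve covering a 2**order x 2**order grid to (x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate the quadrant so successive points stay adjacent
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_permutation(order):
    """Row-major token indices in Hilbert visit order: gather tokens with this
    permutation to get a memory-contiguous, locality-preserving 1D layout."""
    side = 1 << order
    return [y * side + x
            for x, y in (hilbert_d2xy(order, d) for d in range(side * side))]
```

Because consecutive curve positions are always grid-adjacent, a contiguous window of the reordered sequence corresponds to a compact 2D neighborhood, which is what makes windowed attention over this layout both spatially local and coalesced.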

📝 Abstract
Designing sparse attention for diffusion transformers requires reconciling two-dimensional spatial locality with GPU efficiency, a trade-off that current methods struggle to achieve. Existing approaches enforce two-dimensional spatial locality but often incur uncoalesced memory access. We present HilbertA, a 2D-aware and GPU-efficient sparse attention mechanism. HilbertA reorders image tokens along Hilbert curves to achieve a contiguous memory layout while preserving spatial neighborhoods, and employs a sliding schedule across layers to enable long-range information propagation without repeated or uncoalesced memory access. To further enhance cross-tile communication and positional awareness, HilbertA introduces a small central shared region. Implemented in Triton, HilbertA delivers comparable image quality with significant acceleration over prior methods on Flux.1-dev, demonstrating the feasibility of hardware-aligned two-dimensional sparse attention for high-resolution image generation. HilbertA delivers attention speedups of $2.3\times$ when generating $1024\times1024$ images, and up to $4.17\times$ at $2048\times2048$, while achieving image quality comparable to or surpassing baselines.
Problem

Research questions and friction points this paper is trying to address.

Designing GPU-efficient sparse attention for diffusion transformers
Reconciling 2D spatial locality with coalesced memory access
Enabling long-range information propagation without memory inefficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

HilbertA reorders tokens along a Hilbert curve for a contiguous, locality-preserving memory layout
A sliding schedule across layers enables long-range information propagation
A small central shared region improves cross-tile communication and positional awareness
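Taken together, these three ideas amount to a block-sparse attention pattern over the Hilbert-ordered sequence. The sketch below builds such a pattern as a dense boolean mask for inspection; the function name and the `tile`, `shift`, and `shared` parameters are illustrative assumptions, not the paper's actual configuration, and the real method realizes the pattern inside Triton kernels rather than as an explicit mask.

```python
import numpy as np

def hilbert_sparse_mask(n_tokens, tile, shift, shared):
    """Hypothetical sketch of a HilbertA-style attention pattern.

    Tokens attend within their tile of the Hilbert-ordered sequence (the tile
    grid is offset by `shift`, standing in for the per-layer sliding schedule),
    and every token also attends to, and is attended by, a small shared central
    region (standing in for the center-shared design).
    """
    mask = np.zeros((n_tokens, n_tokens), dtype=bool)

    # Tile-local attention: partition [0, n_tokens) into tiles, offset by shift.
    off = shift % tile
    edges = sorted(set([0, n_tokens] + list(range(off, n_tokens, tile))))
    for lo, hi in zip(edges, edges[1:]):
        mask[lo:hi, lo:hi] = True

    # Shared central region: full rows and columns for a few central tokens.
    c0 = n_tokens // 2 - shared // 2
    mask[:, c0:c0 + shared] = True
    mask[c0:c0 + shared, :] = True
    return mask
```

Varying `shift` from layer to layer moves the tile boundaries, so tokens that fall in different tiles at one layer share a tile at another; combined with the always-visible central region, information can propagate globally over a few layers even though each layer's attention stays sparse and contiguous in memory.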