TIDE: Text-Informed Dynamic Extrapolation with Step-Aware Temperature Control for Diffusion Transformers

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses structural degradation in high-resolution image extrapolation with Diffusion Transformers, which often arises from attention dilution and the difficulty of simultaneously preserving fine-grained semantics and suppressing artifacts. The authors propose a training-free text-to-image extrapolation method that introduces a novel text anchoring mechanism to dynamically balance textual and visual tokens, thereby mitigating cross-modal imbalance. Additionally, they design a step-aware dynamic temperature control strategy that adaptively modulates generation stability according to the spectral evolution characteristics of the diffusion process. The approach supports arbitrary resolutions and aspect ratios, seamlessly integrates with state-of-the-art models, and achieves significant improvements over prior extrapolation methods in both overall generation quality and fine-detail fidelity.

📝 Abstract
Diffusion Transformers (DiTs) struggle to generate images at resolutions higher than their training resolution; the most prominent failure is structural degradation caused by attention dilution. Previous approaches attempt to mitigate this by sharpening attention distributions, but they fail to preserve fine-grained semantic details and introduce obvious artifacts. In this work, we analyze the characteristics of DiTs and propose TIDE, a training-free text-to-image (T2I) extrapolation method that enables generation at arbitrary resolutions and aspect ratios without additional sampling overhead. We identify the core factor behind prompt information loss and introduce a text anchoring mechanism to correct the imbalance between text and image tokens. To further eliminate artifacts, we design a dynamic temperature control mechanism that leverages the pattern of spectral progression in the diffusion process. Extensive evaluations demonstrate that TIDE delivers high-quality resolution extrapolation and integrates seamlessly with existing state-of-the-art methods.
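The two effects named in the abstract, attention dilution and the text/image token imbalance, can be illustrated with a toy calculation. The sketch below is not TIDE's method and uses no formula from the paper. It assumes random Gaussian attention logits and hypothetical helper names (`softmax`, `entropy`, `text_attention_share`), and only shows that (a) softmax entropy grows as the number of tokens grows, (b) a temperature below 1 sharpens the distribution, and (c) the share of attention landing on a fixed set of text tokens shrinks as image tokens are added.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Softmax with a temperature; temperature < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means more diluted attention."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def text_attention_share(num_text, num_image, temperature=1.0, seed=0):
    """Fraction of one query's attention mass that lands on text tokens.

    Assumption: logits are i.i.d. standard Gaussian, which is a toy
    stand-in for real query-key scores, not a model of any actual DiT.
    """
    rng = random.Random(seed)
    logits = [rng.gauss(0.0, 1.0) for _ in range(num_text + num_image)]
    probs = softmax(logits, temperature)
    return sum(probs[:num_text])


if __name__ == "__main__":
    # Hypothetical token counts: 77 text tokens, image tokens growing 4x.
    for num_image in (1024, 4096):
        share = text_attention_share(num_text=77, num_image=num_image)
        print(f"image tokens={num_image}: text share={share:.4f}")
```

In this toy setting, quadrupling the image tokens cuts the text share substantially while the text logits stay unchanged, which is the kind of imbalance a text anchoring mechanism would need to counteract. How TIDE actually rebalances the tokens and schedules the temperature across sampling steps is specified in the paper, not here.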
Problem

Research questions and friction points this paper is trying to address.

Diffusion Transformer
resolution extrapolation
attention dilution
text-to-image generation
structural degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Transformer
resolution extrapolation
text anchoring
dynamic temperature control
training-free
Yihua Liu
Independent Researcher
Fanjiang Ye
Rice University
Bowen Lin
University of Houston
Rongyu Fang
Independent Researcher
Chengming Zhang
University of Houston
Deep learning, NLP, HPC