Enhancing Image Restoration Transformer via Adaptive Translation Equivariance

📅 2025-06-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image restoration Transformers suffer from degraded convergence and generalization because their attention mechanisms break translation equivariance. To address this, we propose TEAFormer, the Translation Equivariance Adaptive Transformer, which introduces an adaptive sliding-indexing mechanism that dynamically selects key-value pairs for each query and concatenates them with globally aggregated information, preserving translation equivariance while avoiding the trade-off between the fixed receptive field of sliding-window attention and the computational cost of global self-attention. TEAFormer further combines sliding-window attention with stackable equivariant components to form an efficient, scalable equivariant architecture. Experiments show that TEAFormer significantly accelerates training convergence (1.8× faster on average) across diverse image restoration tasks and achieves state-of-the-art generalization under cross-dataset evaluation, validating the importance of explicit translation-equivariant modeling for low-level vision.

📝 Abstract
Translation equivariance is a fundamental inductive bias in image restoration, ensuring that translated inputs produce translated outputs. Attention mechanisms in modern restoration transformers undermine this property, adversely impacting both training convergence and generalization. To alleviate this issue, we propose two key strategies for incorporating translation equivariance: slide indexing and component stacking. Slide indexing maintains operator responses at fixed positions, with sliding window attention being a notable example, while component stacking enables the arrangement of translation-equivariant operators in parallel or sequentially, thereby building complex architectures while preserving translation equivariance. However, these strategies still create a dilemma in model design between the high computational cost of self-attention and the fixed receptive field associated with sliding window attention. To address this, we develop an adaptive sliding indexing mechanism to efficiently select key-value pairs for each query, which are then concatenated in parallel with globally aggregated key-value pairs. The designed network, called the Translation Equivariance Adaptive Transformer (TEAFormer), is assessed across a variety of image restoration tasks. The results highlight its superiority in terms of effectiveness, training convergence, and generalization.
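The property the abstract relies on, f(T(x)) = T(f(x)) for a translation T, can be checked numerically. The sketch below uses a simple mean filter with circular padding as a stand-in for any convolution-style restoration operator (the filter and the shift amounts are illustrative choices, not from the paper):

```python
import numpy as np

def shift(x, dy, dx):
    """Cyclically translate a 2-D image by (dy, dx)."""
    return np.roll(np.roll(x, dy, axis=0), dx, axis=1)

def box_filter(x, k=3):
    """A translation-equivariant operator: k x k mean filter with
    circular padding, standing in for a conv-style restoration op."""
    out = np.zeros_like(x, dtype=float)
    r = k // 2
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            out += shift(x, dy, dx)
    return out / (k * k)

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))

# Equivariance: filtering a translated image equals translating the filtered image.
lhs = box_filter(shift(img, 2, 3))
rhs = shift(box_filter(img), 2, 3)
print(np.allclose(lhs, rhs))  # True: the operator commutes with translation
```

Global self-attention fails this test because each output depends on absolute token positions through the full key set, which is what motivates the slide-indexing strategy.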
Problem

Research questions and friction points this paper is trying to address.

Addressing loss of translation equivariance in restoration transformers
Balancing computational cost and fixed receptive field in attention
Improving image restoration model effectiveness and generalization
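The cost side of the dilemma is easy to quantify. For an H x W image treated as HW tokens, global self-attention forms (HW)^2 query-key pairs, while w x w sliding-window attention forms only HW * w^2. A back-of-the-envelope comparison (window size 8 is an illustrative choice, not from the paper):

```python
# Query-key pairs per attention layer: quadratic for global attention,
# linear (but with a fixed receptive field) for sliding-window attention.
def global_pairs(h, w):
    n = h * w
    return n * n

def window_pairs(h, w, win=8):
    return h * w * win * win

h, w = 256, 256
print(global_pairs(h, w))         # 4294967296 pairs
print(window_pairs(h, w, win=8))  # 4194304 pairs, 1024x cheaper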
Innovation

Methods, ideas, or system contributions that make the work stand out.

Slide indexing maintains operator responses at fixed relative positions
Component stacking preserves translation equivariance
Adaptive sliding indexing selects key-value pairs efficiently
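A hypothetical sketch of how these three ideas could compose, on a 1-D token sequence for brevity: each query attends to its top-k keys inside a local window (slide indexing, made equivariant here via circular padding), concatenated with one globally mean-pooled key-value pair. The window size, top-k, and pooling scheme are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def adaptive_slide_attention(q, k, v, win=5, topk=4):
    """Sketch: per-query top-k selection inside a sliding window,
    concatenated with globally aggregated key/value pairs.
    q, k, v: (n, d) arrays."""
    n, d = q.shape
    r = win // 2
    g_k = k.mean(axis=0)  # globally aggregated key (shift-invariant)
    g_v = v.mean(axis=0)  # globally aggregated value
    out = np.zeros_like(v)
    for i in range(n):
        idx = [(i + o) % n for o in range(-r, r + 1)]  # circular window
        scores = k[idx] @ q[i] / np.sqrt(d)
        sel = np.argsort(scores)[-topk:]               # adaptive selection
        ks = np.vstack([k[idx][sel], g_k])             # local top-k + global
        vs = np.vstack([v[idx][sel], g_v])
        attn = softmax(ks @ q[i] / np.sqrt(d))
        out[i] = attn @ vs
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((16, 8))
y = adaptive_slide_attention(x, x, x)

# Equivariance check: translating the tokens translates the output.
xs = np.roll(x, 3, axis=0)
y_shift = adaptive_slide_attention(xs, xs, xs)
print(np.allclose(np.roll(y, 3, axis=0), y_shift))  # True
```

The top-k selection and global pooling both commute with translation, so the stacked result stays equivariant, which mirrors the component-stacking argument above.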