Texture Image Synthesis Using Spatial GAN Based on Vision Transformers

📅 2025-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of simultaneously achieving visual realism and structural consistency in synthesizing complex or irregular texture images, this paper proposes ViT-SGAN, a hybrid generative framework. The method innovatively incorporates mean-variance statistical features and texton-based texture descriptors into the self-attention mechanism of Vision Transformers (ViT), thereby constructing a texture-aware attention module that enhances spatial dependency modeling. Furthermore, it tightly couples ViT with a spatially aware Generative Adversarial Network (Spatial GAN) to jointly represent local details and global structure. Quantitative evaluation demonstrates that ViT-SGAN consistently outperforms state-of-the-art methods across multiple metrics—including FID, Inception Score (IS), SSIM, and LPIPS—particularly excelling in generating high-complexity irregular textures, where it achieves marked improvements in structural coherence and visual fidelity.
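The texture-aware attention idea described above can be sketched roughly as follows: each patch embedding is augmented with its own mean-variance (μ, σ) statistics before the query/key/value projections, so attention weights can depend on first- and second-order texture statistics. This is a minimal illustrative sketch, not the paper's implementation: the projection matrices are random placeholders, texton descriptors are omitted, and all names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def texture_aware_attention(patches, d_model=16, seed=0):
    """Single-head self-attention over patch embeddings augmented with
    per-patch mean and standard deviation, sketching the statistical
    part of the texture-aware attention module. Weights are random
    stand-ins, not trained parameters."""
    rng = np.random.default_rng(seed)
    n, d = patches.shape
    # First- and second-order statistics per patch (the mu/sigma descriptors).
    mu = patches.mean(axis=1, keepdims=True)        # (n, 1)
    sigma = patches.std(axis=1, keepdims=True)      # (n, 1)
    # Concatenate the statistical descriptors to each patch embedding.
    feats = np.concatenate([patches, mu, sigma], axis=1)  # (n, d + 2)
    Wq = rng.standard_normal((d + 2, d_model))
    Wk = rng.standard_normal((d + 2, d_model))
    Wv = rng.standard_normal((d + 2, d_model))
    q, k, v = feats @ Wq, feats @ Wk, feats @ Wv
    # Scaled dot-product attention; scores now reflect texture statistics.
    attn = softmax(q @ k.T / np.sqrt(d_model))
    return attn @ v  # (n, d_model)
```

In the full model these attention outputs would feed the generator; here the sketch only shows how the descriptors enter the attention computation.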

📝 Abstract
Texture synthesis is a fundamental task in computer vision, whose goal is to generate visually realistic and structurally coherent textures for a wide range of applications, from graphics to scientific simulations. While traditional methods like tiling and patch-based techniques often struggle with complex textures, recent advancements in deep learning have transformed this field. In this paper, we propose ViT-SGAN, a new hybrid model that fuses Vision Transformers (ViTs) with a Spatial Generative Adversarial Network (SGAN) to address the limitations of previous methods. By incorporating specialized texture descriptors such as mean-variance (μ, σ) statistics and textons into the self-attention mechanism of ViTs, our model achieves superior texture synthesis. This approach enhances the model's capacity to capture complex spatial dependencies, leading to texture quality superior to that of state-of-the-art models, especially for regular and irregular textures. Comparative experiments using metrics such as FID, IS, SSIM, and LPIPS demonstrate the substantial improvement of ViT-SGAN, underlining its effectiveness in generating diverse, realistic textures.
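A defining property of the Spatial GAN side of the model is that the generator is fully convolutional, mapping a spatial grid of noise to a texture whose size tracks the noise grid. The toy generator below illustrates only that property, under stated assumptions: single-channel tensors, random untrained kernels, and a naive convolution, none of which correspond to the paper's actual architecture.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive single-channel 2D convolution with 'same' zero padding."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = (xp[i:i + kh, j:j + kw] * kernel).sum()
    return out

def sgan_generator(noise, kernels):
    """Toy fully-convolutional generator: stacking only convolutions
    means the synthesized texture has the same spatial extent as the
    input noise grid -- the core Spatial GAN property. Kernels are
    random stand-ins, not trained weights."""
    x = noise
    for k in kernels:
        x = np.tanh(conv2d_same(x, k))  # tanh keeps outputs in [-1, 1]
    return x

rng = np.random.default_rng(0)
kernels = [rng.standard_normal((3, 3)) * 0.1 for _ in range(2)]
# Same kernels, different noise grid sizes -> different texture sizes.
small = sgan_generator(rng.standard_normal((16, 16)), kernels)
large = sgan_generator(rng.standard_normal((32, 32)), kernels)
```

Because the weights are shared across all spatial positions, one trained model can synthesize arbitrarily large textures simply by enlarging the noise grid.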
Problem

Research questions and friction points this paper is trying to address.

- Improve texture synthesis quality
- Fuse Vision Transformers with a GAN
- Capture complex spatial dependencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

- ViT-SGAN hybrid model
- Vision Transformers fused with a Spatial GAN
- Enhanced texture synthesis quality