Enhancing Text-to-Image Generation via End-Edge Collaborative Hybrid Super-Resolution

📅 2026-01-21
🤖 AI Summary
This work addresses the trade-off between fidelity and latency in high-resolution text-to-image generation under resource-constrained edge computing. The authors propose an end-edge collaborative, region-aware hybrid super-resolution framework: a low-resolution image is first generated at the edge side and partitioned into patches; a diffusion-based model then restores detail in foreground patches while a lightweight learning-based model efficiently upsamples background patches, and the enhanced patches are stitched into a high-resolution output. The approach combines the fidelity of diffusion models with the efficiency of lightweight architectures. Experiments show that the method reduces service latency by 33% compared with baselines while maintaining competitive image quality.

📝 Abstract
Artificial Intelligence-Generated Content (AIGC) has made significant strides, with high-resolution text-to-image (T2I) generation becoming increasingly critical for improving users' Quality of Experience (QoE). Although resource-constrained edge computing adequately supports fast low-resolution T2I generation, achieving high-resolution output still faces the challenge of ensuring image fidelity at the cost of latency. To address this, we first investigate the performance of super-resolution (SR) methods for image enhancement, confirming a fundamental trade-off: lightweight learning-based SR struggles to recover fine details, while diffusion-based SR achieves higher fidelity at a substantial computational cost. Motivated by these observations, we propose an end-edge collaborative generation-enhancement framework. Upon receiving a T2I generation task, the system first generates a low-resolution image at the edge side, using adaptively selected denoising steps and super-resolution scales. The image is then partitioned into patches and processed by a region-aware hybrid SR policy. This policy applies a diffusion-based SR model to foreground patches for detail recovery and a lightweight learning-based SR model to background patches for efficient upscaling, then stitches the enhanced patches into the high-resolution image. Experiments show that our system reduces service latency by 33% compared with baselines while maintaining competitive image quality.
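
The region-aware hybrid SR policy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how foreground is detected or which models are used, so the binary foreground mask, the `fg_threshold` cutoff, the patch size, and the function names (`hybrid_sr`, `nearest_upscale`) are all assumptions made for illustration. Both SR models are replaced by a nearest-neighbour placeholder, and only the partition, route, and stitch logic is shown.

```python
from typing import Callable, List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values
Upscaler = Callable[[Image, int], Image]


def nearest_upscale(patch: Image, scale: int) -> Image:
    """Placeholder upscaler: repeat every pixel scale x scale times.

    Stands in for both the diffusion-based and the lightweight SR model,
    neither of which is specified in the abstract.
    """
    out: Image = []
    for row in patch:
        wide = [v for v in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


def hybrid_sr(
    image: Image,
    fg_mask: List[List[int]],   # assumed input: 1 = foreground pixel, 0 = background
    patch: int,                 # assumed square patch size in pixels
    scale: int,
    fg_sr: Upscaler,            # expensive, high-fidelity model (hypothetical)
    bg_sr: Upscaler,            # cheap model (hypothetical)
    fg_threshold: float = 0.5,  # assumed cutoff on the foreground pixel ratio
) -> Tuple[Image, List[str]]:
    """Partition into patches, route each patch to one upscaler, stitch."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")

    out: Image = [[0.0] * (w * scale) for _ in range(h * scale)]
    routes: List[str] = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tile = [row[left:left + patch] for row in image[top:top + patch]]
            mask = [row[left:left + patch] for row in fg_mask[top:top + patch]]
            fg_ratio = sum(map(sum, mask)) / (patch * patch)
            is_fg = fg_ratio >= fg_threshold
            routes.append("fg" if is_fg else "bg")
            up = (fg_sr if is_fg else bg_sr)(tile, scale)
            # Stitch: copy the upscaled tile into its place in the output.
            for dy, row in enumerate(up):
                x0 = left * scale
                out[top * scale + dy][x0:x0 + len(row)] = row
    return out, routes
```

In a real system the two callables would wrap the actual SR models, and latency savings would come from the share of patches routed to the cheaper one. The sketch makes no claim about how that share is chosen in the paper.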
Problem

Research questions and friction points this paper is trying to address.

text-to-image generation
super-resolution
edge computing
image fidelity
latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

end-edge collaboration
hybrid super-resolution
text-to-image generation
region-aware processing
diffusion-based super-resolution
Chongbin Yi
School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
Yuxin Liang
School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
Ziqi Zhou
Huazhong University of Science and Technology (HUST)
Trustworthy AI
Peng Yang
School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China