🤖 AI Summary
This work addresses the trade-off between fidelity and latency in high-resolution text-to-image generation under resource-constrained edge computing. The authors propose a region-aware hybrid super-resolution framework for edge-cloud collaboration: the edge device first generates a low-resolution image, then applies a diffusion-based SR model to restore fine details in foreground regions and a lightweight learning-based SR model to upscale background regions efficiently, finally fusing both into a high-resolution output. This design combines the fidelity of diffusion models with the efficiency of lightweight architectures to jointly optimize generation quality and computational cost. Experiments show that the proposed method reduces service latency by 33% compared with baseline approaches while maintaining competitive image quality.
📝 Abstract
Artificial Intelligence-Generated Content (AIGC) has made significant strides, with high-resolution text-to-image (T2I) generation becoming increasingly critical for improving users' Quality of Experience (QoE). Although resource-constrained edge computing adequately supports fast low-resolution T2I generation, achieving high-resolution output still forces a trade-off between image fidelity and latency. To address this, we first investigate the performance of super-resolution (SR) methods for image enhancement, confirming a fundamental trade-off: lightweight learning-based SR struggles to recover fine details, while diffusion-based SR achieves higher fidelity at substantial computational cost. Motivated by these observations, we propose an end-edge collaborative generation-enhancement framework. Upon receiving a T2I generation task, the system first generates a low-resolution image at the edge using adaptively selected denoising steps and super-resolution scales. The image is then partitioned into patches and processed by a region-aware hybrid SR policy, which applies a diffusion-based SR model to foreground patches for detail recovery and a lightweight learning-based SR model to background patches for efficient upscaling, ultimately stitching the enhanced patches into the final high-resolution image. Experiments show that our system reduces service latency by 33% compared with baselines while maintaining competitive image quality.
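The region-aware routing described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `hybrid_sr`, the patch size, the any-pixel foreground rule, and the nearest-neighbour placeholder upscalers are all assumptions standing in for the real diffusion-based and lightweight SR models.

```python
import numpy as np

def hybrid_sr(lr_img, fg_mask, patch=16, scale=2,
              sr_diffusion=None, sr_lightweight=None):
    """Region-aware hybrid SR sketch: route foreground patches to a
    diffusion-based SR model and background patches to a lightweight
    one, then stitch the results into one high-resolution image.

    lr_img: (H, W, C) array; fg_mask: (H, W) boolean foreground mask.
    sr_diffusion / sr_lightweight: callables patch -> upscaled patch;
    nearest-neighbour placeholders are used when not provided.
    """
    # Placeholder upscaler (nearest-neighbour) standing in for a real model.
    nn_up = lambda p: np.kron(p, np.ones((scale, scale, 1), dtype=p.dtype))
    sr_diffusion = sr_diffusion or nn_up
    sr_lightweight = sr_lightweight or nn_up

    h, w, c = lr_img.shape
    out = np.empty((h * scale, w * scale, c), dtype=lr_img.dtype)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            p = lr_img[y:y + patch, x:x + patch]
            # Assumed rule: a patch is foreground if any pixel is masked.
            is_fg = fg_mask[y:y + patch, x:x + patch].any()
            up = (sr_diffusion if is_fg else sr_lightweight)(p)
            # Stitch the enhanced patch into its place in the HR canvas.
            out[y * scale:(y + p.shape[0]) * scale,
                x * scale:(x + p.shape[1]) * scale] = up
    return out
```

In practice the two callables would wrap the paper's diffusion-based SR model and lightweight learning-based SR model, and the mask would come from a foreground-detection step; the stitching here is a simple grid paste with no seam blending.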