🤖 AI Summary
To address the challenges of high QoS requirements, resource heterogeneity, cold-start latency, and the inherent trade-off between inference delay and generation quality in AIGC tasks at the edge, this paper proposes a collaborative scheduling algorithm based on attention-guided diffusion reinforcement learning. It is the first work to incorporate diffusion models into AIGC task scheduling: an attention mechanism dynamically perceives real-time load and queue states across heterogeneous edge servers, guiding a policy network to perform fine-grained task partitioning and cross-server collaborative inference, while enabling model reuse and adaptive load balancing. Experiments demonstrate that our approach reduces inference latency by up to 56% over baseline methods, significantly improves resource utilization and system throughput, and maintains high generation fidelity. This work establishes a novel paradigm for efficient, quality-aware AIGC service delivery in heterogeneous edge computing environments.
📝 Abstract
The growth of Artificial Intelligence (AI) and large language models has enabled the use of Generative AI (GenAI) in cloud data centers for diverse AI-Generated Content (AIGC) tasks. However, models like Stable Diffusion introduce unavoidable delays and substantial resource overhead, which are unsuitable for users at the network edge with high QoS demands. Deploying AIGC services on edge servers reduces transmission times but often leads to underutilized resources and fails to optimally balance inference latency and quality. To address these issues, this paper introduces a QoS-aware Edge-collaborative AIGC Task scheduling (EAT) algorithm. Specifically: 1) We segment AIGC tasks and schedule patches to various edge servers, formulating this as a gang scheduling problem that balances inference latency and quality while accounting for server heterogeneity, such as differing model distributions and cold-start issues. 2) We propose a reinforcement learning-based EAT algorithm that uses an attention layer to extract load and task-queue information from edge servers and employs a diffusion-based policy network for scheduling, efficiently enabling model reuse. 3) We develop an AIGC task scheduling system that uses our EAT algorithm to divide tasks and distribute them across multiple edge servers for processing. Experimental results from our system and large-scale simulations show that the EAT algorithm reduces inference latency by up to 56% compared to baselines. We release our open-source code at https://github.com/zzf1955/EAT.
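To make the scheduling idea concrete, here is a heavily simplified, illustrative sketch of an attention-guided, diffusion-style scheduler in numpy. It is not the authors' EAT implementation (which trains a policy network via reinforcement learning): the `schedule_patches` function, its linear denoising schedule, and the toy server features are assumptions made purely for illustration. It shows attention weights computed from per-server load/queue features guiding iterative refinement of per-patch assignment logits.

```python
import numpy as np

def attention_weights(query, server_feats):
    # Scaled dot-product attention over server state vectors (load, queue length).
    scores = server_feats @ query / np.sqrt(len(query))
    e = np.exp(scores - scores.max())      # numerically stable softmax
    return e / e.sum()

def schedule_patches(task_feat, server_feats, n_patches, steps=10, rng=None):
    """Toy attention-guided, diffusion-style scheduler (illustrative only).

    Starts from random per-patch logits and iteratively 'denoises' them
    toward the attention preferences derived from server states.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    attn = attention_weights(task_feat, server_feats)    # preference over servers
    logits = rng.normal(size=(n_patches, len(server_feats)))
    for t in range(steps):
        alpha = (t + 1) / steps                          # simple denoising schedule
        logits = (1 - alpha) * logits + alpha * np.log(attn + 1e-9)
    return logits.argmax(axis=1)                         # patch -> server index

# Example: 3 heterogeneous servers; lower (load, queue) values are preferable.
servers = np.array([[0.9, 0.8],    # heavily loaded server
                    [0.2, 0.1],    # lightly loaded server
                    [0.5, 0.4]])   # moderately loaded server
task = np.array([-1.0, -1.0])      # query favoring low load/queue
assignment = schedule_patches(task, servers, n_patches=4)
print(assignment)                  # all patches go to the lightly loaded server
```

In a real system the attention query would come from learned task embeddings and the denoising steps from a trained diffusion policy network; here the refinement simply interpolates random logits toward the attention distribution, so patches converge on the least-loaded server.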