$E^3$-Agent: An Executable and Evolving Agent for Resource Management of Edge Generative Inference

📅 2026-05-21
🤖 AI Summary
This work addresses resource management for generative inference at the edge, where per-device, per-model performance is unknown at deployment time and changes over time, so statically tuned strategies break down. The paper proposes $E^3$-Agent, which splits control into two paths: a fast-path router that makes millisecond-level dispatch decisions, and a slow-path, event-driven LLM meta-controller that responds to regime shifts through a small tool interface covering risk gating, router configuration, and rapid performance calibration. The agent learns online from execution feedback to track the changing service-time mappings. In a discrete-event simulator covering cold-start warmup, semantic dynamics, device churn, and hidden drift, $E^3$-Agent reduces average latency by 65%–73% relative to the best static baseline, stays within 7%–10% of an online full-information Oracle, and suppresses the stutter rate under semantic degradation.
📝 Abstract
Edge deployments of generative inference increasingly face two practical realities: per-device per-model performance is often unknown at deployment time, and it is non-stationary due to user-driven semantic events, background load, and device churn. Consequently, a resource manager that is tuned offline under a fixed regime can become brittle and expensive to maintain. This paper presents $E^3$-Agent, an executable and evolving agent for edge artificial intelligence generated content (AIGC) resource management. $E^3$-Agent separates a fast-path router that makes millisecond-level dispatch decisions from a slow-path, event-driven large language model (LLM) meta-controller that mitigates regime shifts through a small, explicit control surface exposed via a tool interface, including risk gating, router configuration, and rapid performance calibration. The agent learns online from execution feedback and continuously adapts to unknown and time-varying service-time mappings. We evaluate $E^3$-Agent in a discrete-event simulator driven by MLPerf-derived device-model measurement priors, covering cold-start warmup and three dynamic regimes: semantic dynamics, device churn, and hidden drift. Across the dynamic scenarios, $E^3$-Agent reduces average latency by 65%-73% compared to the best static baseline, stays within 7%-10% of an online full-information Oracle used for evaluation, and effectively suppresses stutter rate under semantic degradation.
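The fast-path/slow-path split described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the names `FastRouter`, `slow_path_event`, `drift_ratio`, and `deadline` are hypothetical, the exponentially weighted moving average is an assumed stand-in for the paper's online learning, and a simple rule-based function stands in for the LLM meta-controller.

```python
from dataclasses import dataclass, field


@dataclass
class FastRouter:
    """Hypothetical fast path: dispatch to the device with the lowest estimated finish time."""
    est: dict                                     # (device, model) -> estimated service time (s)
    alpha: float = 0.3                            # EWMA weight (assumed learning rule)
    backlog: dict = field(default_factory=dict)   # device -> reserved work (s)
    blocked: set = field(default_factory=set)     # devices gated off by the slow path

    def route(self, model):
        # Consider only devices that serve this model and are not gated off.
        candidates = [d for (d, m) in self.est if m == model and d not in self.blocked]
        if not candidates:
            raise RuntimeError(f"no available device for model {model!r}")
        best = min(candidates, key=lambda d: self.backlog.get(d, 0.0) + self.est[(d, model)])
        # Reserve the estimated service time on the chosen device.
        self.backlog[best] = self.backlog.get(best, 0.0) + self.est[(best, model)]
        return best

    def feedback(self, device, model, observed):
        # Release the reservation, then fold the observed time into the estimate.
        key = (device, model)
        self.backlog[device] = max(0.0, self.backlog.get(device, 0.0) - self.est[key])
        self.est[key] = (1 - self.alpha) * self.est[key] + self.alpha * observed


def slow_path_event(router, device, model, observed, drift_ratio=2.0, deadline=None):
    """Rule-based stand-in for the event-driven meta-controller (assumption, not an LLM)."""
    key = (device, model)
    if observed > drift_ratio * router.est[key]:
        router.est[key] = observed        # rapid calibration: jump to the new regime
    if deadline is not None and router.est[key] > deadline:
        router.blocked.add(device)        # risk gating: stop routing to this device


router = FastRouter(est={("gpu", "sd"): 1.0, ("cpu", "sd"): 4.0})
first_choice = router.route("sd")         # the faster device wins while estimates hold
router.feedback(first_choice, "sd", observed=3.0)
```

The point of the split is that `route` and `feedback` stay cheap enough to run on every request, while `slow_path_event` runs only when an observation deviates sharply from the estimate and changes the router through a small set of knobs (here `est` and `blocked`).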
Problem

Research questions and friction points this paper is trying to address.

edge generative inference
non-stationary performance
resource management
device churn
semantic dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

executable agent
evolving resource management
edge generative inference
LLM meta-controller
online adaptation