🤖 AI Summary
This work addresses the challenges that learning-based systems face in heterogeneous, dynamic, and long-running environments, where environmental shifts lead to high retraining costs, substantial labeling overhead, degraded performance, and slow adaptation. The paper introduces EMA, a lightweight model adaptation framework that supports diverse system and model architectures through a system-driven, data-centric approach. EMA uses state transformers to align the input state of a new environment with similar, previously seen states, so that models can warm-start adaptation, and it prioritizes labeling of high-utility data samples while balancing training cost against labeling cost. Evaluation on eight representative learning-based systems shows that EMA reduces adaptation costs (e.g., GPU training time) by 14.9% to 42.4% while improving system performance (e.g., network throughput) by 6.9% to 31.3%.
📝 Abstract
Machine learning (ML) is increasingly applied to optimize system performance in tasks such as resource management and network simulation. Unlike traditional ML tasks (e.g., image classification), networked systems often operate in heterogeneous, long-running, and dynamic environment states, where input conditions (e.g., network loads) and operational objectives can shift over time and across settings. Existing learning-based systems offer little support for adaptation, resulting in costly model training, extensive data collection, degraded system performance, and slow responsiveness.
This paper presents EMA, the first model adaptation system that lets learning-based systems adapt to evolving environments with minimal operational overhead. EMA takes a system-driven, data-centric approach that accommodates diverse system and model designs while addressing two key deployment challenges. First, it reduces expensive model training by introducing state transformers that align the input state of a new environment with similar, previously seen states, allowing models to warm-start adaptation. Second, it addresses the often-overlooked yet costly process of data labeling (collecting ground truth for exploring and training on various system decisions) by prioritizing the labeling of high-utility data while balancing the tradeoff between training and labeling cost. Evaluations on eight representative learning-based systems show that EMA reduces adaptation costs (e.g., GPU training time) by 14.9% to 42.4% while improving system performance (e.g., network throughput) by 6.9% to 31.3%.
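To make the labeling tradeoff concrete, the following is a minimal illustrative sketch, not EMA's actual algorithm, which the abstract does not specify. It assumes each candidate sample carries a hypothetical utility estimate and a hypothetical labeling cost, and it greedily selects samples by utility per unit cost under a fixed labeling budget. The names `Candidate`, `select_for_labeling`, `utility`, `label_cost`, and `budget` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    sample_id: str
    utility: float     # hypothetical estimate of expected performance gain
    label_cost: float  # hypothetical cost of collecting ground truth


def select_for_labeling(candidates, budget):
    """Greedily pick samples by utility per unit labeling cost within a budget.

    This is a generic budgeted-selection heuristic used as a stand-in;
    it is not taken from the paper.
    """
    ranked = sorted(
        (c for c in candidates if c.label_cost > 0),
        key=lambda c: c.utility / c.label_cost,
        reverse=True,
    )
    chosen, spent = [], 0.0
    for c in ranked:
        # Skip any sample that would exceed the remaining budget.
        if spent + c.label_cost <= budget:
            chosen.append(c)
            spent += c.label_cost
    return chosen, spent


pool = [
    Candidate("a", utility=0.9, label_cost=3.0),
    Candidate("b", utility=0.5, label_cost=1.0),
    Candidate("c", utility=0.2, label_cost=2.0),
    Candidate("d", utility=0.4, label_cost=1.0),
]
chosen, spent = select_for_labeling(pool, budget=4.0)
print([c.sample_id for c in chosen], spent)
```

In this toy pool, sample `a` has the highest raw utility but is skipped because its cost no longer fits the remaining budget once the cheaper, higher-ratio samples are taken. A real system would presumably also weigh the training cost that each newly labeled sample induces, which this sketch leaves out.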