🤖 AI Summary
This work addresses the challenge of efficiently tuning machine learning systems, whose configuration spaces are vast, heterogeneous, and governed by hierarchical conditional dependencies. The authors propose a general-purpose automated configuration method that formulates the problem as a mixed discrete-continuous optimization task with conditional constraints, enabling joint optimization of model architectures and execution parameters within a unified framework. Their approach integrates hierarchical parameter modeling, adaptive feature prioritization, multi-fidelity simulation, and a hybrid optimization algorithm to simultaneously handle sparse structural choices and dense hyperparameter tuning while dynamically balancing exploration cost and benefit. Experimental results demonstrate that the method consistently discovers high-performance configurations across diverse models, hardware platforms, and tasks, achieving 2.7–3.0× faster training compared to expert-tuned baselines.
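The hierarchical, mixed discrete/continuous search space described above can be illustrated with a small sketch. All parameter names here (`parallelism`, `micro_batch_size`, `overlap_comm`) are invented for illustration and do not come from the paper; the point is only that some execution parameters are valid only under specific upstream structural choices.

```python
import random

# Hypothetical configuration space mixing a sparse structural choice,
# a dense integer parameter, and a conditionally valid flag.
SPACE = {
    "parallelism": ["data", "tensor", "pipeline"],  # sparse structural choice
    "micro_batch_size": (1, 64),                    # dense integer range
    "overlap_comm": [True, False],                  # conditional parameter
}

def sample_config(rng):
    """Draw one valid configuration, respecting conditional dependencies."""
    cfg = {"parallelism": rng.choice(SPACE["parallelism"])}
    lo, hi = SPACE["micro_batch_size"]
    cfg["micro_batch_size"] = rng.randint(lo, hi)
    # Conditional dependency: communication overlap only applies when a
    # communication-heavy parallelism strategy was chosen upstream.
    if cfg["parallelism"] in ("tensor", "pipeline"):
        cfg["overlap_comm"] = rng.choice(SPACE["overlap_comm"])
    return cfg

rng = random.Random(0)
configs = [sample_config(rng) for _ in range(100)]
```

An optimizer over such a space must handle the fact that the set of active parameters changes with the structural decision, which is why the authors treat sparse structural choices and dense tuning jointly rather than flattening the space.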
📝 Abstract
Machine learning (ML) systems expose a rapidly expanding configuration space spanning model-parallelism strategies, communication optimizations, and low-level runtime parameters. End-to-end system efficiency is highly sensitive to these choices, yet identifying high-performance configurations is challenging due to heterogeneous feature types (e.g., sparse and dense parameters), conditional dependencies (e.g., execution parameters that are valid only under specific upstream decisions), and the high cost of profiling candidate configurations. Existing approaches either optimize a narrow subset of configuration dimensions or rely on ad-hoc heuristics that fail to generalize as configuration spaces continue to grow. We present AutoScout, a general-purpose systems configurator for ML training, fine-tuning, and inference. AutoScout formulates system configuration as a mixed discrete/continuous optimization problem with hierarchical dependencies and introduces a hybrid optimization framework that jointly refines sparse structural decisions and dense execution parameters. To reduce profiling cost, AutoScout adaptively prioritizes high-impact configuration features and ensembles simulators of varying fidelity. Across diverse models, hardware platforms, and deployment objectives, AutoScout consistently identifies high-performance configurations, achieving a 2.7–3.0$\times$ training speedup over expert-tuned settings.
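The multi-fidelity idea in the abstract can be sketched as a two-stage search: screen many candidates with a cheap, noisy low-fidelity estimate, then spend the expensive profiling budget only on the survivors. This is a minimal illustration of the general pattern, not the paper's algorithm; the objective and cost model below are invented for the example.

```python
import random

def true_objective(cfg):
    # Stand-in for an expensive, accurate profiling run (lower is better).
    return (cfg - 0.3) ** 2

def low_fidelity(cfg, rng):
    # Stand-in for a cheap simulator: the true objective plus noise.
    return true_objective(cfg) + rng.gauss(0, 0.1)

def multi_fidelity_search(candidates, keep, rng):
    # Stage 1: cheap, noisy screening of every candidate.
    screened = sorted(candidates, key=lambda c: low_fidelity(c, rng))
    # Stage 2: expensive, accurate evaluation of the top `keep` only.
    survivors = screened[:keep]
    return min(survivors, key=true_objective)

rng = random.Random(1)
cands = [rng.random() for _ in range(50)]
best = multi_fidelity_search(cands, keep=5, rng=rng)
```

In this pattern the expensive evaluator runs `keep` times instead of `len(cands)` times; ensembling several simulators of different fidelity, as the abstract describes, generalizes the single low-fidelity proxy used here.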