🤖 AI Summary
Exponential growth in AI workloads exacerbates energy consumption challenges, yet existing energy-efficiency optimizations remain reactive, isolated “knob-tuning” approaches lacking holistic system design. Method: This work pioneers energy efficiency as a first-class, cross-stack design principle—spanning data, models, training, systems, and inference—and introduces an orthogonal, multi-stage optimization framework that coherently integrates quantization, pruning, and hardware-aware adaptation. Unlike conventional post-hoc tuning, it systematically models combinatorial effects among optimization knobs to enable cascaded energy savings. Contribution/Results: End-to-end optimization achieves up to 94.6% energy reduction while preserving 95.95% of the original F1 score. The approach shifts green AI development from empirical parameter tuning toward proactive, energy-driven design, delivering a reusable methodology and practical paradigm for sustainable AI.
📝 Abstract
AI's exponential growth intensifies computational demands and energy challenges. While practitioners employ various optimization techniques, which we refer to as "knobs" in this paper, to tune model efficiency, these are typically afterthoughts: reactive, ad-hoc changes applied in isolation without understanding their combinatorial effects on energy efficiency. This paper argues for treating energy efficiency as a first-class citizen and a fundamental design consideration for compute-intensive pipelines. We show that strategic knob selection across five AI pipeline phases (data, model, training, system, inference) creates cascading efficiency gains. Experimental validation shows that orthogonal knob combinations reduce energy consumption by up to 94.6% while preserving 95.95% of the original F1 score of non-optimized pipelines. This curated approach provides actionable frameworks for informed, sustainable AI that balance efficiency, performance, and environmental responsibility.
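To see how orthogonal knobs could compound into a large end-to-end reduction, here is a minimal back-of-the-envelope sketch. It assumes the knobs act independently, so the remaining energy is the product of each knob's remaining fraction; the per-phase percentages below are purely hypothetical placeholders, not measurements from the paper.

```python
def combined_savings(knob_savings):
    """Fractional energy saved when independent knobs compound multiplicatively.

    Each entry in knob_savings is the fraction of energy a single knob saves
    on its own (0.0 to 1.0). Under the independence assumption, remaining
    energy is the product of the per-knob remaining fractions.
    """
    remaining = 1.0
    for s in knob_savings:
        remaining *= (1.0 - s)
    return 1.0 - remaining


# Hypothetical per-phase reductions, one per pipeline phase (illustrative only):
knobs = {
    "data": 0.30,       # e.g. dataset pruning / sampling
    "model": 0.60,      # e.g. quantization
    "training": 0.40,   # e.g. early stopping, mixed precision
    "system": 0.35,     # e.g. hardware-aware scheduling
    "inference": 0.50,  # e.g. weight pruning
}

total = combined_savings(knobs.values())
print(f"combined energy savings: {total:.1%}")
```

The multiplicative model is an idealization; the paper's point is precisely that real knobs interact, so measured combinatorial effects can deviate from this independent-knob baseline in either direction.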