AI Summary
Problem: Current AI sustainability evaluations lack standardized, model-agnostic protocols for long-term assessment; they remain largely confined to short-term batch-learning settings and fail to capture the resource-performance trade-offs inherent in real-world system lifecycles.
Method: We propose the first general long-term sustainability evaluation framework applicable across learning paradigms, both batch and streaming. It combines dynamic data-evolution simulation, tracking across multiple model-update rounds, fine-grained resource monitoring, and cross-model comparative experiments, and is validated on classification tasks (a minimal illustrative sketch follows this summary).
Contribution/Results: (1) A model-agnostic long-term evaluation protocol; (2) Empirical evidence that higher environmental cost does not necessarily yield substantial performance gains; (3) Significant inter-model variation in long-term sustainability, demonstrating that conventional static evaluation risks misleading deployment decisions; (4) A reproducible, comparable benchmark for green AI assessment.
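To make the protocol concrete, here is a minimal Python sketch of the batch-setting evaluation loop: repeated update rounds over evolving data, with a per-round performance and resource record. This is an illustration under assumptions, not the authors' implementation: `make_drifting_rounds` is a hypothetical stand-in for the paper's data-evolution simulator, and wall-clock update time stands in for the fine-grained resource monitoring.

```python
# Hedged sketch: multi-round evaluation over evolving data (batch setting).
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier


def make_drifting_rounds(n_rounds, seed=0):
    """Yield (X_train, y_train, X_test, y_test) splits that drift each round.

    Hypothetical stand-in for the paper's data-evolution simulator.
    """
    rng = np.random.default_rng(seed)
    for r in range(n_rounds):
        X, y = make_classification(
            n_samples=600, n_features=20,
            shift=0.1 * r,                      # crude simulated drift
            random_state=int(rng.integers(1_000_000)),
        )
        yield X[:400], y[:400], X[400:], y[400:]


model = SGDClassifier(random_state=0)
history = []
for r, (Xtr, ytr, Xte, yte) in enumerate(make_drifting_rounds(n_rounds=10)):
    t0 = time.perf_counter()
    model.partial_fit(Xtr, ytr, classes=np.array([0, 1]))  # one update round
    cost = time.perf_counter() - t0                        # resource proxy
    acc = float((model.predict(Xte) == yte).mean())
    history.append({"round": r, "accuracy": acc, "seconds": cost})
print(history[-1])
```

Per-round records like these are what make resource-performance trade-offs visible over the lifecycle; a dedicated meter (e.g., CodeCarbon) could replace the timer to report energy or CO2 directly.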
Abstract
Sustainability and efficiency have become essential considerations in the development and deployment of Artificial Intelligence systems, yet existing regulatory and reporting practices lack standardized, model-agnostic evaluation protocols. Current assessments often measure only short-term experimental resource usage and disproportionately emphasize batch learning settings, failing to reflect real-world, long-term AI lifecycles. In this work, we propose a comprehensive evaluation protocol for assessing the long-term sustainability of machine learning (ML) models, applicable to both batch and streaming learning scenarios. Through experiments on diverse classification tasks using a range of model types, we demonstrate that traditional static train-test evaluations do not reliably capture sustainability under evolving data and repeated model updates. Our results show that long-term sustainability varies significantly across models, and in many cases, higher environmental cost yields little performance benefit.
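For the streaming side of the protocol, the analogous readout comes from a test-then-train (prequential) loop, in which each incoming item is first scored and then learned from. The sketch below is likewise a hedged illustration: the two scikit-learn models and the wall-clock cost proxy are stand-in choices, used only to show how accuracy and cumulative update cost can be compared across models on the same stream.

```python
# Hedged sketch: test-then-train (prequential) evaluation (streaming setting),
# with a per-model cost/performance readout. Models and cost proxy are
# illustrative stand-ins, not those used in the paper.
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
classes = np.unique(y)

for name, model in [("SGD", SGDClassifier(random_state=0)),
                    ("GaussianNB", GaussianNB())]:
    correct, cost, seen = 0, 0.0, False
    for xi, yi in zip(X, y):
        xi = xi.reshape(1, -1)
        if seen:                                      # test on the new item,
            correct += int(model.predict(xi)[0] == yi)
        t0 = time.perf_counter()
        model.partial_fit(xi, [yi], classes=classes)  # then train on it
        cost += time.perf_counter() - t0              # cumulative update cost
        seen = True
    print(f"{name}: prequential accuracy={correct / (len(y) - 1):.3f}, "
          f"update cost={cost:.2f}s")
```

Running such a loop side by side for several candidate models is one concrete way to surface the inter-model variation the abstract reports, including cases where extra resource cost buys little additional accuracy.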