🤖 AI Summary
Environment configuration significantly impacts the stability of AI-enabled systems, yet its multidimensional influence remains poorly understood. Method: This study systematically investigates how operating systems, Python versions, and CPU architectures affect stability across 30 open-source AI-enabled systems, using large-scale cross-configuration experiments on the Travis CI platform. Three critical metrics are jointly evaluated: model performance, processing time, and expense. Contribution/Results: Experimental results reveal that environment changes perturb processing time and expense far more severely than model performance — between Linux and macOS, instability rates reach 23%, 96.67%, and 100% for model performance, processing time, and expense, respectively. Based on these findings, the study provides a quantitative basis for configuration-sensitivity analysis, offering empirical evidence and methodological support for environment selection and stability assurance when deploying AI-enabled systems.
📝 Abstract
Nowadays, software systems tend to include Artificial Intelligence (AI) components. Changes in the operational environment have been known to negatively impact the stability of AI-enabled software systems by causing unintended changes in behavior. However, how an environment configuration impacts the behavior of such systems has yet to be explored. Understanding and quantifying the degree of instability caused by different environment settings can help practitioners choose the environment configuration that yields the most stable AI systems. To achieve this goal, we performed experiments with eight different combinations of three key environment variables (operating system, Python version, and CPU architecture) on 30 open-source AI-enabled systems using the Travis CI platform. We determine the existence and the degree of instability introduced by each configuration using three metrics: the output of an AI component of the system (model performance), the time required to build and run the system (processing time), and the cost associated with building and running the system (expense). Our results indicate that changes in environment configurations lead to instability across all three metrics; however, instability is observed more frequently in processing time and expense than in model performance. For example, between Linux and macOS, instability is observed in 23%, 96.67%, and 100% of the studied projects in model performance, processing time, and expense, respectively. Our findings underscore the importance of identifying the optimal combination of configuration settings to mitigate drops in model performance and reduce the processing time and expense before deploying an AI-enabled system.
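To make the measurement idea concrete, the per-metric instability rate described above can be sketched as the fraction of projects whose metric value differs between two environments by more than a tolerance. The function and data below are illustrative assumptions, not the paper's actual procedure or measurements:

```python
# Hypothetical sketch: per-metric instability rate between two environments
# (e.g., Linux vs. macOS). All values below are toy data for illustration,
# not the paper's measurements.

def instability_rate(projects, metric, threshold=0.05):
    """Fraction of projects whose metric differs between the two
    environments by more than `threshold` (relative difference)."""
    pairs = projects[metric]
    unstable = 0
    for linux_val, macos_val in pairs:
        base = max(abs(linux_val), 1e-9)  # guard against division by zero
        if abs(linux_val - macos_val) / base > threshold:
            unstable += 1
    return unstable / len(pairs)

# Toy data: (Linux value, macOS value) per project, for two of the metrics.
projects = {
    "model_performance": [(0.91, 0.91), (0.88, 0.85), (0.90, 0.90)],
    "processing_time":   [(120.0, 150.0), (95.0, 140.0), (60.0, 61.0)],
}

for metric in projects:
    rate = instability_rate(projects, metric)
    print(f"{metric}: {rate:.0%} of projects unstable")
```

With these toy numbers, processing time flags far more projects as unstable than model performance, mirroring the pattern the study reports at scale.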