TUNA: Tuning Unstable and Noisy Cloud Applications

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Performance noise in cloud environments impedes automatic database configuration tuning: it slows convergence and steers tuners toward unstable configurations and spurious optima, configurations whose performance degrades by 30% or more after deployment. The paper's motivating measurements show that as little as 5% noise can slow convergence by 2.5×, and that up to 63.3% of configurations selected as "best" during tuning degrade by 30% or more when deployed. To address this, the authors propose TUNA, a tuning framework that combines statistical outlier detection, ML-based denoising of performance measurements, and robust optimization: anomalous configurations are identified and removed, noisy measurements are refined via regression modeling, and the cleaned signal guides the optimizer. Evaluated on PostgreSQL under the production mssales workload, TUNA achieves 1.88× lower running time on average and 2.58× lower performance standard deviation than traditional sampling methodologies, establishing a noise-resilient approach to automated tuning in high-noise cloud deployments.
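The two cleaning steps described above, dropping anomalous measurements and handing the optimizer a denoised estimate, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the Tukey IQR fence, the repeat count, and the toy benchmark with ~5% Gaussian noise plus occasional spikes are all assumptions made here for demonstration.

```python
import random
import statistics

def filter_outliers(samples, k=1.5):
    # Tukey IQR fences as a stand-in for the paper's anomaly detector
    # (the exact detection method used by TUNA is not shown here).
    q1, _, q3 = statistics.quantiles(samples, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [x for x in samples if lo <= x <= hi]

def denoised_latency(measure, config, repeats=9):
    # Repeat the benchmark, strip outliers, and hand the mean to the
    # optimizer as the cleaned performance signal.
    samples = [measure(config) for _ in range(repeats)]
    return statistics.mean(filter_outliers(samples))

# Toy noisy benchmark: a hypothetical latency curve plus ~5% Gaussian
# noise and occasional large spikes (simulating cloud interference).
random.seed(0)

def measure(config):
    base = 100.0 + (config - 3.0) ** 2
    noise = random.gauss(0, 0.05 * base)
    spike = 50.0 if random.random() < 0.1 else 0.0
    return base + noise + spike
```

With this setup, `denoised_latency(measure, 3.0)` returns an estimate close to the true latency of 100 even when individual samples are spiked, which is the property that lets the optimizer converge faster.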

📝 Abstract
Autotuning plays a pivotal role in optimizing the performance of systems, particularly in large-scale cloud deployments. One of the main challenges in performing autotuning in the cloud arises from performance variability. We first investigate the extent to which noise slows autotuning and find that as little as 5% noise can lead to a 2.5× slowdown in converging to the best-performing configuration. We measure the magnitude of noise in cloud computing settings and find that while some components (CPU, disk) have almost no performance variability, there are still sources of significant variability (caches, memory). Furthermore, variability leads to autotuning finding unstable configurations. As many as 63.3% of the configurations selected as "best" during tuning can have their performance degrade by 30% or more when deployed. Using this as motivation, we propose a novel approach to improve the efficiency of autotuning systems by (a) detecting and removing outlier configurations and (b) using ML-based approaches to provide a more stable "true" signal of de-noised experiment results to the optimizer. The resulting system, TUNA (Tuning Unstable and Noisy Cloud Applications), enables faster convergence and robust configurations. Tuning PostgreSQL running mssales, an enterprise production workload, we find that TUNA can lead to 1.88× lower running time on average with 2.58× lower standard deviation compared to traditional sampling methodologies.
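The ML-based denoising idea, fitting a model over observed (configuration, latency) pairs and feeding the model's predictions to the optimizer instead of the raw noisy measurements, can be approximated with an ordinary least-squares fit. The quadratic model shape and the pure-Python normal-equations solver below are illustrative assumptions made here, not the paper's actual model.

```python
def fit_quadratic(xs, ys):
    # Least-squares fit of y ~ a*x^2 + b*x + c via the normal equations
    # for the design matrix with rows [x^2, x, 1].
    rows = [[x * x, x, 1.0] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, 3):
            f = xtx[r][col] / xtx[col][col]
            for c in range(col, 3):
                xtx[r][c] -= f * xtx[col][c]
            xty[r] -= f * xty[col]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        coef[r] = (xty[r] - sum(xtx[r][c] * coef[c] for c in range(r + 1, 3))) / xtx[r][r]
    return coef  # [a, b, c]

def denoise(xs, ys):
    # Replace each raw measurement with the fitted model's prediction,
    # giving the optimizer a smoother signal.
    a, b, c = fit_quadratic(xs, ys)
    return [a * x * x + b * x + c for x in xs]
```

Real tuning spaces are high-dimensional, so a practical denoiser would use a richer regressor; the one-dimensional quadratic is only meant to show how model predictions stand in for noisy observations.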
Problem

Research questions and friction points this paper is trying to address.

Addresses performance variability in cloud autotuning systems.
Identifies unstable configurations causing significant performance degradation.
Proposes ML-based methods to improve autotuning efficiency and stability.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Detects and removes outlier configurations
Uses ML-based approaches for de-noised signals
Enables faster convergence and robust configurations
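The instability criterion the paper uses to flag spurious optima, a "best" configuration whose performance degrades by 30% or more after deployment, can be expressed as a simple check. The use of the median over deployed measurements is an assumption here; the paper's exact aggregation is not specified in this summary.

```python
import statistics

def is_unstable(tuning_latency, deployed_latencies, threshold=0.30):
    # Flag a configuration whose deployed latency exceeds the latency
    # observed during tuning by more than `threshold` (30% by default,
    # matching the paper's spurious-optimum criterion).
    deployed = statistics.median(deployed_latencies)
    return (deployed - tuning_latency) / tuning_latency > threshold
```

For example, a configuration measured at 100 ms during tuning but at ~140 ms in deployment is flagged, while one holding near 105 ms is not.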