Redbench: Workload Synthesis From Cloud Traces

📅 2025-11-17
🤖 AI Summary
Existing TPC-style benchmarks (e.g., TPC-H and TPC-DS) fail to capture two critical characteristics of real-world cloud data warehouse workloads: query repetitiveness and string-intensive operations. As a result, they give an inaccurate picture of system performance. To address this, the authors propose Redbench, a synthetic benchmark generation framework grounded in empirical analysis of cloud workload traces. Rather than relying on execution-specific metrics, Redbench models intrinsic workload signals, including query pattern distributions, repetition cycles, and string-operation intensity. It does so by combining query pattern extraction, repetitiveness modeling, and string-aware enhancement, which enables reproducible and customizable synthesis of realistic workloads from production traces. Experimental evaluation shows that Redbench-generated workloads are more realistic and reproducible, and that they expose performance differences across four commercial cloud data warehouses under different system optimizations.

📝 Abstract
Workload traces from cloud data warehouse providers reveal that standard benchmarks such as TPC-H and TPC-DS fail to capture key characteristics of real-world workloads, including query repetition and string-heavy queries. In this paper, we introduce Redbench, a novel benchmark featuring a workload generator that reproduces real-world workload characteristics derived from traces released by cloud providers. Redbench integrates multiple workload generation techniques to tailor workloads to specific objectives, transforming existing benchmarks into realistic query streams that preserve intrinsic workload characteristics. By focusing on inherent workload signals rather than execution-specific metrics, Redbench bridges the gap between synthetic and real workloads. Our evaluation shows that (1) Redbench produces more realistic and reproducible workloads for cloud data warehouse benchmarking, and (2) Redbench reveals the impact of system optimizations across four commercial data warehouse platforms. We believe that Redbench provides a crucial foundation for advancing research on optimization techniques for modern cloud data warehouses.
Problem

Research questions and friction points this paper is trying to address.

Standard benchmarks such as TPC-H and TPC-DS fail to capture real cloud workload characteristics, including query repetition and string-heavy queries
Realistic workloads need to be synthesized from traces released by cloud providers
A gap remains between synthetic and real data warehouse workloads
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates realistic workloads from cloud traces
Transforms existing benchmarks into realistic query streams
Focuses on inherent workload signals rather than execution-specific metrics
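
To make the idea of turning an existing benchmark into a query stream with repetition more concrete, here is a minimal sketch in Python. It is not Redbench's actual generator: the function name `synthesize_stream`, the `repetition_rate` parameter, and the uniform choice among earlier queries are all assumptions made for illustration. The sketch draws queries from a fixed pool (for example, TPC-H query names) and, with a configurable probability, replays a query that already appeared in the stream.

```python
import random


def synthesize_stream(query_pool, length, repetition_rate, seed=0):
    """Build a query stream in which roughly `repetition_rate` of the
    entries replay a query that already appeared earlier in the stream.

    Hypothetical helper for illustration only; not part of Redbench.
    """
    if not 0.0 <= repetition_rate <= 1.0:
        raise ValueError("repetition_rate must be in [0, 1]")
    if length > 0 and not query_pool:
        raise ValueError("query_pool must not be empty")

    rng = random.Random(seed)   # seeded so the stream is reproducible
    unseen = list(query_pool)   # queries not yet used in the stream
    rng.shuffle(unseen)
    seen = []                   # queries already emitted at least once
    stream = []

    for _ in range(length):
        # Repeat only if something was emitted before; force a repeat
        # once the pool of unseen queries is exhausted.
        repeat = bool(seen) and (not unseen or rng.random() < repetition_rate)
        if repeat:
            query = rng.choice(seen)
        else:
            query = unseen.pop()
            seen.append(query)
        stream.append(query)
    return stream


if __name__ == "__main__":
    pool = ["Q1", "Q3", "Q5", "Q6", "Q10"]  # assumed benchmark query names
    print(synthesize_stream(pool, length=12, repetition_rate=0.7, seed=42))
```

A real generator would presumably fit the repetition behavior and the share of string-heavy queries to the provider traces rather than take a single fixed rate, but the fixed seed illustrates how a synthesized stream can stay reproducible across runs.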