Multi-Metric Adaptive Experimental Design under Fixed Budget with Validation

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Online multi-metric A/B testing faces dual challenges: insufficient statistical power and unreliable causal effect estimation (e.g., Average Treatment Effect, ATE), especially under heteroscedastic variances and high-dimensional metrics. To address this, we propose a two-stage fixed-budget framework. In Stage I, adaptive exploration is performed via Sequential Halving (SH), enhanced by our novel SHRVar algorithm—integrating relative-variance-aware sampling with z-statistic–driven elimination. In Stage II, a classical A/B test is conducted on the surviving arm to ensure valid statistical inference. We theoretically establish exponential decay of error probability and, for the first time, unify and generalize SH/SHVar complexity bounds to the heteroscedastic multi-metric setting. Experiments demonstrate substantial improvements in both optimal-arm identification accuracy and ATE estimation precision, with particularly pronounced gains in high-dimensional, heteroscedastic scenarios.

📝 Abstract
Standard A/B tests in online experiments face statistical power challenges when testing multiple candidates simultaneously, while adaptive experimental designs (AED) alone fall short in inferring experiment statistics such as the average treatment effect, especially with many metrics (e.g., revenue, safety) and heterogeneous variances. This paper proposes a fixed-budget multi-metric AED framework with a two-phase structure: an adaptive exploration phase to identify the best treatment, and a validation phase with an A/B test to verify the treatment's quality and infer statistics. We propose SHRVar, which generalizes sequential halving (SH) (Karnin et al., 2013) with a novel relative-variance-based sampling and an elimination strategy built on reward z-values. It achieves a provable error probability that decreases exponentially, where the exponent generalizes the complexity measure for SH (Karnin et al., 2013) and SHVar (Lalitha et al., 2023) with homogeneous and heterogeneous variances, respectively. Numerical experiments verify our analysis and demonstrate the superior performance of this new framework.
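The exploration phase described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's exact SHRVar algorithm: the `pull` interface, the warm-up size, and the specific z-value ranking rule are assumptions made for the sketch.

```python
import numpy as np

def shrvar_sketch(pull, n_arms, budget, warmup=5):
    """Illustrative sequential-halving loop with relative-variance-aware
    sampling and z-value-based elimination. `pull(arm, n)` must return
    n i.i.d. reward samples for `arm` as a NumPy array."""
    arms = list(range(n_arms))
    rounds = max(1, int(np.ceil(np.log2(n_arms))))
    per_round = budget // rounds
    # Warm-up pulls to seed per-arm variance estimates.
    est_var = {a: float(np.var(pull(a, warmup), ddof=1)) + 1e-12 for a in arms}
    mean = {a: 0.0 for a in arms}
    count = {a: 0 for a in arms}
    while len(arms) > 1:
        total_var = sum(est_var[a] for a in arms)
        for a in arms:
            # Allocate this round's budget proportionally to relative variance.
            n_a = max(2, int(per_round * est_var[a] / total_var))
            x = pull(a, n_a)
            mean[a] = (mean[a] * count[a] + x.sum()) / (count[a] + n_a)
            count[a] += n_a
            est_var[a] = float(np.var(x, ddof=1)) + 1e-12
        # Rank surviving arms by reward z-value and keep the top half.
        z = {a: mean[a] / np.sqrt(est_var[a] / count[a]) for a in arms}
        arms = sorted(arms, key=z.get, reverse=True)[: max(1, len(arms) // 2)]
    return arms[0]
```

Noisier arms receive more of each round's budget, which is the intuition behind variance-aware allocation: their means need more samples to estimate to the same precision.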
Problem

Research questions and friction points this paper is trying to address.

Standard A/B tests lack statistical power when many candidate treatments share a fixed budget
Adaptive designs alone cannot reliably infer statistics such as the average treatment effect
Heterogeneous variances across many metrics (e.g., revenue, safety) complicate sample allocation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-phase framework: adaptive exploration followed by an A/B-test validation phase
SHRVar: relative-variance-based sampling with z-value-driven elimination
Error probability decays exponentially, generalizing the SH and SHVar complexity measures
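The validation phase amounts to a classical A/B test on the surviving arm. A minimal sketch, using a standard two-sample z-test for the ATE (the statistic and interval construction are textbook choices, not taken from the paper):

```python
from statistics import NormalDist

def validate_winner(treatment, control, alpha=0.05):
    """Two-sample z-test comparing the surviving arm against control:
    returns the ATE estimate, a (1 - alpha) confidence interval, and a
    two-sided p-value. Illustrative of the validation phase only."""
    def mean(xs):
        return sum(xs) / len(xs)
    def var(xs, m):
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    m_t, m_c = mean(treatment), mean(control)
    ate = m_t - m_c
    se = (var(treatment, m_t) / len(treatment)
          + var(control, m_c) / len(control)) ** 0.5
    z = ate / se
    nd = NormalDist()
    half = nd.inv_cdf(1 - alpha / 2) * se
    p_value = 2 * (1 - nd.cdf(abs(z)))
    return ate, (ate - half, ate + half), p_value
```

Running the final A/B test on fresh samples is what restores valid inference: the adaptive phase's data-dependent stopping would otherwise bias the ATE estimate.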