🤖 AI Summary
This study addresses the problem of constructing a nonparametric sequential hypothesis test with power one when only historical offline data are available to implicitly define the null hypothesis and multiple alternative distributions. To this end, the work introduces, for the first time, a multiclass classifier into the sequential testing framework, proposing a procedure that controls the significance level at α and almost surely identifies the true underlying distribution. Under a mild separability condition, the method is shown to admit an upper bound on its expected stopping time that cannot be improved in general. The proposed approach naturally accommodates both distribution identification and scenarios involving train-test distribution mismatch, and its effectiveness is validated through comprehensive experiments on both synthetic and real-world datasets.
📝 Abstract
We consider the problem of constructing sequential power-one tests where the null and alternative classes are specified indirectly through historical or offline data. More specifically, given an offline dataset consisting of observations from $L+1$ distributions $\{P_0, P_1, \ldots, P_L\}$, and a new unlabeled data stream $\{X_t : t \geq 1\} \overset{\text{i.i.d.}}{\sim} P_\theta$, the goal is to test the null $H_0: \theta = 0$ against the alternative $H_1: \theta \in [L] := \{1, \ldots, L\}$. Our main methodological contribution is a general approach for designing a level-$\alpha$ power-one test for this problem using a multiclass classifier trained on the given offline dataset.
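The abstract does not spell out the test statistic, but one natural classifier-based instantiation uses the trained posteriors as likelihood-ratio proxies inside a test-by-betting scheme controlled via Ville's inequality. The sketch below is a minimal illustration under that assumption: the function names, the `LogisticRegression` choice, and the Bonferroni-style threshold $L/\alpha$ are illustrative rather than the paper's prescribed construction, and the posterior ratio $\hat{p}_k(x)/\hat{p}_0(x)$ only approximates the density ratio $dP_k/dP_0$ (reasonable when class priors are balanced and the classifier is well calibrated).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_classifier(X_offline, y_offline):
    """Fit a multiclass classifier on offline data labeled 0, 1, ..., L."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_offline, y_offline)
    return clf

def sequential_test(clf, stream, alpha=0.05, eps=1e-12):
    """Run one wealth process per alternative k = 1..L and reject H0 the
    first time any wealth crosses L/alpha (union bound over alternatives).
    Returns (stopping time, declared class), or (None, 0) if H0 is retained."""
    L = len(clf.classes_) - 1
    log_wealth = np.zeros(L)  # log W_t^{(k)} for each alternative k
    for t, x in enumerate(stream, 1):
        probs = clf.predict_proba(np.asarray(x).reshape(1, -1))[0]
        # Posterior ratios p_hat(k|x)/p_hat(0|x) stand in for dP_k/dP_0.
        log_wealth += np.log(probs[1:] + eps) - np.log(probs[0] + eps)
        if log_wealth.max() >= np.log(L / alpha):
            return t, int(np.argmax(log_wealth)) + 1
    return None, 0
```

If the separability condition holds, the log wealth of the true class drifts upward at a positive rate under $P_k$, which is what drives the logarithmic expected stopping time discussed below.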
Working under a mild "separability" condition on the distributions and the trained classifier, we obtain an upper bound on the expected stopping time of our proposed level-$\alpha$ test, and then show that in general this bound cannot be improved. In addition to rejecting the null, we show that our procedure identifies the true underlying distribution almost surely. We then establish a sufficient condition ensuring the required separability of the classifier, and provide converse results investigating the role of the offline dataset size and of the classifier family among classifier-based tests that satisfy the level-$\alpha$ power-one criterion. Finally, we extend our analysis to the setting of train-test distribution mismatch and illustrate an application to sequential change detection. Empirical results on both synthetic and real data support our theoretical findings.
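To see why the stopping-time bound takes this shape, here is a standard Wald-style heuristic, not the paper's exact statement: assume the separability condition yields a margin $\delta > 0$ on the expected log-wealth increment under the true alternative.

```latex
% Heuristic stopping-time bound under an assumed separability margin.
% Suppose that under the true P_k, each log-wealth increment for class k
% has expectation at least \delta > 0. A wealth process started at 1 must
% reach the threshold L/\alpha to reject, so a Wald-type argument gives
\mathbb{E}_{P_k}[\tau] \;\lesssim\; \frac{\log(L/\alpha)}{\delta}.
% The converse results mentioned in the abstract indicate that this
% O(log(1/alpha)) scaling cannot be improved in general.
```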