🤖 AI Summary
This paper addresses finite-sample inference for the average treatment effect (ATE) on binary outcomes in stratified randomized experiments. It proposes three valid inference methods: (1) conservative confidence intervals based on Bonferroni correction; (2) permutation-based $P$-value maximization coupled with test inversion; and (3) per-stratum permutation tests combined into a single interval for the overall ATE. To the authors' knowledge, this is the first systematic extension of exact inference to stratified binary-response settings. A key contribution is a computationally efficient permutation-inversion algorithm, which leverages weighted difference analysis and structured enumeration to reduce the number of permutation tests from $O(\prod_k n_k^4)$ for a naive implementation to $O\big(\sum_k n_k \cdot \prod_k n_k^2\big)$ when all strata are balanced, and to $O(\prod_k n_k^3)$ otherwise. In the simulations, the permutation-inversion method is the most statistically efficient of the three, and the reduced computational cost makes it more practical to apply.
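To give a feel for the size of the reduction, the following minimal sketch evaluates the leading terms of the stated bounds for a hypothetical design with three strata of ten units each. It also includes the cubic bound that the abstract gives for unbalanced strata. The function name `bound_terms` and the stratum sizes are illustrative assumptions, and because these are big-O bounds with constants ignored, the numbers indicate orders of magnitude rather than exact test counts.

```python
from math import prod


def bound_terms(strata_sizes):
    """Leading terms of the three stated bounds (constant factors ignored).

    strata_sizes: list of n_k values, one per stratum (hypothetical input).
    """
    naive = prod(n ** 4 for n in strata_sizes)             # prod_k n_k^4
    balanced = sum(strata_sizes) * prod(n ** 2 for n in strata_sizes)  # sum_k n_k * prod_k n_k^2
    unbalanced = prod(n ** 3 for n in strata_sizes)        # prod_k n_k^3
    return {"naive": naive, "balanced": balanced, "unbalanced": unbalanced}


terms = bound_terms([10, 10, 10])
print(terms["naive"])       # 1000000000000
print(terms["balanced"])    # 30000000
print(terms["unbalanced"])  # 1000000000
```

For this toy design, the balanced-strata term is smaller than the naive term by a factor of roughly 33,000.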
📝 Abstract
We extend methods for finite-sample inference about the average treatment effect (ATE) in randomized experiments with binary outcomes to accommodate stratification (blocking). We present three valid methods that differ in their computational and statistical efficiency. The first method constructs conservative, Bonferroni-adjusted confidence intervals separately for the mean response in the treatment and control groups in each stratum, then takes appropriate weighted differences of their endpoints to find a confidence interval for the ATE. The second method inverts permutation tests for the overall ATE, maximizing the $P$-value over all ways a given ATE can be attained. The third method applies permutation tests for the ATE in separate strata, then combines those tests to form a confidence interval for the overall ATE. We compare the statistical and computational performance of the methods using simulations and a case study. The second approach is most efficient statistically in the simulations, but a naive implementation requires $O(\prod_{k=1}^{K} n_k^4)$ permutation tests, the highest computational burden among the three methods. That computational burden can be reduced to $O(\sum_{k=1}^{K} n_k \times \prod_{k=1}^{K} n_k^2)$ if all strata are balanced and to $O(\prod_{k=1}^{K} n_k^3)$ otherwise.
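The endpoint combination in the first method can be illustrated with a short derivation. This is a sketch under assumptions not stated in the abstract: $n_k$ is taken to be the number of units in stratum $k$, $n = \sum_k n_k$, the weights are $w_k = n_k / n$, $\mu_{Tk}$ and $\mu_{Ck}$ denote the mean responses in stratum $k$ under treatment and under control, and the error budget $\alpha$ is split equally over the $2K$ intervals. The paper's actual weights and allocation may differ.

$$\tau = \sum_{k=1}^{K} w_k \left( \mu_{Tk} - \mu_{Ck} \right), \qquad w_k = \frac{n_k}{n}.$$

Suppose each of the intervals $[L_{Tk}, U_{Tk}] \ni \mu_{Tk}$ and $[L_{Ck}, U_{Ck}] \ni \mu_{Ck}$ has confidence level $1 - \alpha/(2K)$. By the Bonferroni inequality, all $2K$ intervals cover simultaneously with probability at least $1 - \alpha$, and on that event

$$\sum_{k=1}^{K} w_k \left( L_{Tk} - U_{Ck} \right) \;\le\; \tau \;\le\; \sum_{k=1}^{K} w_k \left( U_{Tk} - L_{Ck} \right),$$

so the outer sums form a conservative $1 - \alpha$ confidence interval for the ATE.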