Learning to Bet for Horizon-Aware Anytime-Valid Testing

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work constructs anytime-valid tests and confidence sequences for bounded means under a fixed time horizon \(N\), making the betting strategy explicitly aware of the remaining time. The authors formulate betting as a finite-horizon optimal control problem with state space \((t, \log W_t)\), where \(W_t\) denotes the bettor's wealth at time \(t\). They develop a phase-diagram-based partitioning of this state space that delineates the regimes in which Kelly, fractional Kelly, and more aggressive betting strategies are preferable. Guided by this analysis, they train a universal deep Q-network (DQN) on synthetic data, so that a single learned policy adapts its bets across varying horizons and null hypotheses. Experiments show that the learned policy outperforms existing methods under finite time horizons, achieving state-of-the-art performance.
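The betting/e-process framework underlying the paper can be illustrated with a minimal sketch: wealth \(W_t = \prod_{s \le t} (1 + \lambda_s (x_s - m))\) is a nonnegative martingale under the null \(E[X] = m\) whenever each bet \(\lambda_s\) depends only on the past, so Ville's inequality lets us reject once \(W_t \ge 1/\alpha\). The plug-in approximate-Kelly bet below is a common horizon-free baseline, not the paper's horizon-aware policy; the clipping constants and prior weights are illustrative choices.

```python
import numpy as np

def betting_eprocess(xs, m=0.5, alpha=0.05):
    """Betting test of H0: E[X] = m for observations in [0, 1].

    Wealth W_t = prod_{s<=t} (1 + lam_s * (x_s - m)) is a nonnegative
    martingale under H0 when each lam_s is predictable, so Ville's
    inequality gives P(sup_t W_t >= 1/alpha) <= alpha; we may reject
    as soon as log W_t crosses log(1/alpha), at any data-dependent time.
    """
    log_w, threshold = 0.0, np.log(1.0 / alpha)
    mean_est, m2, n = 0.5, 0.0, 0  # running stats with a flat prior of weight 1
    for t, x in enumerate(xs, start=1):
        # Plug-in approximate-Kelly bet, clipped so 1 + lam*(x - m) >= 1/2
        # for every x in [0, 1] (keeps log-wealth finite).
        var_est = (m2 + 0.25) / (n + 1)
        lam = (mean_est - m) / (var_est + (mean_est - m) ** 2)
        lam = float(np.clip(lam, -0.5 / (1.0 - m), 0.5 / m))
        log_w += np.log1p(lam * (x - m))
        if log_w >= threshold:
            return t, log_w  # reject H0 at time t
        # Welford-style update; x influences only *future* bets (predictability)
        n += 1
        delta = x - mean_est
        mean_est += delta / (n + 1)
        m2 += delta * (x - mean_est)
    return None, log_w  # never rejected within the sample
```

Under an alternative (say Bernoulli(0.9) data against m = 0.5) the wealth grows roughly exponentially and the test rejects quickly; under the null it exceeds 1/α with probability at most α regardless of the stopping rule.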

📝 Abstract
We develop horizon-aware anytime-valid tests and confidence sequences for bounded means under a strict deadline $N$. Using the betting/e-process framework, we cast horizon-aware betting as a finite-horizon optimal control problem with state space $(t, \log W_t)$, where $t$ is the time and $W_t$ is the test martingale value. We first show that in certain interior regions of the state space, policies that deviate significantly from Kelly betting are provably suboptimal, while Kelly betting reaches the threshold with high probability. We then identify sufficient conditions showing that outside this region, betting more aggressively than Kelly can be better when the bettor is behind schedule, and betting less aggressively can be better when the bettor is ahead. Taken together, these results suggest a simple phase diagram in the $(t, \log W_t)$ plane, delineating regions where Kelly, fractional Kelly, and aggressive betting may each be preferable. Guided by this phase diagram, we introduce a deep reinforcement learning approach based on a universal Deep Q-Network (DQN) agent that learns a single policy from synthetic experience, mapping simple statistics of past observations to bets across horizons and null values. In finite-horizon experiments, the learned DQN policy yields state-of-the-art results.
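The schedule-dependent regimes described in the abstract can be caricatured as a simple rule on the $(t, \log W_t)$ plane. This is only an illustrative stand-in for the paper's phase diagram: the linear "schedule", band width, and multipliers below are invented for the sketch, and the paper's actual policy is learned by a DQN rather than hand-coded.

```python
import math

def kelly_multiplier(t, log_w, N, alpha=0.05, band=0.5, boost=1.5, damp=0.5):
    """Illustrative horizon-aware scaling of a Kelly bet (hypothetical rule).

    "On schedule" is taken here to mean linear progress of log-wealth toward
    the rejection threshold log(1/alpha) by the deadline N. Behind schedule
    -> bet more aggressively than Kelly; ahead -> fractional Kelly;
    in between -> plain Kelly.
    """
    target = (t / N) * math.log(1.0 / alpha)  # on-schedule log-wealth
    if log_w < target - band:
        return boost  # behind schedule: over-Kelly betting
    if log_w > target + band:
        return damp   # ahead of schedule: fractional Kelly
    return 1.0        # interior region: Kelly

# Example: halfway to the deadline with zero log-wealth, the bettor is
# behind schedule, so the rule scales the Kelly bet up.
```

The returned multiplier would scale a baseline Kelly bet before clipping it to the admissible range; the interior band corresponds to the region where, per the paper's analysis, large deviations from Kelly are provably suboptimal.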
Problem

Research questions and friction points this paper is trying to address.

anytime-valid testing
horizon-aware
confidence sequences
bounded means
optimal betting
Innovation

Methods, ideas, or system contributions that make the work stand out.

horizon-aware testing
anytime-valid inference
Kelly betting
optimal control
Deep Q-Network