🤖 AI Summary
In reinforcement learning applications such as healthcare, observational data often exhibit intra-cluster correlation (e.g., repeated measurements from the same patient). This violates the standard i.i.d. assumption and can degrade policy evaluation and optimization.
Method: We propose Generalized Fitted Q-Iteration (G-FQI), the first algorithm to integrate Generalized Estimating Equations (GEE) into the FQI framework. It explicitly models the cluster structure when estimating the state–action value (Q) function.
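For concreteness, one iteration of a GEE-based FQI step can be written as follows. This is a minimal sketch under assumptions not stated in the summary: a linear Q-function $Q(s,a;\theta)=\phi(s,a)^\top\theta$, an identity link, $N$ clusters (e.g., patients) with $T_i$ transitions each, and a working correlation matrix $R_i(\alpha)$. The symbols are illustrative, and the paper's exact formulation may differ.

```latex
% Bellman targets for cluster i at iteration k (linear Q-function assumed)
Y_{ij}^{(k)} = r_{ij} + \gamma \max_{a'} \phi(s'_{ij}, a')^\top \theta^{(k)}, \qquad j = 1, \dots, T_i

% GEE estimating equation (identity link, working covariance V_i = \sigma^2 R_i(\alpha))
\sum_{i=1}^{N} \Phi_i^\top V_i^{-1} \bigl( Y_i^{(k)} - \Phi_i \theta \bigr) = 0,
\qquad \Phi_i = \bigl[\phi(s_{i1}, a_{i1}), \dots, \phi(s_{iT_i}, a_{iT_i})\bigr]^\top

% Closed-form update (sigma^2 cancels)
\theta^{(k+1)} = \Bigl( \sum_{i=1}^{N} \Phi_i^\top V_i^{-1} \Phi_i \Bigr)^{-1}
                 \sum_{i=1}^{N} \Phi_i^\top V_i^{-1} Y_i^{(k)}
```

With $R_i(\alpha) = I$, the update reduces to the ordinary least-squares step of standard FQI. An exchangeable choice, $R_i(\alpha) = (1-\alpha) I + \alpha \mathbf{1}\mathbf{1}^\top$, is one common way to encode a shared patient-level effect.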
Contribution/Results: When the correlation structure is correctly specified, G-FQI attains optimal statistical efficiency. When it is misspecified, its estimators remain consistent, which makes the method robust to the choice of correlation structure. Convergence and asymptotic normality of the estimators are established theoretically. Empirical evaluations on synthetic benchmarks and real-world mobile health data show that G-FQI reduces cumulative regret by 50% on average compared with standard FQI, while also improving policy performance and stability.
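The efficiency-versus-robustness trade-off above can be illustrated on a single regression step, outside the RL loop. The sketch below is a hypothetical Monte Carlo check. It assumes a linear model with exchangeable within-cluster errors (correlation `rho`). The helpers `gee_linear` and `simulate` are illustrative and are not code from the paper. The independence working correlation, which misspecifies the truth, should still give nearly unbiased slope estimates. The correctly specified exchangeable working correlation should give a smaller sampling variance.

```python
import numpy as np

def exchangeable_corr(m, alpha):
    """Exchangeable correlation: 1 on the diagonal, alpha off the diagonal."""
    return (1.0 - alpha) * np.eye(m) + alpha * np.ones((m, m))

def gee_linear(X, Y, R):
    """Solve sum_i X_i^T R^{-1} (Y_i - X_i beta) = 0 for balanced clusters.

    X: (n_clusters, m, p) design, Y: (n_clusters, m) outcomes,
    R: (m, m) working correlation shared by all clusters.
    """
    R_inv = np.linalg.inv(R)
    A = np.einsum("itp,ts,isq->pq", X, R_inv, X)
    b = np.einsum("itp,ts,is->p", X, R_inv, Y)
    return np.linalg.solve(A, b)

def simulate(rng, n_clusters=100, m=5, rho=0.8, beta=(1.0, 2.0)):
    """Hypothetical data: y_ij = beta0 + beta1 * x_ij + u_i + e_ij,
    with unit error variance and within-cluster error correlation rho."""
    x = rng.normal(size=(n_clusters, m))                        # covariate varies within each cluster
    X = np.stack([np.ones_like(x), x], axis=-1)                 # (n_clusters, m, 2): intercept + slope
    u = np.sqrt(rho) * rng.normal(size=(n_clusters, 1))         # shared cluster effect
    e = np.sqrt(1.0 - rho) * rng.normal(size=(n_clusters, m))   # idiosyncratic noise
    return X, X @ np.asarray(beta) + u + e

rng = np.random.default_rng(0)
m, rho = 5, 0.8
est_ind, est_exch = [], []
for _ in range(200):
    X, Y = simulate(rng, m=m, rho=rho)
    est_ind.append(gee_linear(X, Y, np.eye(m)))                   # misspecified: independence
    est_exch.append(gee_linear(X, Y, exchangeable_corr(m, rho)))  # correctly specified
est_ind, est_exch = np.array(est_ind), np.array(est_exch)

print("mean slope, independence :", est_ind[:, 1].mean())
print("mean slope, exchangeable :", est_exch[:, 1].mean())
print("variance ratio exch / ind:", est_exch[:, 1].var() / est_ind[:, 1].var())
```

Plugging the true `rho` into the working correlation is a simplification for illustration. In practice, the correlation parameter would be estimated from residuals.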
📝 Abstract
This paper focuses on reinforcement learning (RL) with clustered data, which is commonly encountered in healthcare applications. We propose a generalized fitted Q-iteration (FQI) algorithm that incorporates generalized estimating equations into policy learning to handle intra-cluster correlation. Theoretically, we demonstrate (i) the optimality of our Q-function and policy estimators when the correlation structure is correctly specified, and (ii) their consistency when the structure is misspecified. Empirically, through simulations and analyses of a mobile health dataset, we find that the proposed generalized FQI reduces regret by half on average compared with standard FQI.
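A compact end-to-end sketch of generalized FQI follows. It rests on several assumptions that do not come from the paper:

- a one-dimensional state and a binary action;
- a linear Q-function with per-action features $[1, s]$;
- balanced clusters, where each hypothetical patient contributes `T` transitions;
- an exchangeable working correlation whose parameter `alpha` is re-estimated at every iteration by a simple moment estimator;
- toy simulated data.

All function names are hypothetical.

```python
import numpy as np

N_ACTIONS = 2  # assumption: binary treatment decision

def features(s, a):
    """Hypothetical phi(s, a): block [1, s] in the slot of action a, zeros elsewhere."""
    base = np.stack([np.ones_like(s), s], axis=-1)             # (..., 2)
    onehot = np.eye(N_ACTIONS)[a]                              # (..., N_ACTIONS)
    return (onehot[..., :, None] * base[..., None, :]).reshape(*s.shape, -1)

def exchangeable_corr(T, alpha):
    return (1.0 - alpha) * np.eye(T) + alpha * np.ones((T, T))

def gee_solve(Phi, Y, R):
    """Solve sum_i Phi_i^T R^{-1} (Y_i - Phi_i theta) = 0 (identity link, balanced clusters)."""
    R_inv = np.linalg.inv(R)
    A = np.einsum("itp,ts,isq->pq", Phi, R_inv, Phi)
    b = np.einsum("itp,ts,is->p", Phi, R_inv, Y)
    return np.linalg.solve(A, b)

def estimate_alpha(resid, n_params):
    """Simple moment estimator of the exchangeable correlation from (N, T) residuals."""
    N, T = resid.shape
    scale = (resid ** 2).sum() / (N * T - n_params)
    # Sum of e_ij * e_ik over pairs j < k within each cluster.
    cross = ((resid.sum(axis=1) ** 2) - (resid ** 2).sum(axis=1)).sum() / 2.0
    alpha = cross / (N * T * (T - 1) / 2.0 * scale)
    return float(np.clip(alpha, 0.0, 0.95))  # assumption: clip to a safe range

def generalized_fqi(S, A, R, S_next, gamma=0.9, n_iter=50):
    """FQI in which every regression step is a GEE fit with exchangeable working correlation."""
    N, T = S.shape
    Phi = features(S, A)                                       # (N, T, p)
    p = Phi.shape[-1]
    # Features for every candidate next action, used for the max in the Bellman target.
    Phi_next = np.stack([features(S_next, np.full_like(A, a)) for a in range(N_ACTIONS)], axis=2)
    theta, alpha = np.zeros(p), 0.0                            # alpha = 0 -> first step is plain FQI
    for _ in range(n_iter):
        Y = R + gamma * (Phi_next @ theta).max(axis=2)         # (N, T) regression targets
        theta = gee_solve(Phi, Y, exchangeable_corr(T, alpha))
        alpha = estimate_alpha(Y - Phi @ theta, p)             # update working correlation
    return theta, alpha

def greedy_action(theta, s):
    """Greedy policy: argmax_a phi(s, a)^T theta."""
    s = np.asarray(s, dtype=float)
    q = [features(s, np.full(s.shape, a, dtype=int)) @ theta for a in range(N_ACTIONS)]
    return np.argmax(np.stack(q, axis=-1), axis=-1)

# Toy clustered data: patient-level random effect in the reward, randomized actions.
rng = np.random.default_rng(1)
N, T, rho = 200, 10, 0.6
S_all = np.zeros((N, T + 1))
S_all[:, 0] = rng.normal(size=N)
for t in range(T):
    S_all[:, t + 1] = 0.5 * S_all[:, t] + rng.normal(size=N)
S, S_next = S_all[:, :-1], S_all[:, 1:]
A = rng.integers(0, N_ACTIONS, size=(N, T))
R = S * (2 * A - 1) + np.sqrt(rho) * rng.normal(size=(N, 1)) + np.sqrt(1 - rho) * rng.normal(size=(N, T))

theta, alpha = generalized_fqi(S, A, R, S_next)
print("theta:", theta, "estimated alpha:", alpha)
print("action at s=1:", greedy_action(theta, 1.0), "action at s=-1:", greedy_action(theta, -1.0))
```

Starting from `alpha = 0` makes the first iteration identical to standard least-squares FQI. Later iterations reweight the regression by the estimated within-patient correlation, which is where an efficiency gain over standard FQI would be expected to come from.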