🤖 AI Summary
Adaptive health interventions, such as smoking cessation support and physical activity promotion, typically rely on low-dimensional, structured state spaces, which limits how much intervention policies can be personalized. This study proposes an approach that integrates large language models (LLMs) with reinforcement learning (RL): a pretrained LLM extracts high-dimensional semantic state information from participants' free-text descriptions of their current state, and inference over these descriptions is used to better align the policy of a base RL method with the participant's context. To evaluate the approach, the authors build a physical activity intervention simulation environment in which an auxiliary LLM generates text-based state descriptions conditioned on latent state variables. In this simulation, the method shows the potential to significantly improve online policy learning performance while preserving the sample efficiency that practical intervention trials require.
📝 Abstract
The use of reinforcement learning (RL) methods to support health behavior change via personalized and just-in-time adaptive interventions is of significant interest to health and behavioral science researchers focused on problems such as smoking cessation support and physical activity promotion. However, RL methods are often applied to these domains using a small collection of context variables to mitigate the significant data scarcity issues that arise from practical limitations on the design of adaptive intervention trials. In this paper, we explore an approach to significantly expanding the state space of an adaptive intervention without impacting data efficiency. The proposed approach enables intervention participants to provide natural language descriptions of aspects of their current state. It then leverages inference with pre-trained large language models (LLMs) to better align the policy of a base RL method with these state descriptions. To evaluate our method, we develop a novel physical activity intervention simulation environment that generates text-based state descriptions conditioned on latent state variables using an auxiliary LLM. We show that this approach has the potential to significantly improve the performance of online policy learning methods.
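To make the pipeline concrete, here is a minimal, self-contained sketch of the general idea: free-text state descriptions are mapped to feature vectors, and an online policy learns from them. This is not the paper's actual method. The hashing bag-of-words featurizer merely stands in for a frozen pretrained LLM encoder, the epsilon-greedy linear bandit stands in for the base RL method, and the two-state micro-environment (with its texts, action names, and rewards) is entirely hypothetical.

```python
import hashlib
import numpy as np

def embed_state(text, dim=32):
    # Stand-in for a frozen pretrained LLM encoder: the paper's setting maps a
    # participant's free-text state description to high-dimensional semantic
    # features. A deterministic hashing bag-of-words featurizer plays that
    # role here so the sketch runs without model weights.
    vec = np.zeros(dim)
    for tok in text.lower().split():
        bucket = int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

class LinearEpsGreedy:
    """Toy epsilon-greedy linear contextual bandit over text-derived features."""
    def __init__(self, n_actions, dim, eps=0.2, lr=0.05):
        self.w = np.zeros((n_actions, dim))  # one weight vector per action
        self.eps, self.lr = eps, lr
    def act(self, x, rng):
        if rng.random() < self.eps:
            return int(rng.integers(len(self.w)))  # explore
        return int(np.argmax(self.w @ x))          # exploit
    def update(self, x, a, r):
        # SGD step moving the chosen action's predicted reward toward r.
        self.w[a] += self.lr * (r - self.w[a] @ x) * x

# Hypothetical micro-environment: two latent participant states, observed only
# through free text; action 0 (a gentle prompt) helps the "tired" state,
# action 1 (a challenge goal) helps the "energetic" state.
states = [("tired and stressed today", 0), ("motivated and energetic", 1)]
rng = np.random.default_rng(0)
policy = LinearEpsGreedy(n_actions=2, dim=32)
for t in range(2000):
    text, best = states[t % 2]
    x = embed_state(text)
    a = policy.act(x, rng)
    r = 1.0 if a == best else 0.0
    policy.update(x, a, r)

x = embed_state("tired and stressed today")
print(int(np.argmax(policy.w @ x)))  # learned action for the "tired" state
```

In the paper's actual setup the state descriptions are far richer and the environment itself uses an auxiliary LLM to generate them from latent variables; the point of the sketch is only the interface, a text-to-features encoder feeding an online learner, which is what lets the state space grow without adding hand-engineered context variables.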