🤖 AI Summary
Prior work on LLM-supported behavior change has focused largely on text-only interactions, leaving open how LLM-driven dialogue can be combined with established UI-based interactions in health coaching. Method: We propose Bloom, a physical activity promotion app that integrates an LLM-based health coaching chatbot with established UI-based interactions. As part of its development, we conducted a red-teaming evaluation and contribute a safety benchmark dataset. We then ran a four-week randomized field study (N=54) comparing Bloom to a non-LLM control, measuring physical activity alongside psychological outcomes such as beliefs about the benefits of activity, activity enjoyment, and self-compassion. Results: Both conditions significantly increased physical activity, doubling the proportion of participants meeting recommended weekly guidelines, with no significant difference between conditions. Participants in the LLM condition, however, reported stronger beliefs that activity was beneficial, greater enjoyment, and more self-compassion than controls. These findings suggest that LLMs may be most effective at shifting the mindsets that precede longer-term behavior change, and the safety benchmark offers a starting point for safety-aware evaluation of LLM-powered digital health tools.
📝 Abstract
Large language models (LLMs) offer novel opportunities to support health behavior change, yet existing work has narrowly focused on text-only interactions. Building on decades of HCI research demonstrating the effectiveness of UI-based interactions, we present Bloom, an application for physical activity promotion that integrates an LLM-based health coaching chatbot with established UI-based interactions. As part of Bloom's development, we conducted a red-teaming evaluation and contribute a safety benchmark dataset. In a four-week randomized field study (N=54) comparing Bloom to a non-LLM control, we observed important shifts in psychological outcomes: participants in the LLM condition reported stronger beliefs that activity was beneficial, greater enjoyment, and more self-compassion. Both conditions significantly increased physical activity levels, doubling the proportion of participants meeting recommended weekly guidelines, though we observed no significant differences between conditions. Instead, our findings suggest that LLMs may be more effective at shifting mindsets that precede longer-term behavior change.
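The "doubling" result is a statement about proportions, not about minutes of activity. The sketch below shows that arithmetic under stated assumptions: it treats a participant as meeting the guideline when their weekly active minutes reach a hypothetical `WEEKLY_GUIDELINE_MINUTES` threshold of 150 (the commonly cited adult guideline for moderate-intensity activity; the study's exact criterion is not given here). The function names are illustrative only and are not taken from the study.

```python
from typing import Sequence

# Assumed threshold: 150 minutes of moderate-intensity activity per week.
# This is an illustrative choice; the study's actual criterion may differ.
WEEKLY_GUIDELINE_MINUTES = 150.0


def proportion_meeting_guideline(
    weekly_minutes: Sequence[float],
    threshold: float = WEEKLY_GUIDELINE_MINUTES,
) -> float:
    """Return the fraction of participants whose weekly active minutes reach the threshold."""
    if not weekly_minutes:
        raise ValueError("need at least one participant")
    meeting = sum(1 for minutes in weekly_minutes if minutes >= threshold)
    return meeting / len(weekly_minutes)


def fold_change(baseline: float, follow_up: float) -> float:
    """Ratio of follow-up to baseline proportion; 2.0 means the proportion doubled."""
    if baseline == 0:
        raise ValueError("baseline proportion is zero; fold change is undefined")
    return follow_up / baseline
```

Applied separately to each condition's baseline and final-week data, a fold change near 2.0 in both arms would match the pattern the abstract reports. Whether the two arms differ from each other is a separate question that calls for a between-condition statistical test, which this sketch does not perform.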