AI Summary
A critical gap exists in publicly available multimodal datasets supporting AI modeling of individual and collective responses to public health campaigns. To address this, we introduce PHORECAST, the first fine-grained, multimodal dataset designed specifically for health communication response forecasting. It captures both individual behavioral responses (e.g., clicks, shares, comments) and community-level engagement patterns (e.g., topic diffusion, sentiment clustering). Methodologically, PHORECAST uniquely enables heterogeneous response modeling across hierarchical levels (individual and community) and modalities (visual-textual content, textual posts, user demographic and behavioral profiles), integrating vision-language understanding, behavioral sequence prediction, and community dynamics analysis. The dataset comprises high-quality, rigorously annotated, and publicly accessible data, accompanied by standardized benchmark tasks. PHORECAST bridges a fundamental data void in socially aware health AI, substantially enhancing model capabilities in understanding complex public health discourse, improving predictive accuracy, and enabling scalable, personalized interventions.
Abstract
Understanding how diverse individuals and communities respond to persuasive messaging holds significant potential for advancing personalized and socially aware machine learning. While Large Vision and Language Models (VLMs) offer promise, their ability to emulate nuanced, heterogeneous human responses, particularly in high-stakes domains like public health, remains underexplored, due in part to the lack of comprehensive multimodal datasets. We introduce PHORECAST (Public Health Outreach REceptivity and CAmpaign Signal Tracking), a multimodal dataset curated to enable fine-grained prediction of both individual-level behavioral responses and community-wide engagement patterns to health messaging. This dataset supports tasks in multimodal understanding, response prediction, personalization, and social forecasting, allowing rigorous evaluation of how well modern AI systems can emulate, interpret, and anticipate heterogeneous public sentiment and behavior. By providing a new dataset to enable AI advances for public health, PHORECAST aims to catalyze the development of models that are not only more socially aware but also aligned with the goals of adaptive and inclusive health communication.