Investigating the Association Between Text-Based Indications of Foodborne Illness from Yelp Reviews and New York City Health Inspection Outcomes (2023)

📅 2025-10-17
🤖 AI Summary
This study addresses the timeliness and coverage limitations of official health inspections in New York City by using Yelp review text to detect early signals of foodborne illness. It analyzes signals produced by a Hierarchical Sigmoid Attention Network (HSAN) classifier and combines them with spatial mapping and statistical testing to assess, at the census tract level, the association between user-generated content and official inspection outcomes. The HSAN signals show minimal correlation with official inspection scores and no significant differences by the number of restaurants receiving “C” grades, which suggests a disconnect between social media-derived signals and traditional regulatory metrics. The work points to open challenges in combining these data sources for urban health monitoring and names address-level analysis as the next step.

📝 Abstract
Foodborne illnesses are gastrointestinal conditions caused by consuming contaminated food. Restaurants are critical venues to investigate outbreaks because they share sourcing, preparation, and distribution of foods. Public reporting of illness via formal channels is limited, whereas social media platforms host abundant user-generated content that can provide timely public health signals. This paper analyzes signals from Yelp reviews produced by a Hierarchical Sigmoid Attention Network (HSAN) classifier and compares them with official restaurant inspection outcomes issued by the New York City Department of Health and Mental Hygiene (NYC DOHMH) in 2023. We evaluate correlations at the Census tract level, compare distributions of HSAN scores by prevalence of C-graded restaurants, and map spatial patterns across NYC. We find minimal correlation between HSAN signals and inspection scores at the tract level and no significant differences by number of C-graded restaurants. We discuss implications and outline next steps toward address-level analyses.
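The comparison of HSAN score distributions by prevalence of C-graded restaurants can be illustrated with a small sketch. The abstract does not say which statistical test the authors used, so the permutation test below is a stand-in chosen for illustration, not the paper's method. The function name `permutation_p_value`, the two-group split (tracts with no C-graded restaurants versus tracts with at least one), and all scores are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Assumption: this test is an illustrative choice; the source does not
    name the test used to compare tract-level HSAN score distributions.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_gap(values):
        # Absolute difference between the mean of the first n_a values
        # and the mean of the remaining values.
        first, rest = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(rest) / len(rest))

    observed = mean_gap(pooled)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign tracts to the two groups
        if mean_gap(pooled) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical tract-level mean HSAN scores, invented for this example.
tracts_without_c_grades = [0.21, 0.35, 0.28, 0.40, 0.31, 0.25]
tracts_with_c_grades = [0.30, 0.27, 0.38, 0.22, 0.33, 0.29]
p_value = permutation_p_value(tracts_without_c_grades, tracts_with_c_grades)
print(f"p-value: {p_value:.3f}")
```

A large p-value here would be consistent with the reported finding of no significant differences, although the actual analysis may group tracts differently, for example by the count of C-graded restaurants rather than by a two-way split.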
Problem

Research questions and friction points this paper is trying to address.

Detecting foodborne illness signals from Yelp reviews
Comparing social media signals with official health inspections
Evaluating spatial correlation between digital and regulatory data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Sigmoid Attention Network classifier analyzes Yelp reviews
Compares social media signals with official health inspection outcomes
Evaluates correlations and spatial patterns at Census tract level
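As a rough illustration of the tract-level comparison described above, the sketch below averages per-restaurant values within each Census tract and computes a Spearman rank correlation over tracts present in both sources. The choice of Spearman correlation, the function names, and every record are assumptions made for this example; the summary does not specify the correlation measure or the aggregation the authors used.

```python
from collections import defaultdict


def tract_means(records):
    """Average values per tract. `records` is an iterable of (tract_id, value)."""
    by_tract = defaultdict(list)
    for tract_id, value in records:
        by_tract[tract_id].append(value)
    return {tract: sum(vals) / len(vals) for tract, vals in by_tract.items()}


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN when either input has no variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def tract_level_spearman(hsan_records, inspection_records):
    """Spearman correlation of tract means over tracts found in both inputs."""
    hsan = tract_means(hsan_records)
    inspections = tract_means(inspection_records)
    shared = sorted(set(hsan) & set(inspections))
    x = [hsan[t] for t in shared]
    y = [inspections[t] for t in shared]
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical records: (tract_id, HSAN score) and (tract_id, inspection score).
hsan_records = [("T1", 0.10), ("T1", 0.30), ("T2", 0.50), ("T3", 0.20), ("T4", 0.70)]
inspection_records = [("T1", 12), ("T2", 9), ("T3", 27), ("T4", 14), ("T5", 30)]
rho = tract_level_spearman(hsan_records, inspection_records)
print(f"Spearman rho across shared tracts: {rho:.2f}")
```

Tracts that appear in only one source (`T5` in this example) are dropped before ranking, which is one simple way to handle incomplete coverage; other choices are possible.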
Eden Shaveet
Columbia University, Department of Computer Science, New York, NY, USA
Crystal Su
Columbia University, Department of Computer Science, New York, NY, USA
Daniel Hsu
Columbia University
Algorithmic statistics, learning theory, machine learning
Luis Gravano
Columbia University, Department of Computer Science, New York, NY, USA