Cooking Up Risks: Benchmarking and Reducing Food Safety Risks in Large Language Models

📅 2026-04-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the critical safety gap in current large language models (LLMs) when applied to food safety, where inadequate alignment renders them vulnerable to jailbreak attacks that can produce harmful instructions with real-world consequences. To systematically evaluate this risk, the authors introduce FoodGuardBench—the first comprehensive benchmark grounded in FDA guidelines—comprising 3,339 fine-grained annotated queries. Leveraging adversarial methods such as AutoDAN and PAP, the work identifies three key vulnerabilities in existing models within this domain. In response, the authors propose FoodGuard-4B, a specialized defense model that employs domain-adaptive fine-tuning to significantly enhance detection and blocking of malicious inputs, thereby substantially reducing the likelihood of generating hazardous content.
📝 Abstract
Large language models (LLMs) are increasingly deployed for everyday tasks, including food preparation and health-related guidance. However, food safety remains a high-stakes domain where inaccurate or misleading information can cause severe real-world harm. Despite these risks, current LLMs and safety guardrails lack rigorous alignment tailored to domain-specific food hazards. To address this gap, we introduce FoodGuardBench, the first comprehensive benchmark comprising 3,339 queries grounded in FDA guidelines, designed to evaluate the safety and robustness of LLMs. By constructing a taxonomy of food safety principles and employing representative jailbreak attacks (e.g., AutoDAN and PAP), we systematically evaluate existing LLMs and guardrails. Our evaluation results reveal three critical vulnerabilities: First, current LLMs exhibit sparse safety alignment in the food-related domain, easily succumbing to a few canonical jailbreak strategies. Second, when compromised, LLMs frequently generate actionable yet harmful instructions, inadvertently empowering malicious actors and posing tangible risks. Third, existing LLM-based guardrails systematically overlook these domain-specific threats, failing to detect a substantial volume of malicious inputs. To mitigate these vulnerabilities, we introduce FoodGuard-4B, a specialized guardrail model fine-tuned on our datasets to safeguard LLMs within food-related domains.
Problem

Research questions and friction points this paper is trying to address.

food safety
large language models
safety alignment
domain-specific risks
harmful instructions
Innovation

Methods, ideas, or system contributions that make the work stand out.

FoodGuardBench
food safety
LLM guardrails
jailbreak attacks
domain-specific alignment
🔎 Similar Papers
No similar papers found.