🤖 AI Summary
This study investigates how the tone and structure of linguistic prompts systematically induce hallucinations in vision-language models (VLMs) when critical visual information is absent. To this end, we introduce Ghost-100, a synthetic dataset designed for studying absence-based hallucinations, and propose a five-level prompt intensity framework, ranging from neutral queries to toxic instructions, to systematically evaluate three open-weight VLMs: MiniCPM-V 2.6-8B, Qwen2-VL-7B, and Qwen3-VL-8B. Our analysis reveals, for the first time, a non-monotonic relationship between prompt intensity and hallucination rate: surprisingly, higher-intensity prompts can yield lower hallucination rates, suggesting that current safety alignment mechanisms are more adept at detecting semantic hostility than structural coercion. This exposes a distinct vulnerability in model behavior under compliance pressure and yields a structured prompt-stress analysis framework for understanding and mitigating VLM hallucinations.
📝 Abstract
Vision-Language Models (VLMs) are increasingly used in safety-critical applications that require reliable visual grounding. However, these models often hallucinate details not present in the image in order to satisfy user prompts. While recent datasets and benchmarks have been introduced to evaluate systematic hallucinations in VLMs, many hallucination behaviors remain insufficiently characterized. In particular, prior work focuses primarily on object presence or absence, leaving it unclear how prompt phrasing and structural constraints can systematically induce hallucinations. In this paper, we investigate how different forms of prompt pressure influence hallucination behavior. We introduce Ghost-100, a procedurally generated dataset of synthetic scenes in which key visual details are deliberately removed, enabling controlled analysis of absence-based hallucinations. Using a structured 5-Level Prompt Intensity Framework, we vary prompts from neutral queries to toxic demands and rigid formatting constraints. We evaluate three representative open-weight VLMs: MiniCPM-V 2.6-8B, Qwen2-VL-7B, and Qwen3-VL-8B. Across all three models, hallucination rates do not increase monotonically with prompt intensity. Each model's hallucination rate drops at higher intensity levels, though the threshold at which the drop occurs differs across models, and not all models sustain the reduction under maximum coercion. These results suggest that current safety alignment is more effective at detecting semantic hostility than structural coercion, revealing model-specific limitations in handling compliance pressure. Our dataset is available at: https://github.com/bli1/tone-matters
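The evaluation described above can be sketched in a few lines: score each model response at each intensity level, then check whether the per-level hallucination rates rise monotonically with intensity. The level names, the boolean `hallucinated` labels, and the record layout below are illustrative assumptions for a minimal sketch, not the paper's actual pipeline.

```python
# Minimal sketch of a per-level hallucination-rate analysis.
# Level names are assumptions; the paper's framework spans neutral
# queries through toxic demands and rigid formatting constraints.
LEVELS = ["neutral", "leading", "insistent", "format-constrained", "toxic"]

def hallucination_rate(records, level):
    """Fraction of responses at a given intensity level that assert
    a detail absent from the image (hallucinated=True)."""
    hits = [r["hallucinated"] for r in records if r["level"] == level]
    return sum(hits) / len(hits) if hits else 0.0

def is_monotone_increasing(rates):
    """True if the hallucination rate never drops as intensity rises."""
    return all(a <= b for a, b in zip(rates, rates[1:]))

# Toy records for one model on absence-based scenes (fabricated labels,
# illustrating the non-monotonic pattern the abstract describes).
records = [
    {"level": "neutral", "hallucinated": False},
    {"level": "leading", "hallucinated": True},
    {"level": "insistent", "hallucinated": True},
    {"level": "format-constrained", "hallucinated": True},
    {"level": "toxic", "hallucinated": False},  # refusal scored as grounded
]

rates = [hallucination_rate(records, lvl) for lvl in LEVELS]
print(is_monotone_increasing(rates))  # → False: rate drops at the top level
```

On this toy data the rate climbs through the middle levels and then falls at the most coercive one, which is the shape of the non-monotonic relationship the study reports.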