Mitigating Hallucinations in Multimodal Spatial Relations through Constraint-Aware Prompting

📅 2025-02-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large Vision-Language Models (LVLMs) frequently generate hallucinated object locations and spatial relations—termed “spatial relation hallucinations”—when interpreting images. To address this, we propose the Constraint-Aware Prompting (CAP) framework, the first to explicitly embed formal spatial logic constraints into the prompting process. CAP jointly enforces internal coherence in multi-object spatial descriptions via bidirectional relational consistency modeling and spatial transitivity logic injection. Crucially, it requires no model fine-tuning, relying solely on prompt engineering for strong, interpretable constraint guidance. Evaluated on three mainstream spatial relation understanding benchmarks, CAP consistently outperforms all baselines, significantly suppressing hallucinations. Ablation studies validate the complementary effectiveness of bidirectional and transitivity constraints, and demonstrate their modular scalability when combined.

📝 Abstract

Spatial relation hallucinations pose a persistent challenge in large vision-language models (LVLMs), leading them to generate incorrect predictions about object positions and spatial configurations within an image. To address this issue, we propose a constraint-aware prompting framework designed to reduce spatial relation hallucinations. Specifically, we introduce two types of constraints: (1) a bidirectional constraint, which ensures consistency in pairwise object relations, and (2) a transitivity constraint, which enforces relational dependence across multiple objects. By incorporating these constraints, LVLMs can produce more spatially coherent and consistent outputs. We evaluate our method on three widely used spatial relation datasets, demonstrating performance improvements over existing approaches. Additionally, a systematic analysis of different choices for bidirectional relation analysis and transitivity reference selection highlights the broader potential of our method for incorporating constraints to mitigate spatial relation hallucinations.
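To make the two constraint types concrete, the following is a minimal sketch of the logic they express, written as plain consistency checks over relation labels. The relation vocabulary and every function name here are illustrative assumptions, not taken from the paper, which applies the constraints through prompting rather than through post-hoc checks.

```python
# Illustrative sketch only: the relation vocabulary and helper names are
# assumptions for this example, not the paper's implementation.

# Each relation maps to its inverse (hypothetical vocabulary).
INVERSE = {
    "left of": "right of",
    "right of": "left of",
    "above": "below",
    "below": "above",
}


def bidirectional_consistent(rel_ab: str, rel_ba: str) -> bool:
    """Return True if the relation stated for (A, B) is the inverse of the
    relation stated for (B, A). Relations outside INVERSE count as inconsistent."""
    return rel_ab in INVERSE and INVERSE[rel_ab] == rel_ba


def transitive_implication(rel_ab: str, rel_bc: str):
    """If A-B and B-C share the same known relation, transitivity implies that
    relation for A-C. Otherwise nothing is implied and None is returned."""
    if rel_ab == rel_bc and rel_ab in INVERSE:
        return rel_ab
    return None


def transitivity_consistent(rel_ab: str, rel_bc: str, rel_ac: str) -> bool:
    """Return True unless transitivity implies a relation for A-C that
    contradicts the stated one."""
    implied = transitive_implication(rel_ab, rel_bc)
    return implied is None or implied == rel_ac
```

For example, a model that answers "the cup is left of the plate" and also "the plate is left of the cup" fails the bidirectional check, and a model that places A left of B, B left of C, but A right of C fails the transitivity check.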

Problem

Research questions and friction points this paper is trying to address.

Mitigate hallucinations in LVLMs

Improve spatial relation predictions

Enforce consistency with constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Constraint-aware prompting framework

Bidirectional constraint enforcement

Transitivity constraint integration
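As a rough illustration of how such constraints might be embedded in a prompt without any fine-tuning, the sketch below builds two prompt strings for a yes/no spatial question. The prompt wording, function names, and parameters are hypothetical and are not the templates used in the paper.

```python
# Hypothetical prompt templates: the wording below is an assumption made for
# illustration and does not reproduce the paper's actual prompts.

def build_bidirectional_prompt(obj_a: str, obj_b: str, relation: str) -> str:
    """Ask the model to describe the relation in both directions before answering."""
    return (
        f"Question: Is the {obj_a} {relation} the {obj_b}?\n"
        f"Before answering, describe where the {obj_a} is relative to the {obj_b}, "
        f"and where the {obj_b} is relative to the {obj_a}. "
        "The two descriptions must be consistent with each other.\n"
        "Answer yes or no."
    )


def build_transitivity_prompt(obj_a: str, obj_b: str, relation: str, reference: str) -> str:
    """Ask the model to relate both objects to a chosen reference object first."""
    return (
        f"Question: Is the {obj_a} {relation} the {obj_b}?\n"
        f"Use the {reference} as a reference object: describe where the {obj_a} "
        f"and the {obj_b} are relative to the {reference}, then make sure your "
        "answer follows from those two relations.\n"
        "Answer yes or no."
    )


if __name__ == "__main__":
    print(build_bidirectional_prompt("cat", "sofa", "left of"))
    print()
    print(build_transitivity_prompt("cat", "sofa", "left of", "lamp"))
```

Either string would be sent to the LVLM together with the image. Which reference object to pick for the transitivity prompt is a design choice that the abstract says the authors analyze systematically.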

🔎 Similar Papers

Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models