REVIS: Sparse Latent Steering to Mitigate Object Hallucination in Large Vision-Language Models

📅 2026-02-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses object hallucination in large vision-language models (LVLMs), which arises in part because visual features and pretrained textual representations become entangled in the deeper layers, suppressing visual information. The authors propose REVIS, a training-free intervention framework grounded in the geometry of the latent space: it extracts a pure visual information vector via orthogonal projection and uses a calibrated strategy to apply a sparse intervention only at the depth where the suppression occurs, re-activating the suppressed visual signal at minimal computational cost. On standard benchmarks, REVIS reduces object hallucination rates by approximately 19% compared to state-of-the-art baselines while preserving general reasoning capabilities.

📝 Abstract

Despite the advanced capabilities of Large Vision-Language Models (LVLMs), they frequently suffer from object hallucination. One reason is that visual features and pretrained textual representations often become intertwined in the deeper network layers. To address this, we propose REVIS, a training-free framework designed to explicitly re-activate this suppressed visual information. Rooted in latent space geometry, REVIS extracts the pure visual information vector via orthogonal projection and employs a calibrated strategy to perform sparse intervention only at the precise depth where suppression occurs. This surgical approach effectively restores visual information with minimal computational cost. Empirical evaluations on standard benchmarks demonstrate that REVIS reduces object hallucination rates by approximately 19% compared to state-of-the-art baselines, while preserving general reasoning capabilities.
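The two ingredients named in the abstract, orthogonal projection and a sparse intervention at a single depth, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the vectors are random stand-ins for real LVLM activations, and the helper names (`orthogonal_component`, `pick_layer`, `steer`), the lowest-cosine-similarity calibration rule, and the strength `alpha` are all assumptions made for illustration.

```python
import numpy as np


def orthogonal_component(visual_vec, text_vec):
    """Return the part of visual_vec that is orthogonal to text_vec."""
    v = np.asarray(visual_vec, dtype=float)
    t = np.asarray(text_vec, dtype=float)
    return v - (v @ t) / (t @ t) * t


def pick_layer(hidden_states, direction):
    """Hypothetical calibration rule (an assumption, not from the paper):
    choose the layer whose hidden state is least aligned with `direction`."""
    d = direction / np.linalg.norm(direction)
    scores = [float(h @ d / np.linalg.norm(h)) for h in hidden_states]
    return int(np.argmin(scores))


def steer(hidden_states, direction, layer, alpha=1.0):
    """Sparse intervention: add alpha * unit(direction) at one layer only."""
    d = direction / np.linalg.norm(direction)
    out = [np.array(h, dtype=float) for h in hidden_states]
    out[layer] = out[layer] + alpha * d
    return out


# Toy data standing in for per-layer hidden states of a single token.
rng = np.random.default_rng(0)
dim, n_layers = 16, 6
visual_vec = rng.normal(size=dim)  # assumed "visual" direction
text_vec = rng.normal(size=dim)    # assumed "textual prior" direction
hidden_states = [rng.normal(size=dim) for _ in range(n_layers)]

pure_visual = orthogonal_component(visual_vec, text_vec)
layer = pick_layer(hidden_states, pure_visual)
steered = steer(hidden_states, pure_visual, layer, alpha=2.0)
```

In this sketch every layer other than the selected one is left untouched, which is what keeps the intervention sparse; in a real model the same addition would typically be applied through a forward hook on the chosen layer.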

Problem

Research questions and friction points this paper is trying to address.

object hallucination

Large Vision-Language Models

visual features

textual representations

latent space

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Latent Steering

Object Hallucination

Orthogonal Projection