Improving Alignment in LVLMs with Debiased Self-Judgment

📅 2025-08-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large vision-language models (LVLMs) suffer from hallucination and cross-domain safety risks due to insufficient visual–linguistic modality alignment. Method: We propose an endogenous, supervision-free debiased self-evaluation mechanism that leverages fine-grained, model-generated evaluation scores to guide decoding-strategy optimization and preference tuning within an internal feedback loop, enabling autonomous alignment improvement. Contribution/Results: To our knowledge, this is the first work to embed debiased self-evaluation directly into the training pipeline, eliminating reliance on human annotations or external data and thereby improving scalability and safety. Experiments show consistent, significant gains over existing alignment paradigms across hallucination detection, safety evaluation, and multi-task benchmarks, while preserving (or even improving) general vision-language understanding and generation capabilities.
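The score-guided decoding idea above can be illustrated with a minimal sketch. Here the model grades each sampled candidate for visual groundedness, and the score is debiased by subtracting an image-free judgment so that language-prior bias is cancelled; `lvlm.generate`, `lvlm.judge`, and the debias-by-ablation step are hypothetical stand-ins, not the paper's actual interface or procedure.

```python
# Minimal sketch of debiased self-judgment used to steer decoding.
# Assumptions: `lvlm` exposes generate() and judge(); the debiasing step
# (grounded score minus text-only score) is illustrative, not the paper's recipe.
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    score: float  # debiased self-judgment score


def debiased_self_judgment(lvlm, image, prompt, response) -> float:
    """Score a response with the model itself, then subtract a text-only
    baseline so that language-prior bias is removed."""
    grounded = lvlm.judge(image=image, prompt=prompt, response=response)
    text_only = lvlm.judge(image=None, prompt=prompt, response=response)
    return grounded - text_only


def self_judged_decode(lvlm, image, prompt, num_candidates: int = 4) -> Candidate:
    """Sample several candidates and keep the one the model itself rates
    as best grounded in the image after debiasing."""
    candidates: List[Candidate] = []
    for _ in range(num_candidates):
        text = lvlm.generate(image=image, prompt=prompt, do_sample=True)
        score = debiased_self_judgment(lvlm, image, prompt, text)
        candidates.append(Candidate(text, score))
    return max(candidates, key=lambda c: c.score)
```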

📝 Abstract
The rapid advancements in Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) have opened up new opportunities for integrating visual and linguistic modalities. However, effectively aligning these modalities remains challenging, often leading to hallucinations (generated outputs that are not grounded in the visual input) and raising safety concerns across various domains. Existing alignment methods, such as instruction tuning and preference tuning, often rely on external datasets, human annotations, or complex post-processing, which limits scalability and increases costs. To address these challenges, we propose a novel approach that generates a debiased self-judgment score, a self-evaluation metric created internally by the model without relying on external resources. This enables the model to autonomously improve its alignment. Our method enhances both decoding strategies and preference tuning, resulting in reduced hallucinations, enhanced safety, and improved overall capability. Empirical results show that our approach significantly outperforms traditional methods, offering a more effective solution for aligning LVLMs.
Problem

Research questions and friction points this paper is trying to address.

Addressing modality misalignment causing hallucinations in LVLMs
Reducing reliance on external datasets for model alignment
Improving safety and capability through autonomous self-evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Debiased self-judgment score for alignment
Autonomous improvement without external resources
Enhances both decoding and preference tuning (sketched below)
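A hedged sketch of how the same score could feed preference tuning: sampled responses are ranked by debiased self-judgment and turned into (chosen, rejected) pairs for DPO-style optimization. The pairing rule, margin threshold, and `lvlm` interface are illustrative assumptions, and the code reuses `debiased_self_judgment` from the decoding sketch above rather than the paper's exact procedure.

```python
def build_preference_pairs(lvlm, dataset, num_samples: int = 4, margin: float = 0.5):
    """Turn self-judged samples into preference pairs for DPO-style tuning.

    `dataset` yields (image, prompt) tuples; `lvlm` is the same hypothetical
    interface as in the decoding sketch above.
    """
    pairs = []
    for image, prompt in dataset:
        scored = []
        for _ in range(num_samples):
            text = lvlm.generate(image=image, prompt=prompt, do_sample=True)
            # Reuses debiased_self_judgment from the decoding sketch above.
            scored.append((debiased_self_judgment(lvlm, image, prompt, text), text))
        scored.sort(key=lambda s: s[0], reverse=True)
        (best_score, chosen), (worst_score, rejected) = scored[0], scored[-1]
        # Keep only pairs with a clear self-judged quality gap.
        if best_score - worst_score >= margin:
            pairs.append({"image": image, "prompt": prompt,
                          "chosen": chosen, "rejected": rejected})
    return pairs
```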