🤖 AI Summary
This work addresses the challenge of automated evaluation and alignment for large multimodal models (LMMs). To this end, we introduce LLaVA-Critic, the first open-source LMM designed as a generalist evaluator, capable of unified scoring and preference judgments across diverse tasks and evaluation criteria (e.g., factual accuracy, relevance, visual consistency). Following the standard LLaVA design of a vision encoder coupled to a large language model, LLaVA-Critic is fine-tuned on a high-quality critic instruction-following dataset spanning diverse evaluation criteria and scenarios. Experiments demonstrate its effectiveness in two roles: as an LMM-as-a-Judge, it performs on par with or surpasses GPT models on multiple evaluation benchmarks; and as a source of reward signals for preference learning, it improves model alignment. Overall, this work highlights the potential of open-source LMMs for self-critique and points toward scalable, superhuman alignment feedback mechanisms.
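To make the LMM-as-a-Judge usage concrete, here is a minimal pointwise-scoring sketch. It assumes a LLaVA-OneVision-style checkpoint loadable through Hugging Face transformers; the `lmms-lab/llava-critic-7b` model id and the evaluation prompt wording are assumptions, not the paper's verbatim template, and the released weights may instead require the authors' LLaVA-NeXT codebase.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

# Assumed checkpoint id; LLaVA-Critic builds on LLaVA-OneVision, so any
# OneVision-compatible critic checkpoint would load the same way.
MODEL_ID = "lmms-lab/llava-critic-7b"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Pointwise judging: ask the critic to score one model response for a given
# image + question. The scoring instruction below is illustrative only.
question = "What is unusual about this image?"
response = "A man is ironing clothes on the back of a moving taxi."
critic_prompt = (
    "You are shown an image, a question, and a model's response.\n"
    f"Question: {question}\nResponse: {response}\n"
    "Rate the response from 1 to 10 for accuracy and helpfulness, "
    "then briefly justify your score."
)

conversation = [
    {"role": "user",
     "content": [{"type": "image"}, {"type": "text", "text": critic_prompt}]}
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

image = Image.open("example.jpg")  # placeholder image path
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(output[0], skip_special_tokens=True))
```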
📝 Abstract
We introduce LLaVA-Critic, the first open-source large multimodal model (LMM) designed as a generalist evaluator to assess performance across a wide range of multimodal tasks. LLaVA-Critic is trained using a high-quality critic instruction-following dataset that incorporates diverse evaluation criteria and scenarios. Our experiments demonstrate the model's effectiveness in two key areas: (1) LMM-as-a-Judge, where LLaVA-Critic provides reliable evaluation scores, performing on par with or surpassing GPT models on multiple evaluation benchmarks; and (2) Preference Learning, where it generates reward signals for preference learning, enhancing model alignment capabilities. This work underscores the potential of open-source LMMs in self-critique and evaluation, setting the stage for future research into scalable, superhuman alignment feedback mechanisms for LMMs.
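For the preference-learning use, one common recipe (assumed here; the abstract only states that the critic "generates reward signals for preference learning") is to sample several candidate responses per image-prompt pair, have the critic pick the better response in each pairing, and train on the resulting chosen/rejected pairs with DPO. In the sketch below, `sample_responses` and `critic_prefers` are hypothetical helpers wrapping the policy model and the critic.

```python
from dataclasses import dataclass
from itertools import combinations
import random

@dataclass
class PreferencePair:
    image: str   # image path or identifier
    prompt: str
    chosen: str
    rejected: str

def build_dpo_pairs(dataset, sample_responses, critic_prefers, k=4):
    """For each (image, prompt), sample k candidate responses from the
    policy model and keep pairs where the critic's preference is consistent
    under both presentation orders (to reduce position bias)."""
    pairs = []
    for image, prompt in dataset:
        responses = sample_responses(image, prompt, k)  # k candidates
        for a, b in combinations(responses, 2):
            a_over_b = critic_prefers(image, prompt, a, b)  # True if a > b
            b_over_a = critic_prefers(image, prompt, b, a)  # True if b > a
            if a_over_b and not b_over_a:
                pairs.append(PreferencePair(image, prompt, chosen=a, rejected=b))
            elif b_over_a and not a_over_b:
                pairs.append(PreferencePair(image, prompt, chosen=b, rejected=a))
    random.shuffle(pairs)
    return pairs
```

Querying the critic in both orders and discarding inconsistent verdicts is a standard debiasing choice for LMM judges, not something the abstract specifies.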