🤖 AI Summary
Existing MLLM evaluation methods suffer from high redundancy and low efficiency. To address this, we propose a novel multiple-interviewers-to-one-model, interview-style evaluation paradigm inspired by human recruitment interviews. The method comprises two stages, a pre-interview and a formal interview, and integrates dynamic interviewer-weight adjustment with adaptive question-difficulty selection, establishing an efficient and fair evaluation framework. Crucially, evaluation is modeled as a structured interactive process rather than static question sampling. Extensive experiments across multiple benchmarks show that the approach achieves strong correlation with full-coverage evaluation using only ~30% of the questions, improving the Pearson (PLCC) and Spearman (SRCC) correlation coefficients over random sampling by up to 17.6% and 16.7%, respectively. This work introduces a principled, interaction-driven paradigm for efficient and reliable MLLM assessment.
📝 Abstract
The rapid progress of Multi-Modal Large Language Models (MLLMs) has spurred the creation of numerous benchmarks. However, conventional full-coverage Question-Answering evaluations suffer from high redundancy and low efficiency. Inspired by human interview processes, we propose a multi-to-one interview paradigm for efficient MLLM evaluation. Our framework consists of (i) a two-stage interview strategy with pre-interview and formal interview phases, (ii) dynamic adjustment of interviewer weights to ensure fairness, and (iii) an adaptive mechanism for choosing question difficulty levels. Experiments on different benchmarks show that the proposed paradigm achieves significantly higher correlation with full-coverage results than random sampling, with improvements of up to 17.6% in PLCC and 16.7% in SRCC, while reducing the number of required questions. These findings demonstrate that the proposed paradigm provides a reliable and efficient alternative for large-scale MLLM benchmarking.
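To make the two-stage idea concrete, here is a very rough sketch of a pre-interview followed by an adaptive formal interview. Everything in it (`interview_evaluate`, the [0, 1] difficulty/ability scale, the fixed `step` update) is an illustrative assumption, not the paper's actual algorithm; interviewer-weight adjustment is omitted for brevity.

```python
import random

def interview_evaluate(model, questions, pre_n=5, formal_n=10, seed=0):
    """Toy two-stage interview evaluation (illustrative only).

    `questions` is a list of (difficulty, ask_fn) pairs, with difficulty
    in [0, 1]; ask_fn(model) returns 1 (correct) or 0 (wrong).
    Returns an ability estimate in [0, 1].
    """
    rng = random.Random(seed)
    pool = sorted(questions, key=lambda q: q[0])

    # Stage 1: pre-interview — a small random sample yields a rough
    # ability estimate (fraction of correct answers).
    pre = rng.sample(pool, pre_n)
    ability = sum(ask(model) for _, ask in pre) / pre_n

    # Stage 2: formal interview — repeatedly pick the unused question whose
    # difficulty is closest to the current ability estimate, then nudge the
    # estimate up on a correct answer and down on a wrong one.
    remaining = [q for q in pool if q not in pre]
    step = 0.1
    for _ in range(formal_n):
        if not remaining:
            break
        d, ask = min(remaining, key=lambda q: abs(q[0] - ability))
        remaining.remove((d, ask))
        if ask(model):
            ability = min(1.0, ability + step * (1 - d))
        else:
            ability = max(0.0, ability - step * d)
    return ability
```

In this toy setup, probing near the running ability estimate is what lets the interview converge with far fewer questions than full coverage, which is the intuition behind the reported correlation gains over random sampling.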