InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback

📅 2025-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing benchmarks inadequately evaluate the human–AI interaction intelligence of large multimodal models (LMMs), particularly their ability to revise outputs in response to human feedback. Method: We propose InterFeedback, the first autonomous evaluation framework for interaction intelligence. It comprises an interactive evaluation paradigm, an automated assessment system generalizable to arbitrary LMMs, the dual-modal benchmark InterFeedback-Bench, and the human-validated set InterFeedback-Human. The methodology integrates interactive prompt engineering, feedback-response modeling, multi-turn trajectory analysis, and cross-dataset consistency evaluation. Contribution/Results: Experiments reveal that state-of-the-art LMMs, including OpenAI-o1, correctly revise their outputs from human feedback in fewer than 50% of cases, exposing a critical bottleneck in interaction intelligence. InterFeedback establishes a quantifiable paradigm and a foundational toolkit for rigorously assessing and iteratively improving LMMs' interactive capabilities.
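
To make the interactive evaluation paradigm concrete, here is a minimal sketch of the kind of multi-turn feedback loop the summary describes: the model answers, a simulated human signals only that the answer is wrong, and the model gets further attempts. All names here (query_lmm, MAX_ROUNDS, run_episode) and the message format are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of a multi-turn feedback episode, assuming a chat-style LMM.
# `query_lmm`, MAX_ROUNDS, and the message schema are hypothetical.

MAX_ROUNDS = 3  # feedback rounds granted after an incorrect first answer

def query_lmm(messages: list[dict]) -> str:
    """Placeholder: send the conversation to any LMM and return its answer text."""
    raise NotImplementedError

def run_episode(question: str, ground_truth: str) -> dict:
    """Evaluate one test case with up to MAX_ROUNDS of corrective feedback."""
    messages = [{"role": "user", "content": question}]
    answer = query_lmm(messages)
    if answer.strip() == ground_truth:
        return {"solved": True, "rounds": 0}  # correct on the first attempt
    for r in range(1, MAX_ROUNDS + 1):
        messages += [
            {"role": "assistant", "content": answer},
            # The feedback only signals that the answer is wrong; it does not
            # reveal the correct answer, mimicking a minimal human hint.
            {"role": "user", "content": "That answer is incorrect. "
                                        "Please reconsider and answer again."},
        ]
        answer = query_lmm(messages)
        if answer.strip() == ground_truth:
            return {"solved": True, "rounds": r}  # corrected via feedback
    return {"solved": False, "rounds": MAX_ROUNDS}
```

Keeping the feedback content-free (only "incorrect") is one plausible design choice for isolating a model's ability to benefit from feedback rather than from leaked answers; the paper's actual feedback protocol may be richer.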

📝 Abstract
Existing benchmarks do not test Large Multimodal Models (LMMs) on their interactive intelligence with human users, which is vital for developing general-purpose AI assistants. We design InterFeedback, an interactive framework that can be applied to any LMM and dataset to assess this ability autonomously. On top of this, we introduce InterFeedback-Bench, which evaluates interactive intelligence using two representative datasets, MMMU-Pro and MathVerse, to test 10 different open-source LMMs. Additionally, we present InterFeedback-Human, a newly collected dataset of 120 cases designed for manually testing interactive performance in leading models such as OpenAI-o1 and Claude-3.5-Sonnet. Our evaluation results show that even state-of-the-art LMMs (such as OpenAI-o1) correct their results through human feedback in fewer than 50% of cases. Our findings point to the need for methods that enhance LMMs' capability to interpret and benefit from feedback.
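
The headline "fewer than 50%" figure can be read as a correction rate over the cases a model initially got wrong. The sketch below shows one plausible way to aggregate it, assuming the hypothetical episode records from the loop sketched earlier; the paper's exact scoring protocol may differ.

```python
# Hedged sketch of a feedback correction rate: among cases not solved on the
# first attempt, what fraction was fixed after feedback? Field names match the
# hypothetical run_episode() records above.

def feedback_correction_rate(results: list[dict]) -> float:
    """Each result has 'solved' (bool) and 'rounds' (int; 0 = first try)."""
    needed_feedback = [r for r in results
                       if not (r["solved"] and r["rounds"] == 0)]
    if not needed_feedback:
        return 1.0  # model solved everything outright; nothing to correct
    fixed = sum(r["solved"] for r in needed_feedback)
    return fixed / len(needed_feedback)

# Example: one first-try success, one case fixed after feedback, one never
# fixed -> correction rate of 0.5 over the two cases that needed feedback.
print(feedback_correction_rate([
    {"solved": True, "rounds": 0},
    {"solved": True, "rounds": 2},
    {"solved": False, "rounds": 3},
]))  # 0.5
```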
Problem

Research questions and friction points this paper is trying to address.

Assessing interactive intelligence in LMMs
Developing feedback interpretation methods
Evaluating LMMs with human feedback
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive framework for LMMs
Human feedback evaluation datasets
Enhancing LMMs with feedback interpretation
Henry Hengyuan Zhao
Ph.D. student at National University of Singapore
Multimodal Reasoning · AI Agent · Human-AI Interaction
Wenqi Pei
Show Lab, National University of Singapore
Yifei Tao
Show Lab, National University of Singapore
Haiyang Mei
National University of Singapore, Dalian University of Technology, ETH Zurich
Computer Vision · Neuroinformatics
Mike Zheng Shou
Show Lab, National University of Singapore