Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference

📅 2025-02-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Visual large language models (VLLMs) have ill-defined knowledge boundaries, which leads to excessive invocation of retrieval-augmented generation (RAG) and inflated computational costs. To address this, we propose a sampling-based inference method for knowledge boundary detection, the first to enable transferable, lightweight boundary discrimination for VLLMs. Our approach automatically constructs a boundary identification dataset, performs lightweight fine-tuning of the VLLM, and integrates sampling-driven uncertainty estimation to trigger retrieval only when necessary. Crucially, it eliminates the need for model-specific boundary detectors and supports cross-model generalization. Evaluated on multi-source visual question answering tasks, our method reduces redundant RAG calls by up to 47% while maintaining or improving answer accuracy, significantly improving both the efficiency and the practicality of integrating RAG with VLLMs.
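The summary describes building a boundary-identification dataset by sampling the VLLM. The sketch below is a minimal illustration under stated assumptions, not the authors' released code. It assumes a hypothetical `generate` callable that returns one stochastic VLLM answer per call, a loose exact-match check against reference answers, and a hypothetical accuracy `threshold` that splits questions into "known" and "unknown". The paper's actual labeling rule, sample count, and threshold may differ.

```python
from itertools import cycle
from typing import Callable, Dict, List


def normalize(ans: str) -> str:
    """Loose exact-match normalization: trim, drop trailing periods, lowercase."""
    return ans.strip().rstrip(".").lower()


def sampled_accuracy(samples: List[str], gold: List[str]) -> float:
    """Fraction of sampled answers that match any reference answer."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    golds = {normalize(g) for g in gold}
    return sum(normalize(s) in golds for s in samples) / len(samples)


def build_boundary_dataset(
    items: List[Dict],
    generate: Callable[[Dict], str],  # hypothetical: draws one stochastic VLLM answer
    k: int = 8,                       # hypothetical number of samples per question
    threshold: float = 0.5,           # hypothetical cut-off between "known" and "unknown"
) -> List[Dict]:
    """Label each item 'known' or 'unknown' from k sampled answers (assumed rule)."""
    dataset = []
    for item in items:
        samples = [generate(item) for _ in range(k)]
        acc = sampled_accuracy(samples, item["gold"])
        label = "known" if acc >= threshold else "unknown"
        dataset.append({**item, "accuracy": acc, "label": label})
    return dataset


# Toy usage with a fake generator that cycles through canned answers per question.
canned = {"q1": cycle(["Paris", "paris.", "Lyon", "Paris"]), "q2": cycle(["1999", "2001"])}
fake_generate = lambda item: next(canned[item["id"]])
items = [{"id": "q1", "gold": ["Paris"]}, {"id": "q2", "gold": ["2003"]}]
data = build_boundary_dataset(items, fake_generate, k=4)
print([(d["id"], d["label"], d["accuracy"]) for d in data])
# [('q1', 'known', 0.75), ('q2', 'unknown', 0.0)]
```

Labels produced this way could then serve as supervision for fine-tuning a boundary classifier; how the paper turns them into training targets is not specified here.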

📝 Abstract
Despite recent advances, Visual Large Language Models (VLLMs), like text-only Large Language Models (LLMs), remain limited in answering questions that require real-time information or are knowledge-intensive. Indiscriminately applying Retrieval Augmented Generation (RAG) is an effective but expensive way to let models answer queries beyond their knowledge scope. To reduce the dependence on retrieval while maintaining, or even improving, the performance gains it provides, we propose a method to detect the knowledge boundary of VLLMs, enabling more efficient use of techniques such as RAG. Specifically, we propose a method with two variants that fine-tunes a VLLM on an automatically constructed dataset for boundary identification. Experimental results on various types of Visual Question Answering datasets show that our method successfully depicts a VLLM's knowledge boundary, which allows us to reduce indiscriminate retrieval while maintaining or improving performance. In addition, we show that the knowledge boundary identified by our method for one VLLM can serve as a surrogate boundary for other VLLMs. Code will be released at https://github.com/Chord-Chen-30/VLLM-KnowledgeBoundary
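To make the retrieval-gating idea in the abstract concrete, here is a minimal sketch of how a boundary detector could decide, per question, whether to call RAG, and how the resulting retrieval rate compares with indiscriminate retrieval (a rate of 1.0). The names `is_known`, `answer_direct`, `answer_with_rag`, and `GatedRun` are hypothetical placeholders. `is_known` could wrap a detector built for a different VLLM, mirroring the surrogate-boundary result, but nothing here reproduces the paper's detector.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GatedRun:
    answers: List[str]
    retrieval_calls: int

    def retrieval_rate(self) -> float:
        """Share of questions that triggered retrieval (1.0 = indiscriminate RAG)."""
        return self.retrieval_calls / len(self.answers) if self.answers else 0.0


def answer_with_boundary_gate(
    questions: List[str],
    is_known: Callable[[str], bool],        # boundary detector, possibly a surrogate from another VLLM
    answer_direct: Callable[[str], str],    # hypothetical: VLLM answers without retrieval
    answer_with_rag: Callable[[str], str],  # hypothetical: retrieve evidence, then answer
) -> GatedRun:
    answers, calls = [], 0
    for q in questions:
        if is_known(q):
            # Inside the estimated knowledge boundary: skip retrieval.
            answers.append(answer_direct(q))
        else:
            # Outside the boundary: pay for retrieval.
            answers.append(answer_with_rag(q))
            calls += 1
    return GatedRun(answers, calls)


# Toy usage: the detector says q1 and q3 are within the boundary.
known = {"q1", "q3"}
run = answer_with_boundary_gate(
    ["q1", "q2", "q3", "q4"],
    is_known=lambda q: q in known,
    answer_direct=lambda q: f"direct:{q}",
    answer_with_rag=lambda q: f"rag:{q}",
)
print(run.answers, run.retrieval_rate())
# ['direct:q1', 'rag:q2', 'direct:q3', 'rag:q4'] 0.5
```

Keeping the detector as a plain callable separates the gating logic from any particular model, which is what would make swapping in a surrogate boundary a one-argument change.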
Problem

Research questions and friction points this paper is trying to address.

Detect knowledge boundary in VLLMs
Reduce dependency on retrieval techniques
Improve efficiency in Visual Question Answering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sampling-Based Inference
Retrieval Augmented Generation
Knowledge Boundary Detection
Zhuo Chen
School of Information Science and Technology, ShanghaiTech University; Shanghai Engineering Research Center of Intelligent Vision and Imaging
Xinyu Wang
Institute for Intelligent Computing, Alibaba Group
Yong Jiang
Institute for Intelligent Computing, Alibaba Group
Zhen Zhang
Institute for Intelligent Computing, Alibaba Group
Xinyu Geng
Institute for Intelligent Computing, Alibaba Group
Pengjun Xie
Alibaba Group
NLP/IR/ML
Fei Huang
Institute for Intelligent Computing, Alibaba Group
Kewei Tu
School of Information Science and Technology, ShanghaiTech University, China
Natural Language Processing, Machine Learning