OAD-Promoter: Enhancing Zero-shot VQA using Large Language Models with Object Attribute Description

📅 2025-11-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) suffer from severe language bias and poor out-of-distribution (OOD) generalization in zero-shot visual question answering (VQA). To address this, we propose OAD-Promoter, a novel framework that introduces Object Attribute Descriptions (OADs) to explicitly disentangle linguistic priors. It integrates global scene understanding with local object-level visual semantics via three synergistic components: object-focused exemplar generation, memory-augmented knowledge retrieval, and attribute-aware prompting. By leveraging fine-grained attribute information, OAD-Promoter mitigates reliance on textual shortcuts and enhances cross-domain robustness. Extensive experiments on multiple zero-shot and few-shot VQA benchmarks demonstrate that our method consistently surpasses state-of-the-art approaches, achieving an average accuracy improvement of 4.2% and up to 12.7% gain in OOD transfer performance.

Technology Category

Application Category

📝 Abstract
Large Language Models (LLMs) have become a crucial tool in Visual Question Answering (VQA) for handling knowledge-intensive questions in few-shot or zero-shot scenarios. However, their reliance on massive training datasets often causes them to inherit language biases during the acquisition of knowledge. This limitation imposes two key constraints on existing methods: (1) LLM predictions become less reliable due to bias exploitation, and (2) despite strong knowledge reasoning capabilities, LLMs still struggle with out-of-distribution (OOD) generalization. To address these issues, we propose Object Attribute Description Promoter (OAD-Promoter), a novel approach for enhancing LLM-based VQA by mitigating language bias and improving domain-shift robustness. OAD-Promoter comprises three components: the Object-concentrated Example Generation (OEG) module, the Memory Knowledge Assistance (MKA) module, and the OAD Prompt. The OEG module generates global captions and object-concentrated samples, jointly enhancing visual information input to the LLM and mitigating bias through complementary global and regional visual cues. The MKA module assists the LLM in handling OOD samples by retrieving relevant knowledge from stored examples to support questions from unseen domains. Finally, the OAD Prompt integrates the outputs of the preceding modules to optimize LLM inference. Experiments demonstrate that OAD-Promoter significantly improves the performance of LLM-based VQA methods in few-shot or zero-shot settings, achieving new state-of-the-art results.
Problem

Research questions and friction points this paper is trying to address.

Mitigating language bias in LLMs for Visual Question Answering tasks
Improving out-of-distribution generalization in zero-shot VQA scenarios
Enhancing visual information processing through object attribute descriptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Object-concentrated Example Generation enhances visual input
Memory Knowledge Assistance retrieves knowledge for OOD samples
OAD Prompt integrates modules to optimize LLM inference
🔎 Similar Papers
No similar papers found.
Q
Quanxing Xu
School of Computer Science and Engineering, Macau University of Science and Technology, Macau SAR, China
L
Ling Zhou
School of Computer Science and Engineering, Macau University of Science and Technology, Macau SAR, China
F
Feifei Zhang
School of Computer Science and Engineering, Tianjin University of Technology, Tianjin, China
Jinyu Tian
Jinyu Tian
Macau University of Science and Technology
Adversarial Machine Learning
Rubing Huang
Rubing Huang
Macau University of Science and Technology
AI for Software EngineeringSoftware Engineering for AISoftware TestingAI Applications