MacVQA: Adaptive Memory Allocation and Global Noise Filtering for Continual Visual Question Answering

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses a central challenge in continual visual question answering (continual VQA): preserving prior knowledge, adapting to new information, and maintaining robust feature representations at the same time. To this end, the authors propose MacVQA, which combines multimodal fusion of visual and question information, prototype-based memory management, and global noise filtering. Adaptive memory allocation dynamically optimizes knowledge storage, and the model suppresses cross-task interference at the feature level; together these support efficient knowledge acquisition, retention, and compositional generalization. Evaluated on 10 continual VQA tasks, the model achieves average accuracies of 43.38% on standard tasks and 42.53% on novel compositional tasks, with average forgetting of 2.32% and 3.60%, respectively, outperforming existing baselines.
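The summary names prototype-based memory management and adaptive memory allocation but does not give the allocation rule. Purely as an illustration, the sketch below uses one generic scheme: each answer class gets a prototype (the mean of its feature vectors), a replay budget is split across classes in proportion to their frequency, and the samples nearest each prototype are kept. The function names, the proportional split, and the nearest-to-prototype criterion are all assumptions, not MacVQA's actual method.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence

Feature = Sequence[float]


def class_prototypes(features: Sequence[Feature], labels: Sequence[str]) -> Dict[str, List[float]]:
    """Prototype of each class: the element-wise mean of its feature vectors."""
    groups: Dict[str, List[Feature]] = defaultdict(list)
    for feat, label in zip(features, labels):
        groups[label].append(feat)
    return {
        label: [sum(column) / len(feats) for column in zip(*feats)]
        for label, feats in groups.items()
    }


def select_memory(features: Sequence[Feature], labels: Sequence[str], budget: int) -> List[int]:
    """Hypothetical prototype-based memory selection (not MacVQA's exact rule).

    Splits `budget` across classes in proportion to class frequency (at least
    one slot per class, so rounding can make the total slightly exceed
    `budget`), then keeps the samples closest to each class prototype.
    Returns the sorted indices of the retained samples.
    """
    prototypes = class_prototypes(features, labels)
    total = len(labels)
    kept: List[int] = []
    for label, proto in prototypes.items():
        members = [i for i, lab in enumerate(labels) if lab == label]
        # Assumed allocation rule: proportional share, minimum one slot.
        quota = max(1, round(budget * len(members) / total))
        # Rank members by Euclidean distance to their class prototype.
        members.sort(key=lambda i: dist(features[i], proto))
        kept.extend(members[:quota])
    return sorted(kept)


# Toy 2-D features (made up for illustration, not data from the paper).
features = [[0.0, 0.0], [1.0, 0.0], [8.0, 0.0], [10.0, 10.0], [10.0, 11.0], [10.0, 30.0]]
labels = ["yes", "yes", "yes", "no", "no", "no"]
print(select_memory(features, labels, budget=2))  # [1, 4]
```

Selecting samples near the class mean favors typical examples over outliers (index 2 and index 5 are dropped first); a real system would operate on learned multimodal embeddings rather than hand-written vectors.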

πŸ“ Abstract
Visual Question Answering (VQA) requires models to reason over multimodal information, combining visual and textual data. With the development of continual learning, significant progress has been made in retaining knowledge and adapting to new information in the VQA domain. However, current methods often struggle to balance knowledge retention, adaptation, and robust feature representation. To address these challenges, we propose MacVQA, a novel framework for visual question answering with adaptive memory allocation and global noise filtering. MacVQA fuses visual and question information while filtering noise to ensure robust representations, and employs prototype-based memory allocation to optimize feature quality and memory usage. These designs enable MacVQA to balance knowledge acquisition, retention, and compositional generalization in continual VQA learning. Experiments on ten continual VQA tasks show that MacVQA outperforms existing baselines, achieving 43.38% average accuracy and 2.32% average forgetting on standard tasks, and 42.53% average accuracy and 3.60% average forgetting on novel composition tasks.
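The two reported metrics, average accuracy and average forgetting, are usually computed from an accuracy matrix in which entry (i, j) is the accuracy on task j after training on task i. The sketch below uses the standard continual-learning definitions: average accuracy is the mean over all tasks after the final task, and forgetting for an earlier task is its best accuracy before the final task minus its final accuracy. The exact evaluation protocol used in the paper is an assumption here and may differ, and the matrix values are toy numbers, not results from the paper.

```python
from typing import Sequence


def average_accuracy(acc: Sequence[Sequence[float]]) -> float:
    """Mean accuracy over all T tasks, measured after training on the last task.

    acc[i][j] is the accuracy on task j after training on task i (assumed T x T layout).
    """
    final_row = acc[-1]
    return sum(final_row) / len(final_row)


def average_forgetting(acc: Sequence[Sequence[float]]) -> float:
    """Mean drop from each earlier task's best accuracy to its final accuracy.

    The last task is excluded because it has not yet been followed by new training.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0
    drops = []
    for j in range(num_tasks - 1):
        # Best accuracy on task j at any checkpoint from task j up to task T-2.
        best = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best - acc[num_tasks - 1][j])
    return sum(drops) / len(drops)


# Toy 3-task matrix (illustrative values only).
acc = [
    [50.0, 0.0, 0.0],
    [46.0, 48.0, 0.0],
    [44.0, 47.0, 52.0],
]
print(average_accuracy(acc))    # mean of 44, 47, 52
print(average_forgetting(acc))  # mean of (50 - 44) and (48 - 47) = 3.5
```

Only entries with j <= i are read, so the zeros above the diagonal (tasks not yet seen) never enter either metric.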
Problem

Research questions and friction points this paper is trying to address.

Continual Learning
Visual Question Answering
Knowledge Retention
Robust Feature Representation
Compositional Generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive memory allocation
global noise filtering
continual visual question answering
prototype-based memory
compositional generalization
Zhifei Li
Research Scientist at Google
machine translation, natural language processing, machine learning, wireless networks
Yiran Wang
Huazhong University of Science and Technology
Computer Vision, 3D Vision, Depth Prediction, Video Comprehension
Chenyi Xiong
School of Computer Science, Hubei University, Wuhan 430062, China
Yujing Xia
School of Computer Science, Hubei University, Wuhan 430062, China
Xiaoju Hou
Institute of Vocational Education, Guangdong Industry Polytechnic University, Guangzhou 510300, China
Yue Zhao
Shandong Police College, Ji’nan 250200, China
Miao Zhang
School of Computer Science, Hubei University, Wuhan 430062, China
Kui Xiao
School of Computer Science, Hubei University, Wuhan 430062, China
Bing Yang
School of Computer Science, Hubei University, Wuhan 430062, China