🤖 AI Summary
To address catastrophic forgetting in continual visual question answering (VQA), where vision-language models (VLMs) rapidly lose prior knowledge when learning new tasks, this paper proposes the first fully data-free continual learning method for VQA. The approach stores no historical data and uses no external models; instead, it leverages the intrinsic language generation capability of a single VLM to pose previous-task questions on new visual inputs, thereby constructing cross-modal pseudo-rehearsal samples. Because the generated questions skew toward the most frequently posed ones, a pseudo-rehearsal balancing module aligns the generated distribution with the original task distribution, using either question meta-statistics or unsupervised K-means clustering. On the VQACL-VQAv2 and CLOVE-function benchmarks, the method substantially outperforms all existing data-free baselines and approaches the performance of strong methods that retain access to historical data.
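The generation step described above can be sketched roughly as follows. This is a minimal, hypothetical illustration: `vlm_generate` is a stand-in stub for a real VLM inference call, and the prompt wording is an assumption, not the paper's actual prompt.

```python
# Hypothetical sketch of pseudo-rehearsal generation: for each new-task image,
# a single VLM is prompted to pose a question in the style of a previous task,
# then answers its own question, yielding an (image, question, answer) triple.

def vlm_generate(image, prompt):
    # Stub standing in for an actual VLM inference call (assumption).
    if prompt.startswith("Ask"):
        return f"What color is the object in {image}?"
    return "red"

def make_pseudo_rehearsal(new_images, prev_task_name):
    samples = []
    for img in new_images:
        q = vlm_generate(
            img,
            f"Ask a question about this image in the style of the "
            f"'{prev_task_name}' task.",
        )
        a = vlm_generate(img, q)  # the same VLM answers its own question
        samples.append({"image": img, "question": q, "answer": a})
    return samples
```

The key design point is that question and answer both come from the one VLM being trained, so no external generator or stored historical data is needed.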
📝 Abstract
Vision-Language Models (VLMs) have shown significant promise in Visual Question Answering (VQA) tasks by leveraging web-scale multimodal datasets. However, these models often struggle with continual learning due to catastrophic forgetting when adapting to new tasks. As an effective remedy, the rehearsal strategy replays data from past tasks when learning a new task. However, such a strategy requires storing past data, which may be infeasible due to hardware constraints or privacy concerns. In this work, we propose the first data-free method that leverages the language generation capability of a VLM, rather than relying on external models, to produce pseudo-rehearsal data for continual VQA. Our method, named GaB, generates pseudo-rehearsal data by posing previous-task questions on new task data. Although effective, the distribution of generated questions skews towards the most frequently posed questions because the training data is limited and task-specific. To mitigate this issue, we introduce a pseudo-rehearsal balancing module that aligns the generated data with the ground-truth data distribution using either question meta-statistics or an unsupervised clustering method. We evaluate our method on two recent benchmarks, i.e., VQACL-VQAv2 and CLOVE-function. GaB outperforms all data-free baselines by a substantial margin in maintaining VQA performance across evolving tasks, while remaining on par with methods that have access to past data.
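The balancing idea in the abstract can be sketched as a cluster-proportional resampling step. This is a minimal sketch under assumptions: cluster ids are presumed to come from K-means over question embeddings (not computed here), and the function name and interface are illustrative, not the paper's implementation.

```python
import random
from collections import Counter

def balance_pseudo_rehearsal(generated, gen_clusters, ref_clusters,
                             n_samples, seed=0):
    """Resample generated questions so their cluster mix matches the
    reference (ground-truth) distribution.

    generated:    list of generated questions
    gen_clusters: cluster id per generated question (e.g. from K-means
                  on question embeddings -- assumed precomputed)
    ref_clusters: cluster ids observed in the original task data
    n_samples:    size of the balanced pseudo-rehearsal set
    """
    rng = random.Random(seed)
    # Target proportion of each cluster, taken from the reference data.
    ref_counts = Counter(ref_clusters)
    total_ref = sum(ref_counts.values())
    # Bucket generated questions by their assigned cluster.
    buckets = {}
    for q, c in zip(generated, gen_clusters):
        buckets.setdefault(c, []).append(q)
    balanced = []
    for c, cnt in ref_counts.items():
        quota = round(n_samples * cnt / total_ref)
        pool = buckets.get(c, [])
        if pool:
            # Sample with replacement to fill each cluster's quota.
            balanced.extend(rng.choices(pool, k=quota))
    return balanced
```

Resampling toward the reference cluster proportions counteracts the skew toward frequently generated questions without requiring any stored answers from past tasks.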