🤖 AI Summary
To address the unreliability and lack of real-time interpretability of outputs from large vision-language models (LVLMs), this paper introduces FastRM, a lightweight, plug-and-play interpretability framework that requires no architectural modification or retraining of the original LVLM. FastRM distills the model's internal features into a lightweight surrogate network that predicts vision-language relevance maps in a single forward pass, enabling both quantitative confidence estimation and qualitative attribution visualization. Compared with gradient-based backpropagation methods, FastRM reduces computation time by 99.8% and memory footprint by 44.4%. By decoupling interpretability from model-specific training and heavy inference overhead, FastRM makes multimodal explainable AI far more practical to deploy, supporting trustworthy real-world LVLM applications.
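The core idea (a small trained head that maps the LVLM's internal features directly to a relevance map, replacing an expensive backward pass) can be sketched roughly as follows. This is a minimal illustration under assumed shapes: the single linear head, the function name `predict_relevance_map`, and all variables are hypothetical, not the paper's actual architecture.

```python
import numpy as np

def predict_relevance_map(hidden_states, W, b, num_image_tokens):
    """Hypothetical single-pass relevance prediction.

    hidden_states: (seq_len, d) final-layer features from the LVLM
    W, b: parameters of a tiny trained surrogate head (assumed linear here)
    Returns one normalized relevance score per image token, computed with
    a single forward pass through the head -- no gradients through the LVLM.
    """
    image_feats = hidden_states[:num_image_tokens]   # (num_image_tokens, d)
    logits = image_feats @ W + b                     # (num_image_tokens,)
    exp = np.exp(logits - logits.max())              # stable softmax -> normalized map
    return exp / exp.sum()

# Toy usage with random features and an untrained head.
rng = np.random.default_rng(0)
d, seq_len, n_img = 16, 32, 24
h = rng.normal(size=(seq_len, d))
W = rng.normal(size=(d,))
rmap = predict_relevance_map(h, W, 0.0, n_img)
print(rmap.shape)  # (24,)
```

The contrast with gradient-based relevancy maps is that those require a backward pass through the full LVLM per generated token, whereas a surrogate of this kind costs only one small matrix product, which is where the reported latency and memory savings come from.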
📝 Abstract
Large Vision Language Models (LVLMs) have demonstrated remarkable reasoning capabilities over textual and visual inputs. However, these models remain prone to generating misinformation, so identifying and mitigating ungrounded responses is crucial for developing trustworthy AI. Traditional explainability methods, such as gradient-based relevancy maps, offer insight into the decision process of models but are often computationally expensive and unsuitable for real-time output validation. In this work, we introduce FastRM, an efficient method for predicting explainable Relevancy Maps of LVLMs. Furthermore, FastRM provides both quantitative and qualitative assessment of model confidence. Experimental results demonstrate that FastRM achieves a 99.8% reduction in computation time and a 44.4% reduction in memory footprint compared to traditional relevancy map generation. FastRM makes explainable AI more practical and scalable, thereby promoting its deployment in real-world applications and enabling users to more effectively evaluate the reliability of model outputs.