VISLIX: An XAI Framework for Validating Vision Models with Slice Discovery and Analysis

📅 2025-05-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In safety-critical applications such as autonomous driving, validating object detection models via data slicing remains challenging due to reliance on metadata, insufficient concept-level annotations, high expert cognitive overhead, and lack of interactive hypothesis-testing mechanisms. Method: This paper introduces the first XAI framework integrating multimodal foundation models (CLIP and SAM) with visual analytics. It enables unsupervised visual slice discovery without requiring auxiliary image metadata or manual annotations; employs natural language generation (NLG) to produce interpretable, natural-language insights; and supports expert-driven, natural-language-based interactive hypothesis validation. Results: Evaluated across three real-world use cases and an expert study, the framework significantly reduces cognitive load, improves efficiency in identifying vulnerable slices, and comprehensively supports pre-deployment reliability assessment of object detection models.
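The paper's code is not reproduced here, but the core idea of unsupervised slice discovery without metadata can be sketched in principle: embed each image with a foundation model such as CLIP, cluster the embeddings into candidate slices, and flag clusters whose detection performance falls well below the global average. The function and variable names below are hypothetical, and the embeddings and per-image scores are assumed to be precomputed; this is a minimal sketch, not the VISLIX implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def find_weak_slices(embeddings, per_image_scores, n_slices=5, threshold=0.1):
    """Cluster image embeddings (e.g. CLIP features) into candidate slices and
    flag clusters whose mean score trails the global mean by `threshold`."""
    labels = KMeans(n_clusters=n_slices, n_init=10, random_state=0).fit_predict(embeddings)
    global_mean = per_image_scores.mean()
    weak = []
    for k in range(n_slices):
        mask = labels == k
        slice_mean = per_image_scores[mask].mean()
        if slice_mean < global_mean - threshold:
            weak.append((k, float(slice_mean), int(mask.sum())))
    # Worst slices first, so an expert can triage them in order.
    return sorted(weak, key=lambda t: t[1])

# Synthetic demo: two separable image groups, one with much lower scores.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(5, 1, (50, 8))])
scores = np.concatenate([rng.uniform(0.8, 1.0, 50), rng.uniform(0.2, 0.4, 50)])
weak = find_weak_slices(emb, scores, n_slices=2)
```

In a real pipeline the per-image scores would come from the object detector under validation (e.g. per-image mAP), and the flagged clusters would then be summarized for the expert in natural language.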

📝 Abstract
Real-world machine learning models require rigorous evaluation before deployment, especially in safety-critical domains like autonomous driving and surveillance. The evaluation of machine learning models often focuses on data slices, which are subsets of the data that share a set of characteristics. Data slice finding automatically identifies conditions or data subgroups where models underperform, aiding developers in mitigating performance issues. Despite its popularity and effectiveness, data slicing for vision model validation faces several challenges. First, data slicing often needs additional image metadata or visual concepts, and falls short in certain computer vision tasks, such as object detection. Second, understanding data slices is a labor-intensive and mentally demanding process that heavily relies on the expert's domain knowledge. Third, data slicing lacks a human-in-the-loop solution that allows experts to form hypotheses and test them interactively. To overcome these limitations and better support the machine learning operations lifecycle, we introduce VISLIX, a novel visual analytics framework that employs state-of-the-art foundation models to help domain experts analyze slices in computer vision models. Our approach does not require image metadata or visual concepts, automatically generates natural language insights, and allows users to test data slice hypotheses interactively. We evaluate VISLIX with an expert study and three use cases that demonstrate the effectiveness of our tool in providing comprehensive insights for validating object detection models.
Problem

Research questions and friction points this paper is trying to address.

Validating vision models without image metadata or visual concepts
Reducing labor-intensive slice analysis relying on expert knowledge
Enabling interactive hypothesis testing for data slice evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses foundation models for vision slice analysis
Generates natural language insights automatically
Enables interactive data slice hypothesis testing