Unlocking Open-Set Language Accessibility in Vision Models

📅 2025-03-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling pre-trained visual classifiers to support open-set textual queries for enhanced interpretability—without compromising original performance or inference logic. We propose a lightweight, label-agnostic zero-shot text interface that aligns visual features with natural language concepts via feature-space alignment and semantic projection, requiring no architectural modification, retraining, or additional annotations. To our knowledge, this is the first method achieving distribution-preserving zero-shot decoding from visual features to free-form text, enabling concept bottleneck modeling and plug-and-play textual interaction. We validate generalizability across 40 mainstream visual classifiers and demonstrate state-of-the-art performance on two downstream tasks: interpretability analysis and cross-modal decoding—significantly outperforming existing approaches.

📝 Abstract
Visual classifiers offer high-dimensional feature representations that are challenging to interpret and analyze. Text, in contrast, provides a more expressive and human-friendly medium for understanding and analyzing model behavior. We propose a simple yet powerful method for reformulating any visual classifier so that it can be accessed with open-set text queries without compromising its original performance. Our approach is label-free, efficient, and preserves the underlying classifier's distribution and reasoning process. We thus unlock several text-based interpretability applications for any classifier. We apply our method to 40 visual classifiers and demonstrate two primary applications: 1) building both label-free and zero-shot concept bottleneck models, thereby converting any classifier into an inherently interpretable one, and 2) zero-shot decoding of visual features into natural language. In both applications, we achieve state-of-the-art results, greatly outperforming existing works. Our method enables text-based approaches for interpreting visual classifiers.
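The core idea described above, aligning a frozen classifier's feature space with a text-aligned embedding space so that free-form text queries can be scored against visual features, can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the random arrays stand in for real classifier features and text-encoder embeddings (e.g. from CLIP), and the least-squares linear map is one simple, hypothetical choice of "feature-space alignment."

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (hypothetical): features from a frozen visual classifier
# and from a text-aligned image encoder (e.g. CLIP), computed on the
# same unlabeled image set. No class labels are needed.
n, d_cls, d_txt = 500, 512, 256
cls_feats = rng.normal(size=(n, d_cls))    # classifier features
txt_space_feats = rng.normal(size=(n, d_txt))  # text-aligned features

# Feature-space alignment: fit a linear map W from the classifier's
# feature space into the text-aligned space by least squares.
W, *_ = np.linalg.lstsq(cls_feats, txt_space_feats, rcond=None)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def zero_shot_scores(image_feat, text_embs):
    """Score open-set text queries against one classifier feature:
    project the feature into the shared space, then take cosine
    similarity with each text embedding."""
    proj = normalize(image_feat @ W)
    return normalize(text_embs) @ proj

# Hypothetical open-set queries: embeddings of free-form concepts
# such as "striped", "furry", "metallic".
text_embs = rng.normal(size=(3, d_txt))
scores = zero_shot_scores(cls_feats[0], text_embs)
print(scores.shape)  # one cosine similarity per text query
```

Because the classifier itself is never modified, its predictions and decision process are untouched; the projection only adds a read-only text interface on top of its existing features.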
Problem

Research questions and friction points this paper is trying to address.

How can pre-trained visual classifiers support open-set text queries without retraining or architectural changes?
How can any classifier be made inherently interpretable without concept labels or extra annotations?
How can visual features be decoded into natural language in a zero-shot manner?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-set text queries for visual classifiers
Label-free, efficient classifier reformulation
Zero-shot visual feature decoding into language