Learning Encoding-Decoding Direction Pairs to Unveil Concepts of Influence in Deep Vision Networks

📅 2025-09-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses two difficulties in deep vision networks: weak concept interpretability and the lack of tools for quantifying feature influence. We propose a framework that jointly learns semantic concept decoding directions (for concept identification) and encoding directions (for influence quantification) via directional clustering and probabilistic signal-vector modeling. To overcome the limitations of conventional reconstruction-driven approaches, we introduce Uncertainty Region Alignment. On synthetic data, the method precisely recovers ground-truth direction pairs; on real-world benchmarks including ImageNet, it extracts unambiguous, human-interpretable concept directions and significantly outperforms unsupervised baselines in decoding performance. Encoding directions are validated via activation maximization and successfully enable model diagnosis and targeted intervention.

📝 Abstract
Empirical evidence shows that deep vision networks represent concepts as directions in latent space, vectors we call concept embeddings. Each concept has a latent factor (a scalar) indicating its presence in an input patch. For a given patch, multiple latent factors are encoded into a compact representation by linearly combining concept embeddings, with the factors as coefficients. Since these embeddings enable such encoding, we call them encoding directions. A latent factor can be recovered via the inner product with a filter, a vector we call a decoding direction. These encoding-decoding direction pairs are not directly accessible, but recovering them helps open the black box of deep networks, enabling understanding, debugging, and improving models. Decoding directions attribute meaning to latent codes, while encoding directions assess concept influence on predictions, with both enabling model correction by unlearning irrelevant concepts. Unlike prior matrix decomposition, autoencoder, or dictionary learning methods that rely on feature reconstruction, we propose a new perspective: decoding directions are identified via directional clustering of activations, and encoding directions are estimated with signal vectors under a probabilistic view. We further leverage network weights through a novel technique, Uncertainty Region Alignment, which reveals interpretable directions affecting predictions. Our analysis shows that (a) on synthetic data, our method recovers ground-truth direction pairs; (b) on real data, decoding directions map to monosemantic, interpretable concepts and outperform unsupervised baselines; and (c) signal vectors faithfully estimate encoding directions, validated via activation maximization. Finally, we demonstrate applications in understanding global model behavior, explaining individual predictions, and intervening to produce counterfactuals or correct errors.
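The linear encoding-decoding model described in the abstract can be sketched in a few lines of NumPy. This is an illustrative toy only: the dimensions, the random concept embeddings, and the use of a pseudo-inverse to obtain decoding directions are assumptions for the sketch, not the paper's learning procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: k = 3 concepts embedded in a d = 8 dimensional latent space.
d, k = 8, 3
E = rng.normal(size=(d, k))      # encoding directions (concept embeddings) as columns
s = np.array([1.5, 0.0, -0.7])   # latent factors: each concept's presence in a patch

# The patch representation linearly combines the encoding directions,
# with the latent factors as coefficients.
z = E @ s

# Decoding directions recover each factor via an inner product; here the
# rows of the pseudo-inverse of E work because E has full column rank.
D = np.linalg.pinv(E)
s_recovered = D @ z

assert np.allclose(s_recovered, s)
```

In a trained network neither E nor D is observable; the paper's contribution is recovering such pairs from activations and weights.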
Problem

Research questions and friction points this paper is trying to address.

Encoding-decoding direction pairs in deep vision networks are not directly accessible
Interpretable concepts that influence model predictions and errors are hard to identify and quantify
Correcting models by unlearning irrelevant concepts requires recovering these directions
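The "unlearning" point above can be illustrated with the same toy linear model: decode a concept's factor, then subtract its contribution from the latent code. The matrices here are hypothetical stand-ins, not the paper's learned directions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 8, 3
E = rng.normal(size=(d, k))      # encoding directions (hypothetical, random)
D = np.linalg.pinv(E)            # decoding directions as rows

s = np.array([1.2, -0.4, 0.8])   # latent factors for a patch
z = E @ s

# "Unlearn" concept 1: decode its factor, then subtract its contribution.
c = 1
s_c = D[c] @ z
z_corrected = z - s_c * E[:, c]

# The corrected code decodes to zero for the removed concept...
assert abs(D[c] @ z_corrected) < 1e-8
# ...while the other concepts' factors are unchanged.
assert np.allclose(D @ z_corrected, [1.2, 0.0, 0.8])
```

This kind of targeted intervention is what makes recovered direction pairs useful for debugging and correcting a model.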
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identifies decoding directions via directional clustering of activations
Estimates encoding directions with signal vectors under a probabilistic view
Introduces Uncertainty Region Alignment to reveal interpretable, prediction-influencing directions
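Directional clustering of activations can be sketched as a spherical k-means: normalize activation vectors, then cluster by cosine similarity so that centroids become candidate decoding directions. This is an illustrative sketch under assumed toy data, not the paper's exact procedure.

```python
import numpy as np

def spherical_kmeans(X, k, init_idx, iters=50):
    """Cluster unit-normalized activations by direction (cosine similarity).

    Centroids serve as candidate decoding directions. Illustrative only.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    centroids = X[list(init_idx)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        labels = (X @ centroids.T).argmax(axis=1)  # highest cosine wins
        for j in range(k):
            members = X[labels == j]
            if len(members):
                m = members.sum(axis=0)
                centroids[j] = m / np.linalg.norm(m)  # re-project to unit sphere
    return centroids, labels

# Toy activations scattered around two ground-truth directions.
rng = np.random.default_rng(1)
true_dirs = np.array([[1.0, 0.0], [0.0, 1.0]])
X = np.vstack([t + 0.05 * rng.normal(size=(100, 2)) for t in true_dirs])
centroids, labels = spherical_kmeans(X, k=2, init_idx=(0, 100))
```

On this toy data the recovered centroids align closely with the ground-truth directions, mirroring the paper's synthetic-recovery result in spirit.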