Interpret and Control Dense Retrieval with Sparse Latent Features

📅 2024-10-17

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

155K/year

🤖 AI Summary

To address the poor interpretability and limited controllability of dense retrieval models, this paper proposes a latent-space modeling approach based on sparse autoencoders (SAEs), coupled with a retrieval-oriented contrastive loss that learns semantically faithful and task-effective sparse latent features. Our method enables, for the first time, fine-grained, controllable editing of dense retrieval behavior—e.g., steering results along dimensions such as topic, recency, or sentiment—while preserving high interpretability and strong controllability. On benchmarks including MSMARCO, reconstructed vectors retain over 99% of the original dense representation’s retrieval accuracy. Empirical results demonstrate that the learned sparse features exhibit clear semantic correspondence with human-understandable concepts and support precise, targeted interventions in the latent space. This work bridges the gap between dense retrieval performance and transparent, user-controllable behavior.

Technology Category

Application Category

📝 Abstract

Dense embeddings deliver strong retrieval performance but often lack interpretability and controllability. This paper introduces a novel approach using sparse autoencoders (SAE) to interpret and control dense embeddings via the learned latent sparse features. Our key contribution is the development of a retrieval-oriented contrastive loss, which ensures the sparse latent features remain effective for retrieval tasks and thus meaningful to interpret. Experimental results demonstrate that both the learned latent sparse features and their reconstructed embeddings retain nearly the same retrieval accuracy as the original dense vectors, affirming their faithfulness. Our further examination of the sparse latent space reveals interesting features underlying the dense embeddings and we can control the retrieval behaviors via manipulating the latent sparse features, for example, prioritizing documents from specific perspectives in the retrieval results.

Problem

Research questions and friction points this paper is trying to address.

Interpret dense embeddings' latent features

Control retrieval using sparse autoencoders

Maintain retrieval accuracy with interpretability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse autoencoders interpret dense embeddings

Contrastive loss ensures retrieval effectiveness

Manipulate latent features for controlled retrieval

🔎 Similar Papers

No similar papers found.