Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Matryoshka Representation Learning (MRL) requires full model retraining for each embedding dimension and degrades sharply at short embedding lengths. Method: The paper introduces sparse coding into adaptive representation learning through Contrastive Sparse Representation (CSR), which maps pre-trained embeddings into a high-dimensional, sparsely activated feature space, preserving multi-granularity semantics and allowing zero-cost adjustment of the effective embedding length without retraining. A lightweight autoencoder is combined with task-aware contrastive learning to jointly optimize semantic fidelity and computational efficiency. Results: On image, text, and multimodal retrieval benchmarks, CSR consistently outperforms MRL, delivering higher retrieval accuracy and faster inference while cutting training time to under 10% of MRL’s cost, which eases the accuracy-efficiency trade-off in large-scale deployment.

📝 Abstract
Many large-scale systems rely on high-quality deep representations (embeddings) to facilitate tasks like retrieval, search, and generative modeling. Matryoshka Representation Learning (MRL) recently emerged as a solution for adaptive embedding lengths, but it requires full model retraining and suffers from noticeable performance degradation at short lengths. In this paper, we show that sparse coding offers a compelling alternative for achieving adaptive representation with minimal overhead and higher fidelity. We propose Contrastive Sparse Representation (CSR), a method that sparsifies pre-trained embeddings into a high-dimensional but selectively activated feature space. By leveraging lightweight autoencoding and task-aware contrastive objectives, CSR preserves semantic quality while allowing flexible, cost-effective inference at different sparsity levels. Extensive experiments on image, text, and multimodal benchmarks demonstrate that CSR consistently outperforms MRL in terms of both accuracy and retrieval speed, often by large margins, while also cutting training time to a fraction of that required by MRL. Our results establish sparse coding as a powerful paradigm for adaptive representation learning in real-world applications where efficiency and fidelity are both paramount. Code is available at https://github.com/neilwen987/CSR_Adaptive_Rep
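The abstract describes sparsifying pre-trained embeddings into a high-dimensional but selectively activated feature space with a lightweight autoencoder, and choosing the sparsity level at inference time. Below is a minimal, hypothetical PyTorch sketch of that idea; it assumes a top-k activation as the sparsity mechanism and is not the authors' implementation (the `TopKSparseEncoder` name and the dimensions are invented for illustration).

```python
# Minimal sketch of sparsifying a frozen dense embedding into a larger,
# selectively activated feature space. Assumptions: linear encoder/decoder
# (a lightweight autoencoder) and a top-k activation, so the effective
# embedding "length" is simply the number of non-zero dimensions kept.
import torch
import torch.nn as nn


class TopKSparseEncoder(nn.Module):
    def __init__(self, embed_dim: int, sparse_dim: int):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, sparse_dim)  # dense -> wide feature space
        self.decoder = nn.Linear(sparse_dim, embed_dim)  # wide -> dense reconstruction

    def encode(self, x: torch.Tensor, k: int) -> torch.Tensor:
        """Keep only the k largest activations; zero out everything else."""
        z = torch.relu(self.encoder(x))
        topk = torch.topk(z, k, dim=-1)
        return torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)

    def forward(self, x: torch.Tensor, k: int):
        z = self.encode(x, k)
        return z, self.decoder(z)


# Usage: one trained module serves different compute budgets by varying k,
# with no retraining, mirroring the zero-cost length adjustment described above.
model = TopKSparseEncoder(embed_dim=768, sparse_dim=8192)
dense = torch.randn(4, 768)        # embeddings from a frozen pre-trained encoder
z_small, _ = model(dense, k=32)    # very sparse code for tight budgets
z_large, _ = model(dense, k=256)   # denser code when accuracy matters more
```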
Problem

Research questions and friction points this paper is trying to address.

Achieving adaptive representation with minimal overhead and high fidelity.
Sparsifying pre-trained embeddings into a high-dimensional, selectively activated feature space.
Improving accuracy and retrieval speed while reducing training time compared to MRL.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse coding for adaptive representation
Contrastive Sparse Representation (CSR) method
Lightweight autoencoding with task-aware contrastive objectives (illustrated in the training sketch after this list)
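The paper pairs an autoencoding objective with task-aware contrastive learning, but the exact losses, weighting, and batching are not given here. The sketch below is a hypothetical combination of a reconstruction term and an InfoNCE-style contrastive term over the sparse codes; `csr_style_loss`, `lam`, and `temperature` are illustrative names, and `model` refers to the `TopKSparseEncoder` sketched earlier.

```python
# Hypothetical training objective: reconstruct dense embeddings and pull the
# sparse codes of paired (anchor, positive) embeddings together in-batch.
import torch
import torch.nn.functional as F


def csr_style_loss(model, anchors, positives, k: int, lam: float = 0.1,
                   temperature: float = 0.07) -> torch.Tensor:
    z_a, recon_a = model(anchors, k)      # sparse codes + reconstructions
    z_p, recon_p = model(positives, k)

    # Reconstruction term keeps the sparse code faithful to the dense embedding.
    recon_loss = F.mse_loss(recon_a, anchors) + F.mse_loss(recon_p, positives)

    # InfoNCE term: each anchor should match its own positive within the batch.
    logits = F.normalize(z_a, dim=-1) @ F.normalize(z_p, dim=-1).T / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    contrastive_loss = F.cross_entropy(logits, targets)

    return contrastive_loss + lam * recon_loss
```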
🔎 Similar Papers
No similar papers found.
Authors

Tiansheng Wen
Xidian University
Sparsity, Efficiency, Interpretability, Machine Learning, Bayesian Statistics

Yifei Wang
MIT CSAIL, MA, USA

Zequn Zeng
Xidian University
Vision and language, Deep learning, Visual captioning

Zhong Peng
Xidian University, Xi’an, China

Yudi Su
Xidian University, Xi’an, China

Xinyang Liu
Xidian University, Xi’an, China

Bo Chen
Xidian University, Xi’an, China

Hongwei Liu
Xidian University, Xi’an, China

Stefanie Jegelka
TUM and MIT
Machine Learning, Optimization, Submodularity, Graph Neural Networks

Chenyu You
Assistant Professor, Stony Brook University
Machine Learning, AI for Health, Computer Vision, Medical Image Analysis, Multimedia