LLMs Meet Isolation Kernel: Lightweight, Learning-free Binary Embeddings for Fast Retrieval

📅 2026-01-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the substantial storage and retrieval overhead incurred by high-dimensional embeddings generated by large language models (LLMs), a challenge exacerbated by existing compression techniques that often compromise retrieval accuracy. To overcome this limitation, the authors propose IKE, a training-free binarization method that introduces Isolation Kernels into LLM embedding compression for the first time. IKE leverages ensembles of random partitions and binary encoding to produce compact binary representations, enabling highly efficient similarity search through bitwise operations. Evaluated across multiple text retrieval benchmarks, IKE achieves up to 16.7× faster retrieval and 16× memory savings compared to full-precision embeddings, while maintaining comparable or even superior retrieval accuracy.
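The summary's "ensembles of random partitions and binary encoding" can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes Voronoi-style partitions built from `psi` random reference points per ensemble member (a common instantiation of the Isolation Kernel), with each point one-hot encoded by its nearest cell; all names (`ike_fit`, `ike_encode`, `t`, `psi`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def ike_fit(data, t=8, psi=4):
    # One random Voronoi partition per ensemble member:
    # psi reference points sampled from the data define psi cells.
    return [data[rng.choice(len(data), size=psi, replace=False)]
            for _ in range(t)]

def ike_encode(x, partitions):
    # In each partition, one-hot encode the nearest reference cell,
    # then concatenate -> binary code of length t * psi with exactly t ones.
    code = []
    for refs in partitions:
        cell = np.argmin(np.linalg.norm(refs - x, axis=1))
        onehot = np.zeros(len(refs), dtype=np.uint8)
        onehot[cell] = 1
        code.append(onehot)
    return np.concatenate(code)

# Toy stand-in for LLM embeddings: 100 vectors in 32 dimensions.
emb = rng.standard_normal((100, 32))
parts = ike_fit(emb, t=8, psi=4)
codes = np.stack([ike_encode(e, parts) for e in emb])

# Isolation-Kernel similarity: fraction of partitions in which two
# points fall into the same cell (a value in [0, 1]).
sim = codes @ codes.T / 8
```

Because the kernel value is just a dot product of sparse binary codes, similarity search reduces to bitwise operations, which is where the reported speed and memory gains come from.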

📝 Abstract
Large language models (LLMs) have recently enabled remarkable progress in text representation. However, their embeddings are typically high-dimensional, leading to substantial storage and retrieval overhead. Although recent approaches such as Matryoshka Representation Learning (MRL) and Contrastive Sparse Representation (CSR) alleviate these issues to some extent, they still suffer from retrieval accuracy degradation. This paper proposes \emph{Isolation Kernel Embedding} or IKE, a learning-free method that transforms an LLM embedding into a binary embedding using Isolation Kernel (IK). IKE is an ensemble of diverse (random) partitions, enabling robust estimation of the ideal kernel in the LLM embedding space, thus reducing retrieval accuracy loss as the ensemble grows. Lightweight and based on binary encoding, it offers a low memory footprint and fast bitwise computation, lowering retrieval latency. Experiments on multiple text retrieval datasets demonstrate that IKE offers up to 16.7x faster retrieval and 16x lower memory usage than LLM embeddings, while maintaining comparable or better accuracy. Compared to CSR and other compression methods, IKE consistently achieves the best balance between retrieval efficiency and effectiveness.
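The "fast bitwise computation" claimed in the abstract can be illustrated with a standard binary-retrieval pattern: pack 0/1 codes into bytes, then rank by Hamming distance computed with XOR and a popcount table. This is a generic sketch of bitwise nearest-neighbor search, not IKE's actual retrieval code; `pack` and `hamming_topk` are hypothetical names.

```python
import numpy as np

# Popcount lookup table for one byte.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def pack(codes):
    # (n, d) array of 0/1 values -> (n, ceil(d/8)) uint8, 8 bits per byte.
    return np.packbits(codes.astype(np.uint8), axis=1)

def hamming_topk(query_packed, db_packed, k=5):
    # Hamming distance = popcount(XOR): pure bitwise operations,
    # no floating-point work per candidate.
    dists = POPCOUNT[np.bitwise_xor(db_packed, query_packed)].sum(axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(1)
codes = (rng.random((1000, 64)) < 0.5).astype(np.uint8)  # toy binary database
db = pack(codes)
query = pack(codes[42:43])  # query with row 42's own code
top = hamming_topk(query, db, k=3)
```

Packing 64 binary dimensions into 8 bytes is where the 16x memory saving over a float representation comes from; the XOR-plus-popcount scan is the source of the retrieval speedup.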
Problem

Research questions and friction points this paper is trying to address.

large language models
high-dimensional embeddings
retrieval overhead
accuracy degradation
embedding compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Isolation Kernel
Binary Embedding
Learning-free
Efficient Retrieval
LLM Compression
Zhibo Zhang
National Key Laboratory for Novel Software Technology, Nanjing University

Yang Xu
National Key Laboratory for Novel Software Technology, Nanjing University

Kai Ming Ting
Nanjing University
Machine Learning, Data Mining

Cam-Tu Nguyen
Associate Professor, School of AI, Nanjing University, China
Data Mining, Image Annotation, Text Mining, Machine Learning, Graphical Models