LLMs Meet Isolation Kernel: Lightweight, Learning-free Binary Embeddings for Fast Retrieval

📅 2026-01-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the substantial storage and retrieval overhead incurred by high-dimensional embeddings generated by large language models (LLMs), a challenge exacerbated by existing compression techniques that often compromise retrieval accuracy. To overcome this limitation, the authors propose IKE, a training-free binarization method that introduces Isolation Kernels into LLM embedding compression for the first time. IKE leverages ensembles of random partitions and binary encoding to produce compact binary representations, enabling highly efficient similarity search through bitwise operations. Evaluated across multiple text retrieval benchmarks, IKE achieves up to 16.7× faster retrieval and 16× memory savings compared to full-precision embeddings, while maintaining comparable or even superior retrieval accuracy.
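The summary's "ensembles of random partitions and binary encoding" can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes Voronoi-style partitions built from `psi` random reference points per ensemble member (a common instantiation of the Isolation Kernel), with each point one-hot encoded by its nearest cell; all names (`ike_fit`, `ike_encode`, `t`, `psi`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def ike_fit(data, t=8, psi=4):
    # One random Voronoi partition per ensemble member:
    # psi reference points sampled from the data define psi cells.
    return [data[rng.choice(len(data), size=psi, replace=False)]
            for _ in range(t)]

def ike_encode(x, partitions):
    # In each partition, one-hot encode the nearest reference cell,
    # then concatenate -> binary code of length t * psi with exactly t ones.
    code = []
    for refs in partitions:
        cell = np.argmin(np.linalg.norm(refs - x, axis=1))
        onehot = np.zeros(len(refs), dtype=np.uint8)
        onehot[cell] = 1
        code.append(onehot)
    return np.concatenate(code)

# Toy stand-in for LLM embeddings: 100 vectors in 32 dimensions.
emb = rng.standard_normal((100, 32))
parts = ike_fit(emb, t=8, psi=4)
codes = np.stack([ike_encode(e, parts) for e in emb])

# Isolation-Kernel similarity: fraction of partitions in which two
# points fall into the same cell (a value in [0, 1]).
sim = codes @ codes.T / 8
```

Because the kernel value is just a dot product of sparse binary codes, similarity search reduces to bitwise operations, which is where the reported speed and memory gains come from.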

📝 Abstract
Large language models (LLMs) have recently enabled remarkable progress in text representation. However, their embeddings are typically high-dimensional, leading to substantial storage and retrieval overhead. Although recent approaches such as Matryoshka Representation Learning (MRL) and Contrastive Sparse Representation (CSR) alleviate these issues to some extent, they still suffer from retrieval accuracy degradation. This paper proposes \emph{Isolation Kernel Embedding} or IKE, a learning-free method that transforms an LLM embedding into a binary embedding using Isolation Kernel (IK). IKE is an ensemble of diverse (random) partitions, enabling robust estimation of the ideal kernel in the LLM embedding space, thus reducing retrieval accuracy loss as the ensemble grows. Lightweight and based on binary encoding, it offers a low memory footprint and fast bitwise computation, lowering retrieval latency. Experiments on multiple text retrieval datasets demonstrate that IKE offers up to 16.7x faster retrieval and 16x lower memory usage than LLM embeddings, while maintaining comparable or better accuracy. Compared to CSR and other compression methods, IKE consistently achieves the best balance between retrieval efficiency and effectiveness.
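The "fast bitwise computation" claimed in the abstract can be illustrated with a standard binary-retrieval pattern: pack 0/1 codes into bytes, then rank by Hamming distance computed with XOR and a popcount table. This is a generic sketch of bitwise nearest-neighbor search, not IKE's actual retrieval code; `pack` and `hamming_topk` are hypothetical names.

```python
import numpy as np

# Popcount lookup table for one byte.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def pack(codes):
    # (n, d) array of 0/1 values -> (n, ceil(d/8)) uint8, 8 bits per byte.
    return np.packbits(codes.astype(np.uint8), axis=1)

def hamming_topk(query_packed, db_packed, k=5):
    # Hamming distance = popcount(XOR): pure bitwise operations,
    # no floating-point work per candidate.
    dists = POPCOUNT[np.bitwise_xor(db_packed, query_packed)].sum(axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(1)
codes = (rng.random((1000, 64)) < 0.5).astype(np.uint8)  # toy binary database
db = pack(codes)
query = pack(codes[42:43])  # query with row 42's own code
top = hamming_topk(query, db, k=3)
```

Packing 64 binary dimensions into 8 bytes is where the 16x memory saving over a float representation comes from; the XOR-plus-popcount scan is the source of the retrieval speedup.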
Problem

Research questions and friction points this paper is trying to address.

large language models
high-dimensional embeddings
retrieval overhead
accuracy degradation
embedding compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Isolation Kernel
Binary Embedding
Learning-free
Efficient Retrieval
LLM Compression
Zhibo Zhang
National Key Laboratory for Novel Software Technology, Nanjing University

Yang Xu
National Key Laboratory for Novel Software Technology, Nanjing University

Kai Ming Ting
Nanjing University
Machine Learning, Data Mining

Cam-Tu Nguyen
Associate Professor, School of AI, Nanjing University, China
Data Mining, Image Annotation, Text Mining, Machine Learning, Graphical Models