The Curious Case of High-Dimensional Indexing as a File Structure: A Case Study of eCP-FS

📅 2025-07-29
🤖 AI Summary
Problem: Approximate nearest-neighbor (ANN) indexes are usually serialized to disk as complex, implementation-specific data structures, so visualizing or analyzing an index requires dedicated code that either lives inside the index implementation or replicates its reader. Method: The paper presents eCP-FS, a file-based implementation of the disk-based eCP index that maps the index data structure onto a file structure using a standard file library, instead of an opaque serialization. The resulting index can be read from any programming language and even by humans, at the cost of a more verbose on-disk representation. Contribution/Results: Experiments show that eCP-FS is slower than state-of-the-art indexes but remains somewhat competitive even when memory is not constrained. In memory-constrained settings it has a minimal memory footprint, which makes it well suited to environments where multiple indexes must coexist. The work indicates that the “file-as-index” approach is practical and can make ANN indexes easier to inspect and debug.
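
To make the “file-as-index” idea concrete, here is a minimal sketch of how a single-level cluster index could be written out as ordinary, indented JSON files that any language (or a person with a text editor) can read. The directory layout (`meta.json`, `leaders.json`, `clusters/<i>.json`), the function names `write_index` and `nearest`, and the choice of JSON are hypothetical illustrations, not the actual eCP-FS format. Representatives are drawn as a random sample of the data, as in cluster pruning; eCP itself organizes representatives in a multi-level hierarchy, which this sketch flattens to one level for brevity.

```python
import json
import random
import tempfile
from pathlib import Path


def nearest(point, reps):
    """Return the index of the representative closest to point (squared Euclidean)."""
    return min(range(len(reps)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, reps[i])))


def write_index(root, vectors, n_clusters, seed=0):
    """Write a one-level cluster index under root as indented JSON files.

    Hypothetical layout (not the actual eCP-FS format):
      root/meta.json          -> {"dim": ..., "clusters": ...}
      root/leaders.json       -> list of cluster representatives
      root/clusters/<i>.json  -> [[vector_id, vector], ...] for cluster i
    """
    root = Path(root)
    (root / "clusters").mkdir(parents=True, exist_ok=True)
    # Representatives are a random sample of the data, as in cluster pruning.
    rng = random.Random(seed)
    leaders = [vectors[i] for i in rng.sample(range(len(vectors)), n_clusters)]
    # Assign every vector to its nearest representative.
    members = [[] for _ in leaders]
    for vid, vec in enumerate(vectors):
        members[nearest(vec, leaders)].append([vid, vec])
    # Each piece of the index becomes an ordinary, human-readable file.
    meta = {"dim": len(vectors[0]), "clusters": n_clusters}
    (root / "meta.json").write_text(json.dumps(meta, indent=2))
    (root / "leaders.json").write_text(json.dumps(leaders, indent=2))
    for i, m in enumerate(members):
        (root / "clusters" / f"{i}.json").write_text(json.dumps(m, indent=2))
    return root


# Usage: build a tiny index, then read it back with nothing but a JSON parser.
data = [[float(i), float(i % 3)] for i in range(12)]
with tempfile.TemporaryDirectory() as d:
    root = write_index(d, data, n_clusters=3)
    meta = json.loads((root / "meta.json").read_text())
    sizes = [len(json.loads(p.read_text()))
             for p in sorted((root / "clusters").glob("*.json"))]
print(meta, sizes)
```

The reading side needs no index-specific code at all, which is the property the summary highlights; the price is that JSON text is far larger than a packed binary format.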

📝 Abstract
Modern analytical pipelines routinely deploy multiple deep learning and retrieval models that rely on approximate nearest-neighbor (ANN) indexes to support efficient similarity-based search. While many state-of-the-art ANN indexes are memory-based (e.g., HNSW and IVF), using multiple ANN indexes creates competition for limited GPU/CPU memory, which in turn necessitates disk-based index structures (e.g., DiskANN or eCP). In typical index implementations, the main component is a complex data structure that is serialized to disk and read either fully at startup, for memory-based indexes, or incrementally at query time, for disk-based indexes. To visualize the index structure or analyze its quality, complex code is needed that is either embedded in the index implementation or replicates the code that reads the data structure. In this paper, we consider an alternative approach that maps the data structure to a file structure, using a file library, making the index easily readable from any programming language and even human-readable. The disadvantage is that the serialized index is verbose, which adds overhead when searching the index. The question addressed in this paper is how severe this performance penalty is. To that end, this paper presents eCP-FS, a file-based implementation of eCP, a well-known disk-based ANN index. A comparison with state-of-the-art indexes shows that while eCP-FS is slower, the implementation is nevertheless somewhat competitive even when memory is not constrained. In a memory-constrained scenario, eCP-FS offers a minimal memory footprint, making it ideal for resource-constrained or multi-index environments.
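
The trade-off described above (a verbose on-disk format, but only part of the index read per query) can be illustrated with a hypothetical query routine over a file-per-cluster layout: it loads the representative file, ranks clusters by distance to the query, and opens only the `b` nearest cluster files. The names `build`, `search`, `dist2`, the parameter `b`, and the JSON layout are assumptions for illustration; the actual eCP-FS traversal, which descends a multi-level tree, and its file format may differ.

```python
import heapq
import json
import tempfile
from pathlib import Path


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build(root, vectors, leaders):
    """Write leaders.json plus one JSON file per cluster (hypothetical layout)."""
    root = Path(root)
    (root / "clusters").mkdir(parents=True, exist_ok=True)
    members = [[] for _ in leaders]
    for vid, vec in enumerate(vectors):
        best = min(range(len(leaders)), key=lambda i: dist2(vec, leaders[i]))
        members[best].append([vid, vec])
    (root / "leaders.json").write_text(json.dumps(leaders))
    for i, m in enumerate(members):
        (root / "clusters" / f"{i}.json").write_text(json.dumps(m))


def search(root, query, k=3, b=2):
    """Approximate k-NN: scan only the b clusters whose leaders are closest.

    Returns (hits, clusters_opened), where hits is a list of (dist2, vector_id).
    """
    root = Path(root)
    leaders = json.loads((root / "leaders.json").read_text())
    probe = heapq.nsmallest(b, range(len(leaders)),
                            key=lambda i: dist2(query, leaders[i]))
    candidates = []
    for i in probe:
        # A cluster file is read only if the query is routed to it.
        cluster = json.loads((root / "clusters" / f"{i}.json").read_text())
        candidates.extend((dist2(query, vec), vid) for vid, vec in cluster)
    return heapq.nsmallest(k, candidates), len(probe)


vectors = [[0, 0], [1, 0], [10, 10], [11, 10], [20, 0]]
with tempfile.TemporaryDirectory() as d:
    build(d, vectors, leaders=[[0, 0], [10, 10], [20, 0]])
    hits, opened = search(d, [1, 1], k=2, b=1)
print(hits, opened)  # [(1, 1), (2, 0)] 1
```

In this sketch, the data held in memory during a query is bounded by the representative file plus `b` cluster files, regardless of the total index size. Raising `b` opens more files and typically improves recall at the cost of more I/O and parsing, which is where a verbose text format pays its penalty.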
Problem

Evaluates the performance penalty of a file-based ANN index
Compares eCP-FS with memory-based and disk-based ANN indexes
Assesses suitability of eCP-FS for memory-constrained environments
Innovation

File-based implementation of eCP index
Minimal memory footprint in memory-constrained scenarios
Human-readable and language-agnostic index structure