🤖 AI Summary
Problem: In typical approximate nearest-neighbor (ANN) index implementations, the index is a complex data structure serialized to disk, so visualizing or analyzing it requires dedicated code that is either embedded in the index implementation or replicates the code that reads the data structure. Method: We propose eCP-FS, a disk-based ANN index that maps the index structure of eCP, a well-known disk-based ANN index, directly onto a file structure using a standard file library. This makes the index human-readable and parsable from any programming language. Contribution/Results: Compared with state-of-the-art indexes, eCP-FS is slower but remains somewhat competitive even when memory is not constrained. Under memory constraints it achieves a minimal memory footprint, which is especially valuable when multiple indexes must coexist. The work quantifies the performance penalty of this “file-as-index” approach, which stems from the verbose on-disk representation, and shows that the approach is practical while making ANN indexes far easier to inspect and analyze.
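To make the “file-as-index” idea concrete, here is a minimal sketch that writes a toy cluster index as plain directories and JSON files. Everything here is an assumption for illustration: the function names, the `cluster_NNNN/leader.json` and `cluster_NNNN/members.json` layout, the use of JSON, and the single level of randomly chosen leaders. eCP itself organizes leaders in a multi-level hierarchy, and eCP-FS's actual on-disk format may differ.

```python
import json
import os
import random
import tempfile


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def write_index(root, vectors, n_leaders, seed=0):
    """Write a toy single-level cluster index as directories and JSON files.

    Hypothetical layout (an assumption, not eCP-FS's documented format):
        root/cluster_0000/leader.json   -> the cluster's leader vector
        root/cluster_0000/members.json  -> {"ids": [...], "vectors": [...]}
    """
    rng = random.Random(seed)
    # Pick leaders at random from the data set, as cluster-pruning indexes do.
    leaders = [vectors[i] for i in rng.sample(range(len(vectors)), n_leaders)]
    clusters = [{"ids": [], "vectors": []} for _ in leaders]
    # Assign every vector to its nearest leader.
    for vid, vec in enumerate(vectors):
        c = min(range(n_leaders), key=lambda i: sq_dist(vec, leaders[i]))
        clusters[c]["ids"].append(vid)
        clusters[c]["vectors"].append(vec)
    # One directory per cluster; plain JSON keeps the files readable from any language.
    for c in range(n_leaders):
        cdir = os.path.join(root, f"cluster_{c:04d}")
        os.makedirs(cdir, exist_ok=True)
        with open(os.path.join(cdir, "leader.json"), "w") as f:
            json.dump(leaders[c], f)
        with open(os.path.join(cdir, "members.json"), "w") as f:
            json.dump(clusters[c], f, indent=1)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    vecs = [[float(i), float(i % 3)] for i in range(20)]
    write_index(root, vecs, n_leaders=4)
    print(sorted(os.listdir(root)))  # ['cluster_0000', 'cluster_0001', 'cluster_0002', 'cluster_0003']
```

Any tool that can list a directory and read JSON (a shell, a notebook, a program in another language) can then inspect a cluster directly by opening its `members.json`. That is the readability benefit described above, traded against the size and parsing cost of text files.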
📝 Abstract
Modern analytical pipelines routinely deploy multiple deep learning and retrieval models that rely on approximate nearest-neighbor (ANN) indexes to support efficient similarity-based search. While many state-of-the-art ANN indexes are memory-based (e.g., HNSW and IVF), using multiple ANN indexes creates competition for limited GPU/CPU memory, which in turn necessitates disk-based index structures (e.g., DiskANN or eCP). In typical index implementations, the main component is a complex data structure that is serialized to disk and read either fully at startup time, for memory-based indexes, or incrementally at query time, for disk-based indexes. Visualizing the index structure or analyzing its quality then requires complex code that is either embedded in the index implementation or replicates the code that reads the data structure. In this paper, we consider an alternative approach that maps the data structure to a file structure using a file library, making the index easily readable from any programming language and even human-readable. The disadvantage is that the serialized index is verbose, which adds overhead when searching through the index. The question addressed in this paper is how severe this performance penalty is. To that end, this paper presents eCP-FS, a file-based implementation of eCP, a well-known disk-based ANN index. A comparison with state-of-the-art indexes shows that while eCP-FS is slower, it is nevertheless somewhat competitive even when memory is not constrained. In a memory-constrained scenario, eCP-FS offers a minimal memory footprint, making it ideal for resource-constrained or multi-index environments.
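The query-time side of a file-based, disk-resident index can be sketched as follows. A search reads every leader file, picks the `b` closest clusters, and scans only those clusters' member files for the `k` nearest vectors. The function names, the directory layout, the single-level structure, and the parameters `k` and `b` are hypothetical simplifications, not eCP-FS's actual code. Counting the files read gives only a rough sense of the per-query I/O that a verbose, file-based layout incurs; it is not a model of the measured performance.

```python
import heapq
import json
import os
import random
import tempfile


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build(root, vectors, n_leaders, seed=0):
    """Write a hypothetical single-level layout: one directory per cluster."""
    rng = random.Random(seed)
    leaders = [vectors[i] for i in rng.sample(range(len(vectors)), n_leaders)]
    clusters = [{"ids": [], "vectors": []} for _ in leaders]
    for vid, vec in enumerate(vectors):
        c = min(range(n_leaders), key=lambda i: sq_dist(vec, leaders[i]))
        clusters[c]["ids"].append(vid)
        clusters[c]["vectors"].append(vec)
    for c in range(n_leaders):
        cdir = os.path.join(root, f"cluster_{c:04d}")
        os.makedirs(cdir, exist_ok=True)
        with open(os.path.join(cdir, "leader.json"), "w") as f:
            json.dump(leaders[c], f)
        with open(os.path.join(cdir, "members.json"), "w") as f:
            json.dump(clusters[c], f)


def search(root, query, k, b):
    """Return (ids of the k nearest candidates, number of files read).

    Only the leader files and the b closest clusters' member files are
    opened, so per-query memory is bounded by what those files contain.
    """
    files_read = 0
    scored = []
    # Step 1: rank clusters by the distance from the query to each leader.
    for name in sorted(os.listdir(root)):
        with open(os.path.join(root, name, "leader.json")) as f:
            leader = json.load(f)
        files_read += 1
        scored.append((sq_dist(query, leader), name))
    # Step 2: scan only the b closest clusters exhaustively.
    candidates = []
    for _, name in heapq.nsmallest(b, scored):
        with open(os.path.join(root, name, "members.json")) as f:
            cluster = json.load(f)
        files_read += 1
        for vid, vec in zip(cluster["ids"], cluster["vectors"]):
            candidates.append((sq_dist(query, vec), vid))
    return [vid for _, vid in heapq.nsmallest(k, candidates)], files_read


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    vecs = [[float(i), float(i % 5)] for i in range(100)]
    build(root, vecs, n_leaders=10)
    ids, reads = search(root, [42.0, 2.0], k=3, b=2)
    print(ids[0], reads)  # 42 12
```

Here the query equals stored vector 42, so its own cluster is always among those scanned, and the search opens 10 leader files plus 2 member files. Holding only the opened files in memory keeps the footprint small, at the cost of reading and parsing text on every query.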