Scalable Evaluation for Audio Identification via Synthetic Latent Fingerprint Generation

📅 2025-09-23
🤖 AI Summary
Problem: Evaluating audio fingerprinting at real-world scale is hindered by the lack of large, publicly available music corpora. Method: We propose a scalable, audio-free evaluation framework that uses pre-trained fingerprint models to extract latent representations and trains a Rectified Flow model to synthesize fingerprint embeddings, enabling construction of a billion-scale distractor database. Distribution alignment metrics quantify synthesis fidelity, checking that generated fingerprints reproduce the statistical properties of real fingerprints. Contribution/Results: Experiments show that retrieval performance degrades under synthetic distractors in a way that closely mirrors degradation under real distractors, which supports benchmarking across systems. The approach removes the dependence on massive real audio collections and improves the efficiency, scalability, and reproducibility of large-scale audio fingerprint evaluation.

📝 Abstract
The evaluation of audio fingerprinting at a realistic scale is limited by the scarcity of large public music databases. We present an audio-free approach that synthesises latent fingerprints which approximate the distribution of real fingerprints. Our method trains a Rectified Flow model on embeddings extracted by pre-trained neural audio fingerprinting systems. The synthetic fingerprints act as realistic distractors and enable the simulation of retrieval performance at a large scale without requiring additional audio. We assess the fidelity of the synthetic fingerprints by comparing their distribution to that of real fingerprints. We further benchmark retrieval performance across multiple state-of-the-art audio fingerprinting frameworks by augmenting real reference databases with synthetic distractors, and show that the scaling trends obtained with synthetic distractors closely track those obtained with real distractors. Finally, we scale the synthetic distractor database to model retrieval performance for very large databases, providing a practical metric of system scalability that does not depend on access to audio corpora.
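For orientation, the block below writes out the standard Rectified Flow training objective and sampling ODE as they would apply to fingerprint embeddings. It is the textbook formulation, not a statement of the authors' exact setup: the symbols (p_data for the distribution of real fingerprint embeddings, v_theta for the learned velocity network) are assumptions introduced here, and the abstract does not specify any conditioning, normalisation, or time-sampling scheme.

```latex
% Noise sample, real fingerprint embedding, and interpolation time
x_0 \sim \mathcal{N}(0, I), \qquad x_1 \sim p_{\mathrm{data}}, \qquad t \sim \mathcal{U}[0, 1]

% Straight-line interpolation between noise and data
x_t = (1 - t)\, x_0 + t\, x_1

% Training objective: regress the velocity onto the line direction
\mathcal{L}(\theta) = \mathbb{E}_{x_0, x_1, t}
  \left\| v_\theta(x_t, t) - (x_1 - x_0) \right\|^2

% Sampling: integrate the learned ODE from t = 0 to t = 1
\frac{\mathrm{d}x}{\mathrm{d}t} = v_\theta(x, t), \qquad x(0) = x_0
```

A synthetic fingerprint is then the value x(1) obtained by integrating the ODE from a fresh noise sample.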

Problem

Research questions and friction points this paper is trying to address.

Evaluating audio fingerprinting at a realistic scale is limited by the scarcity of large public music databases
Whether synthetic latent fingerprints can approximate the distribution of real fingerprints without any audio
Whether synthetic distractors can stand in for real audio when simulating large-scale retrieval performance
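The distractor idea can be illustrated with a small, self-contained sketch. It is a toy under stated assumptions, not the paper's evaluation protocol: fingerprints are random unit vectors, queries are noisy copies of references, retrieval is brute-force top-1 by inner product, and the "synthetic distractors" are plain random vectors standing in for samples from a trained generative model. All function names and parameter values are hypothetical.

```python
import math
import random


def unit(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def random_unit(dim, rng):
    """Draw a random direction on the unit sphere (stand-in fingerprint)."""
    return unit([rng.gauss(0.0, 1.0) for _ in range(dim)])


def top1_hit_rate(queries, true_ids, database):
    """Fraction of queries whose best inner-product match is the true reference."""
    hits = 0
    for query, true_id in zip(queries, true_ids):
        best = max(
            range(len(database)),
            key=lambda i: sum(a * b for a, b in zip(query, database[i])),
        )
        hits += best == true_id
    return hits / len(queries)


def simulate(dim=16, n_refs=50, noise=0.3,
             distractor_counts=(0, 100, 1000), seed=0):
    """Measure the top-1 hit rate as the database grows with distractors."""
    rng = random.Random(seed)
    refs = [random_unit(dim, rng) for _ in range(n_refs)]
    # Queries are degraded copies of the references (assumed noise model).
    queries = [unit([x + rng.gauss(0.0, noise) for x in r]) for r in refs]
    # Assumption: random vectors replace generator-produced fingerprints.
    distractors = [random_unit(dim, rng) for _ in range(max(distractor_counts))]
    true_ids = list(range(n_refs))
    results = {}
    for n in distractor_counts:
        database = refs + distractors[:n]  # references keep indices 0..n_refs-1
        results[n] = top1_hit_rate(queries, true_ids, database)
    return results


if __name__ == "__main__":
    for n, rate in simulate().items():
        print(f"{n:>5} distractors: top-1 hit rate = {rate:.2f}")
```

Because each larger database contains the smaller ones, the hit rate can only stay flat or fall as distractors are added, which is the scaling trend the evaluation tracks. At realistic scale the brute-force search would be replaced by an approximate nearest-neighbour index.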

Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthesizes latent fingerprints using Rectified Flow
Generates realistic distractors without requiring audio
Enables scalable evaluation by augmenting reference databases
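As a rough illustration of how a Rectified Flow model produces a sample, the sketch below integrates the flow ODE with fixed-step Euler updates. It assumes a toy setting that is not from the paper: the "dataset" is a single target vector, for which the ideal velocity field has the closed form (target - x) / (1 - t). In a real system the velocity would come from a trained network; `euler_sample` and `toy_velocity` are hypothetical names.

```python
import random


def euler_sample(velocity, x0, n_steps=50):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t stays strictly below 1, so 1 - t never reaches zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(target):
    """Closed-form velocity when the data distribution is one point (toy case).

    With x_t = (1 - t) * x0 + t * target, the regression target
    (target - x0) equals (target - x_t) / (1 - t).
    """
    def velocity(x, t):
        return [(m - xi) / (1.0 - t) for m, xi in zip(target, x)]
    return velocity


rng = random.Random(0)
target = [0.5, -1.0, 2.0]                     # stand-in "real fingerprint"
noise = [rng.gauss(0.0, 1.0) for _ in target]  # x0 drawn from N(0, I)
sample = euler_sample(toy_velocity(target), noise, n_steps=20)

if __name__ == "__main__":
    print("noise :", [round(v, 3) for v in noise])
    print("sample:", [round(v, 3) for v in sample])
```

With a learned velocity field the same loop would be run once per noise draw, so generating a large distractor set amounts to repeated, independent integrations.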
Aditya Bhattacharjee
School of Electronic Engineering and Computer Science, Queen Mary University of London, UK
Marco Pasini
School of Electronic Engineering and Computer Science, Queen Mary University of London, UK
Emmanouil Benetos
Queen Mary University of London
Machine listening, Audio signal processing, Music information retrieval, Machine learning