ASK: Adaptive Self-improving Knowledge Framework for Audio Text Retrieval

📅 2025-12-11
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing audio-text retrieval (ATR) methods suffer from the gradient locality bottleneck (GLB) induced by small-batch contrastive learning, which limits their capacity to model fine-grained and long-tail semantics beyond the batch. External knowledge enhancement alleviates GLB but introduces representation-drift mismatch (RDM): a misalignment between a static knowledge base and the dynamically evolving model representations. To address both challenges jointly, this paper proposes the Adaptive Self-improving Knowledge (ASK) framework, which it presents as the first to systematically decouple and jointly resolve GLB and RDM. ASK integrates three core components: (1) multi-grained knowledge injection, (2) dynamic knowledge refinement driven by a graph neural network, and (3) an adaptive reliability-weighting mechanism under cross-modal embedding alignment. Evaluated on the AudioCaps and Clotho benchmarks, ASK achieves state-of-the-art performance, with significant gains in retrieval accuracy on fine-grained and long-tail samples.
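To make the gradient locality bottleneck concrete, the standard text-to-audio InfoNCE loss over a mini-batch of B pairs can be written as below. The notation is ours, not the paper's: t_i and a_j are assumed to be L2-normalized text and audio embeddings, τ is the temperature, and the normalization Jacobian is ignored in the gradient.

```latex
\mathcal{L}_{t \to a} = -\frac{1}{B} \sum_{i=1}^{B} \log p_{ii},
\qquad
p_{ij} = \frac{\exp\left(t_i^{\top} a_j / \tau\right)}{\sum_{k=1}^{B} \exp\left(t_i^{\top} a_k / \tau\right)}

\frac{\partial}{\partial t_i}\left(-\log p_{ii}\right)
  = \frac{1}{\tau}\left(\sum_{j=1}^{B} p_{ij}\, a_j - a_i\right)
```

Because the denominator sums only over the B in-batch audio clips, the update for t_i is a combination of in-batch embeddings alone; an out-of-batch clip, however semantically relevant, contributes nothing to the gradient.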

πŸ“ Abstract
The dominant paradigm for Audio-Text Retrieval (ATR) relies on mini-batch-based contrastive learning. This process, however, is inherently limited by what we formalize as the Gradient Locality Bottleneck (GLB), which structurally prevents models from leveraging out-of-batch knowledge and thus impairs fine-grained and long-tail learning. While external knowledge-enhanced methods can alleviate the GLB, we identify a critical, unaddressed side effect: the Representation-Drift Mismatch (RDM), where a static knowledge base becomes progressively misaligned with the evolving model, turning guidance into noise. To address this dual challenge, we propose the Adaptive Self-improving Knowledge (ASK) framework, a model-agnostic, plug-and-play solution. ASK breaks the GLB via multi-grained knowledge injection, systematically mitigates RDM through dynamic knowledge refinement, and introduces a novel adaptive reliability weighting scheme to ensure that only consistent knowledge contributes to optimization. Experiments on two benchmark datasets, on which ASK achieves state-of-the-art performance, demonstrate the efficacy of the proposed framework.
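A minimal NumPy sketch of one way knowledge injection and reliability weighting could be combined: out-of-batch audio embeddings from a hypothetical knowledge bank are appended as extra negatives in the text-to-audio InfoNCE denominator, each scaled by a reliability weight in (0, 1]. The function name, the `bank`/`bank_weights` inputs, and the log-weight formulation are illustrative assumptions, not the ASK implementation, and only the text-to-audio direction is shown.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize each row (also works for an empty (0, D) array)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def knowledge_augmented_infonce(text, audio, bank, bank_weights, tau=0.07):
    """Hypothetical text-to-audio InfoNCE with extra, reliability-weighted negatives.

    text, audio:  (B, D) paired embeddings; row i of each forms a positive pair.
    bank:         (K, D) hypothetical out-of-batch audio embeddings (K may be 0).
    bank_weights: (K,) reliability weights, each in (0, 1].
    """
    t, a, m = l2_normalize(text), l2_normalize(audio), l2_normalize(bank)
    in_batch = t @ a.T / tau                                   # (B, B), positives on the diagonal
    out_batch = t @ m.T / tau + np.log(bank_weights)[None, :]  # (B, K), weight w -> logit + log w
    logits = np.concatenate([in_batch, out_batch], axis=1)     # (B, B + K)
    logits -= logits.max(axis=1, keepdims=True)                # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(t.shape[0])
    return float(-log_probs[idx, idx].mean())                  # positive for row i is column i

# Usage with random toy embeddings.
rng = np.random.default_rng(0)
text = rng.normal(size=(4, 16))
audio = text + 0.1 * rng.normal(size=(4, 16))  # noisy positives
bank = rng.normal(size=(32, 16))
no_bank = knowledge_augmented_infonce(text, audio, bank[:0], np.ones(0))
trusted = knowledge_augmented_infonce(text, audio, bank, np.ones(32))
downweighted = knowledge_augmented_infonce(text, audio, bank, np.full(32, 0.1))
print(no_bank, downweighted, trusted)
```

Adding log w_k to a logit multiplies that negative's term in the softmax denominator by w_k, so an entry judged unreliable (small w_k) barely affects the loss, while w_k = 1 treats it as an ordinary negative.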
Problem

Research questions and friction points this paper is trying to address.

Addresses the Gradient Locality Bottleneck (GLB), which limits the use of out-of-batch knowledge
Mitigates the Representation-Drift Mismatch (RDM) caused by a static knowledge base becoming misaligned with the evolving model
Proposes an adaptive framework for dynamic knowledge refinement in audio-text retrieval
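The drift problem can be illustrated with a toy simulation, assuming a linear encoder that is perturbed by random noise at each training step (a stand-in for real gradient updates). Embeddings are frozen into a static bank once, and the mean cosine agreement between those stored entries and what the current encoder would produce is tracked over time. `simulate_drift`, `row_cosine`, and all parameters are hypothetical.

```python
import numpy as np

def row_cosine(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Cosine similarity between corresponding rows of x and y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return (x * y).sum(axis=1)

def simulate_drift(steps: int = 50, dim: int = 32, n_items: int = 100,
                   rate: float = 0.05, seed: int = 0) -> list:
    """Toy model: return mean agreement between a static bank and the current
    encoder's embeddings, recorded before each of `steps` encoder updates and once after."""
    rng = np.random.default_rng(seed)
    raw = rng.normal(size=(n_items, dim))  # hypothetical raw features of knowledge items
    encoder = np.eye(dim)                  # toy linear encoder at bank-construction time
    static_bank = raw @ encoder.T          # embeddings frozen into the bank once
    agreement = []
    for _ in range(steps + 1):
        fresh = raw @ encoder.T            # what the current encoder would produce
        agreement.append(float(row_cosine(static_bank, fresh).mean()))
        encoder = encoder + rate * rng.normal(size=(dim, dim))  # simulated training update
    return agreement

agreement = simulate_drift()
print(agreement[0], agreement[-1])
```

Agreement starts at 1 and tends to fall as the encoder moves away from the one that built the bank, which is the kind of progressive misalignment RDM describes.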
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-grained knowledge injection breaks the gradient locality bottleneck
Dynamic knowledge refinement mitigates the representation-drift mismatch
Adaptive reliability weighting ensures that only consistent knowledge contributes to optimization
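One simple way to realize refinement and reliability weighting is sketched below, assumed here for illustration only; the summary describes ASK's refinement as graph-neural-network-driven, which this sketch does not attempt. An exponential moving average pulls each stored entry toward its freshly encoded counterpart, and a reliability score maps stored-versus-fresh cosine agreement into [floor, 1]. `refine_bank`, `reliability_weights`, `momentum`, and `floor` are hypothetical names.

```python
import numpy as np

def _normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize each row."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def refine_bank(bank: np.ndarray, fresh: np.ndarray, momentum: float = 0.9) -> np.ndarray:
    """Hypothetical EMA refresh: move each stored entry toward the embedding
    the current encoder produces for the same item, then re-normalize."""
    mixed = momentum * _normalize(bank) + (1.0 - momentum) * _normalize(fresh)
    return _normalize(mixed)

def reliability_weights(bank: np.ndarray, fresh: np.ndarray, floor: float = 1e-3) -> np.ndarray:
    """Hypothetical reliability score: map cosine agreement between stored
    and fresh embeddings from [-1, 1] to [floor, 1]."""
    cos = np.sum(_normalize(bank) * _normalize(fresh), axis=1)
    return np.clip((cos + 1.0) / 2.0, floor, 1.0)

# Usage: a stale bank regains reliability as it is refined toward fresh embeddings.
rng = np.random.default_rng(0)
stale_bank = rng.normal(size=(8, 16))  # entries written by an older encoder
fresh = rng.normal(size=(8, 16))       # the same items under the current encoder
bank = stale_bank
for _ in range(20):
    bank = refine_bank(bank, fresh)
print(reliability_weights(stale_bank, fresh).mean(), reliability_weights(bank, fresh).mean())
```

Weights of this kind could scale out-of-batch negatives in a contrastive loss, so that stale entries contribute less until refinement brings them back in line with the current encoder.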