Generalizable Prompt Tuning for Audio-Language Models via Semantic Expansion

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the trade-off between base-class and novel-class performance in prompt tuning for audio-language models. The authors propose Semantically Expanded Prompt Tuning (SEPT), a plug-and-play framework that explicitly regularizes the prompt embedding space using semantic neighbors generated by large language models. Its semantic expansion loss uses margin constraints to promote intra-class compactness and inter-class separability. The authors also establish the first benchmark setup for prompt generalization in audio-language models, covering base-to-new generalization and cross-dataset transfer. Experiments show that SEPT consistently improves generalization across multiple prompt tuning baselines without adding inference overhead.

📝 Abstract
Prompt tuning has achieved remarkable progress in vision-language models (VLMs) and has recently been adopted for audio-language models (ALMs). However, its generalization ability in ALMs remains largely underexplored. We observe that conventional prompt tuning for ALMs also suffers from the Base-New Tradeoff, and we identify that this issue stems from the disrupted semantic structure of the embedding space. To address this issue, we propose Semantically Expanded Prompt Tuning (SEPT), a plug-and-play framework that explicitly regularizes the prompt embedding space by incorporating semantic neighbors generated by large language models. SEPT introduces a novel semantic expansion loss with margin constraints that promote intra-class compactness and inter-class separability, thereby enhancing the semantic structure of the prompt embedding space. For comprehensive evaluation, we establish the first benchmark setup for prompt generalization in ALMs, covering both base-to-new generalization and cross-dataset transferability. Extensive experiments demonstrate that SEPT consistently improves generalization performance across multiple prompt tuning baselines without increasing computational cost during inference. Code is available at https://github.com/jhyukjang/SEPT.
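The abstract does not give the exact form of the semantic expansion loss, so the following is only a minimal sketch of one way a margin-constrained loss of this kind could look. It assumes a hinge formulation on cosine similarity: each class's prompt embedding is pulled toward the embeddings of its own LLM-generated neighbors and pushed away from the neighbors of other classes. The function names, the margin values, and the toy 2-D embeddings are all hypothetical and are not taken from the SEPT code base.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_expansion_loss(prompts, neighbors, pos_margin=0.8, neg_margin=0.3):
    """Hypothetical margin-based loss (an assumption, not the paper's formula).

    prompts:   {class_name: prompt embedding}
    neighbors: {class_name: [embeddings of LLM-generated semantic neighbors]}
    """
    pull_terms, push_terms = [], []
    for cls, prompt in prompts.items():
        for other, neighbor_list in neighbors.items():
            for neighbor in neighbor_list:
                sim = cosine(prompt, neighbor)
                if other == cls:
                    # Intra-class compactness: penalize own neighbors
                    # whose similarity falls below pos_margin.
                    pull_terms.append(max(0.0, pos_margin - sim))
                else:
                    # Inter-class separability: penalize other classes'
                    # neighbors whose similarity exceeds neg_margin.
                    push_terms.append(max(0.0, sim - neg_margin))
    pull = sum(pull_terms) / len(pull_terms) if pull_terms else 0.0
    push = sum(push_terms) / len(push_terms) if push_terms else 0.0
    return pull + push


# Toy embeddings, for illustration only.
neighbors = {"dog_bark": [[1.0, 0.0]], "siren": [[0.0, 1.0]]}
well_separated = {"dog_bark": [1.0, 0.0], "siren": [0.0, 1.0]}
collapsed = {"dog_bark": [1.0, 0.0], "siren": [1.0, 0.0]}

print(semantic_expansion_loss(well_separated, neighbors))
print(semantic_expansion_loss(collapsed, neighbors))
```

In this toy setup the well-separated prompts incur no penalty, while the collapsed prompts, where both classes share one embedding, are penalized by both terms. Because a regularizer like this acts only during training, it would add nothing to inference cost, which matches the abstract's claim.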
Problem

Research questions and friction points this paper is trying to address.

prompt tuning
audio-language models
generalization
Base-New Tradeoff
semantic structure
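
In the vision-language prompt tuning literature, the Base-New Tradeoff listed above is commonly summarized by the harmonic mean of base-class and new-class accuracy, which rewards methods that do well on both. The source does not state which metric the SEPT benchmark reports, so the sketch below assumes that convention. The function name and the accuracy values are illustrative, not results from the paper.

```python
def harmonic_mean(base_acc, new_acc):
    """Harmonic mean of base-class and new-class accuracy."""
    if base_acc + new_acc == 0:
        return 0.0
    return 2 * base_acc * new_acc / (base_acc + new_acc)


# Illustrative accuracies only: a method that overfits to base classes
# versus one that is more balanced across base and new classes.
overfit = harmonic_mean(90.0, 50.0)
balanced = harmonic_mean(80.0, 70.0)
print(round(overfit, 2), round(balanced, 2))
```

Both pairs have an arithmetic mean above 69, but the harmonic mean ranks the balanced pair higher, which is why it suits measuring this trade-off.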
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt Tuning
Audio-Language Models
Semantic Expansion
Generalization
Embedding Regularization