🤖 AI Summary
This work addresses the limited utility of text-to-audio (TTA) synthetic data for sound classification. We propose a task-oriented prompt design strategy and, for the first time, explore fusing synthetic data across TTA models. Unlike naive data augmentation, our approach uses semantically aligned prompt engineering to guide multiple TTA models in collaboratively generating high-fidelity, class-balanced audio samples; their prediction confidences are then fused to refine training-data quality. Evaluated on ESC-50 and UrbanSound8K, the method yields significant gains in supervised sound classification accuracy (+3.2%–5.7%). The results demonstrate the effectiveness and robustness of prompt-driven, multi-model synthetic data fusion for downstream tasks, establishing a new paradigm for audio data augmentation in low-resource settings.
📝 Abstract
This paper investigates the design of effective prompt strategies for generating realistic datasets with Text-To-Audio (TTA) models, and analyzes techniques for efficiently combining those datasets to improve their utility in sound classification. Using two TTA models and two sound classification datasets, we evaluate a range of prompt strategies. Our findings show that task-specific prompt strategies significantly outperform basic prompts for data generation. Furthermore, merging datasets generated by different TTA models improves classification performance more than merely increasing the training-set size. Overall, the results underscore the value of these methods as synthetic-data augmentation techniques.
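The pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt templates, model names (`modelA`, `modelB`), and helper functions are assumptions standing in for real TTA calls (e.g. an AudioLDM- or AudioGen-style generator).

```python
import random

def basic_prompt(label):
    # Naive strategy: the bare class label as the prompt.
    return label

def task_prompt(label, context="an urban environment"):
    # Task-specific strategy (assumed template): embed the label in a
    # context matching the downstream task's recording conditions.
    return f"a realistic field recording of {label} in {context}"

def generate(tta_model, labels, n_per_class, prompt_fn):
    # Stand-in for invoking a real TTA model; we only record which
    # (model, prompt) pair would have produced each synthetic clip.
    return [
        {"model": tta_model, "label": lab, "prompt": prompt_fn(lab)}
        for lab in labels
        for _ in range(n_per_class)
    ]

def merge_datasets(*datasets):
    # Combine synthetic datasets from different TTA models. Since each
    # model generated the same number of clips per class, simple
    # concatenation keeps the merged training set class-balanced.
    merged = []
    for ds in datasets:
        merged.extend(ds)
    random.shuffle(merged)
    return merged
```

In this sketch, merging the per-model datasets adds generator diversity at a fixed per-class budget, which is the effect the abstract contrasts with merely enlarging a single model's output.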