🤖 AI Summary
To address the limited generalization of acoustic scene classification (ASC) models caused by differences in acoustic environments across cities, this paper proposes a scene classification method that incorporates city-specific acoustic priors. The core method introduces city classification as an auxiliary supervision task and transfers city-level acoustic discriminative knowledge to the primary ASC model via knowledge distillation. Building on a joint city-scene labeling setup compatible with both CNN- and Transformer-based backbones, the method jointly learns cross-city invariant features and city-specific acoustic cues. Evaluated on the DCASE 2023 Task 1 benchmark, the approach consistently improves the accuracy of multiple state-of-the-art ASC models, with an average gain of +1.8%. The empirical results show that explicitly modeling city-level acoustic variation substantially improves the robustness and generalization of ASC systems.
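To make the distillation idea concrete, below is a minimal sketch of what such a combined objective could look like, assuming a Hinton-style temperature-scaled KL term between the frozen city classifier's outputs and the student's auxiliary city head. The function name, the `alpha` weighting, and the temperature value are illustrative placeholders, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def city2scene_loss(scene_logits, city_logits_student, city_logits_teacher,
                    scene_labels, alpha=0.5, temperature=2.0):
    """Combined objective: scene cross-entropy plus city knowledge distillation.

    scene_logits:        student predictions over scene classes   (B, n_scenes)
    city_logits_student: student auxiliary-head city predictions  (B, n_cities)
    city_logits_teacher: frozen city classifier's predictions     (B, n_cities)
    alpha, temperature:  illustrative hyperparameters (not from the paper).
    """
    # Primary task: standard cross-entropy against the scene labels.
    ce = F.cross_entropy(scene_logits, scene_labels)

    # Distillation: KL divergence between temperature-softened city
    # distributions of teacher and student (classic Hinton-style KD).
    kd = F.kl_div(
        F.log_softmax(city_logits_student / temperature, dim=-1),
        F.softmax(city_logits_teacher / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    return (1 - alpha) * ce + alpha * kd
```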
📝 Abstract
Acoustic scene recordings are often collected from a diverse range of cities. Most existing acoustic scene classification (ASC) approaches focus on identifying common acoustic scene patterns across cities to enhance generalization. In contrast, we hypothesize that city-specific environmental and cultural differences in acoustic features are beneficial for the ASC task. In this paper, we introduce City2Scene, a novel framework that leverages city features to improve ASC. City2Scene transfers city-specific knowledge from city classification models to a scene classification model using knowledge distillation. We evaluate City2Scene on the DCASE Challenge Task 1 datasets, where each audio clip is annotated with both scene and city labels. Experimental results demonstrate that city features provide valuable information for classifying scenes. By distilling the city-specific knowledge, City2Scene effectively improves accuracy for various state-of-the-art ASC backbone models, including both CNNs and Transformers.
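For completeness, here is a toy training step that exercises the loss sketched above. The linear layers and random features are stand-ins for the CNN/Transformer backbones and spectrogram inputs described in the paper; all names and dimensions here are hypothetical.

```python
import torch

B, n_scenes, n_cities, d = 8, 10, 12, 128
feats = torch.randn(B, d)                      # stand-in for audio embeddings
scene_labels = torch.randint(0, n_scenes, (B,))

scene_head = torch.nn.Linear(d, n_scenes)      # student: primary ASC head
city_head = torch.nn.Linear(d, n_cities)       # student: auxiliary city head
city_teacher = torch.nn.Linear(d, n_cities)    # stand-in for a frozen, pretrained city model

with torch.no_grad():                          # teacher is not updated
    teacher_logits = city_teacher(feats)

loss = city2scene_loss(scene_head(feats), city_head(feats),
                       teacher_logits, scene_labels)
loss.backward()                                # gradients flow only into the student heads
```

In this setup, only the scene labels and the teacher's soft city posteriors supervise the student, so the auxiliary city head can be discarded at inference time.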