City2Scene: Improving Acoustic Scene Classification with City Features

📅 2025-03-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited generalization of Acoustic Scene Classification (ASC) models caused by inter-city acoustic environmental disparities, this paper proposes a scene classification method that integrates city-specific acoustic priors. The core method introduces city classification as an auxiliary supervision task and transfers city-level acoustic discriminative knowledge to the primary ASC model via knowledge distillation. Additionally, it establishes a city-scene joint labeling framework, compatible with both CNN- and Transformer-based backbones, to jointly learn cross-city invariant features and city-specific acoustic cues. Evaluated on the DCASE 2023 Task 1 benchmark, the approach consistently improves the accuracy of multiple state-of-the-art ASC models, achieving an average gain of +1.8%. Empirical results demonstrate that explicitly modeling city-level acoustic variations significantly enhances the robustness and generalization capability of ASC systems.

📝 Abstract
Acoustic scene recordings are often collected from a diverse range of cities. Most existing acoustic scene classification (ASC) approaches focus on identifying common acoustic scene patterns across cities to enhance generalization. In contrast, we hypothesize that city-specific environmental and cultural differences in acoustic features are beneficial for the ASC task. In this paper, we introduce City2Scene, a novel framework that leverages city features to improve ASC. City2Scene transfers the city-specific knowledge from city classification models to a scene classification model using knowledge distillation. We evaluated City2Scene on the DCASE Challenge Task 1 datasets, where each audio clip is annotated with both scene and city labels. Experimental results demonstrate that city features provide valuable information for classifying scenes. By distilling the city-specific knowledge, City2Scene effectively improves accuracy for various state-of-the-art ASC backbone models, including both CNNs and Transformers.
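The abstract does not spell out the distillation objective, so the following is only a minimal sketch of one common way such a transfer could be set up, not the authors' implementation. It assumes, hypothetically, that the scene model (student) carries an auxiliary city head whose softened predictions are matched to those of a frozen city classifier (teacher), alongside the usual cross-entropy on scene labels. The function name `city_distillation_loss` and the parameters `temperature` and `kd_weight` are illustrative choices, not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Cross-entropy of the softmax distribution against a hard label."""
    return -math.log(softmax(logits)[target_index])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def city_distillation_loss(scene_logits, scene_label,
                           student_city_logits, teacher_city_logits,
                           temperature=2.0, kd_weight=0.5):
    """Hypothetical combined loss: scene cross-entropy plus city distillation.

    ASSUMPTION: the student has an auxiliary city head; the paper's abstract
    does not specify how city knowledge is matched between the models.
    """
    scene_loss = cross_entropy(scene_logits, scene_label)
    teacher_soft = softmax(teacher_city_logits, temperature)
    student_soft = softmax(student_city_logits, temperature)
    # The T^2 factor is the usual scaling for temperature-softened targets.
    kd_loss = temperature ** 2 * kl_divergence(teacher_soft, student_soft)
    return (1.0 - kd_weight) * scene_loss + kd_weight * kd_loss
```

In this sketch, a student whose city head already agrees with the teacher pays only the weighted scene loss, while disagreement on city predictions adds a penalty. In practice the same structure would be written with a tensor library and averaged over a batch.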
Problem

Research questions and friction points this paper is trying to address.

Leveraging city-specific features for acoustic scene classification
Improving ASC accuracy using city knowledge distillation
Addressing environmental and cultural differences in acoustic scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages city features for scene classification
Uses knowledge distillation for city-specific knowledge transfer
Improves accuracy for both CNN and Transformer backbone models
Yiqiang Cai
Xi’an Jiaotong-Liverpool University, China
Yizhou Tan
Xi’an Jiaotong-Liverpool University, China
Peihong Zhang
Professor of Physics, University at Buffalo
condensed matter physics, electronic structure theory, materials science
Yuxuan Liu
Xi’an Jiaotong-Liverpool University, China
Shengchen Li
Xi'an Jiaotong-Liverpool University
Machine Listening
Xi Shao
Professor of Computer Engineering, Nanjing University of Posts and Telecommunications
Multimedia Information System, Computer Audition
Mark D. Plumbley
University of Surrey, United Kingdom