DroFiT: A Lightweight Band-fused Frequency Attention Toward Real-time UAV Speech Enhancement

📅 2025-09-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses real-time single-microphone speech enhancement for UAVs under severe self-noise and stringent resource constraints. The authors propose a lightweight band-fusion attention network that integrates frequency-domain Transformers with subband encoding. Key methodological contributions include a learnable gated fusion mechanism, a hybrid full-band/subband encoder-decoder architecture, a temporal convolutional network (TCN) backend, and a joint spectral-temporal loss function, together enabling low-latency streaming inference. Experiments on VoiceBank-DEMAND and realistic UAV noise datasets demonstrate robust spectral reconstruction at extremely low SNRs, with PESQ improving by over 1.2 points. The model reduces computational complexity and memory footprint by 47% and 53%, respectively, while maintaining real-time performance, satisfying the strict deployment requirements of onboard UAV platforms.
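The joint spectral-temporal loss described above can be sketched as a weighted sum of an STFT-magnitude term and a waveform term. This is a minimal illustrative sketch, not the paper's implementation: the weight `alpha`, the Hann window, and the L1 distances are assumptions.

```python
import numpy as np

def spectral_temporal_loss(est, ref, alpha=0.5, n_fft=512, hop=128):
    """Hypothetical joint loss: weighted STFT-magnitude L1 plus waveform L1."""
    # Time-domain term: mean absolute error on the raw waveforms.
    l_time = np.mean(np.abs(est - ref))

    # Frequency-domain term: L1 distance between STFT magnitudes.
    def stft_mag(x):
        frames = [x[i:i + n_fft] * np.hanning(n_fft)
                  for i in range(0, len(x) - n_fft + 1, hop)]
        return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

    l_spec = np.mean(np.abs(stft_mag(est) - stft_mag(ref)))
    return alpha * l_spec + (1 - alpha) * l_time

# Toy usage: identical signals give zero loss; a noisy estimate does not.
rng = np.random.default_rng(0)
clean = rng.standard_normal(4000)
noisy = clean + 0.1 * rng.standard_normal(4000)
loss_zero = spectral_temporal_loss(clean, clean)
loss_noisy = spectral_temporal_loss(noisy, clean)
```

Combining both terms penalizes errors that a purely time-domain loss would miss (e.g. phase-insensitive spectral distortion) while keeping the waveform aligned.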

📝 Abstract
This paper proposes DroFiT (Drone Frequency lightweight Transformer for speech enhancement), a single-microphone speech enhancement network for severe drone self-noise. DroFiT integrates a frequency-wise Transformer with a full/sub-band hybrid encoder-decoder and a TCN back-end for memory-efficient streaming. A learnable skip-and-gate fusion with a combined spectral-temporal loss further refines reconstruction. The model is trained on VoiceBank-DEMAND mixed with recorded drone noise (-5 to -25 dB SNR) and evaluated using standard speech enhancement metrics and computational efficiency. Experimental results show that DroFiT achieves competitive enhancement performance while significantly reducing computational and memory demands, paving the way for real-time processing on resource-constrained UAV platforms. Audio demo samples are available on our demo page.
Problem

Research questions and friction points this paper is trying to address.

Enhancing speech corrupted by severe drone self-noise
Achieving real-time processing on resource-constrained UAV platforms
Reducing computational and memory demands for speech enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-wise Transformer with hybrid encoder-decoder
TCN back-end for memory-efficient streaming
Learnable skip-and-gate fusion with spectral-temporal loss
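The skip-and-gate fusion listed above can be sketched as a sigmoid gate that mixes an encoder skip connection with the decoder path. This is an assumed formulation for illustration; the gate parameters `W` and `b`, the concatenation input, and the convex-mix form are not taken from the paper.

```python
import numpy as np

def gated_skip_fusion(skip, up, W, b):
    """Hypothetical learnable skip-and-gate fusion: a per-feature sigmoid
    gate decides how much of the encoder skip vs. decoder path to keep."""
    # Gate computed from the concatenated features (W, b would be learned).
    z = np.concatenate([skip, up], axis=-1) @ W + b
    g = 1.0 / (1.0 + np.exp(-z))           # sigmoid gate in [0, 1]
    return g * skip + (1.0 - g) * up       # convex mix of the two paths

# Toy usage with an illustrative feature size.
rng = np.random.default_rng(1)
F = 8
skip = rng.standard_normal(F)              # encoder skip features
up = rng.standard_normal(F)                # upsampled decoder features
W = 0.1 * rng.standard_normal((2 * F, F))  # gate weights (learned in practice)
b = np.zeros(F)                            # gate bias
fused = gated_skip_fusion(skip, up, W, b)
```

Because the gate is a convex combination, each fused feature stays between the corresponding skip and decoder values, which keeps the fusion numerically stable.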
Jeongmin Lee
Sungkyunkwan University, Suwon, Republic of Korea
Chanhong Jeon
Sungkyunkwan University, Suwon, Republic of Korea
Hyungjoo Seo
University of Illinois Urbana-Champaign, Urbana, IL, USA
Taewook Kang
Electronics and Telecommunications Research Institute (ETRI), Principal Researcher
Spiking neural networks · Machine learning · Biometrics · Human body communications · Energy harvesting