🤖 AI Summary
This work addresses real-time single-microphone speech enhancement for UAVs under severe self-noise and stringent resource constraints. We propose a lightweight band-fusion attention network that integrates a frequency-domain Transformer with subband encoding. Key methodological contributions include a learnable gated fusion mechanism, a hybrid full-band/subband encoder-decoder architecture, a temporal convolutional network (TCN) backend, and a joint spectral-temporal loss function, which together enable low-latency streaming inference. Experiments on VoiceBank-DEMAND and realistic UAV noise datasets demonstrate robust spectral reconstruction at extremely low SNRs, with PESQ improving by more than 1.2 points. The model reduces computational complexity and memory footprint by 47% and 53%, respectively, while maintaining real-time performance, meeting the strict deployment requirements of onboard UAV platforms.
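The learnable gated fusion mentioned above can be illustrated with a minimal sketch: a sigmoid gate, computed from the concatenated encoder skip and decoder features, blends the two streams element-wise. The shapes, weight layout, and random placeholder parameters here are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_skip_fusion(enc_feat, dec_feat, w, b):
    """Blend encoder skip features with decoder features via a learned gate.

    enc_feat, dec_feat: (frames, channels) feature maps.
    w: (2*channels, channels) gate projection, b: (channels,) bias.
    In a real model w and b are trained; here they are placeholders.
    """
    z = np.concatenate([enc_feat, dec_feat], axis=-1)  # (frames, 2C)
    g = sigmoid(z @ w + b)                             # per-element gate in (0, 1)
    return g * enc_feat + (1.0 - g) * dec_feat         # convex combination

rng = np.random.default_rng(0)
enc = rng.standard_normal((10, 16))
dec = rng.standard_normal((10, 16))
w = 0.1 * rng.standard_normal((32, 16))
b = np.zeros(16)
fused = gated_skip_fusion(enc, dec, w, b)
print(fused.shape)  # (10, 16)
```

Because the gate output lies in (0, 1), every fused value is a convex blend of the corresponding encoder and decoder activations, which keeps the skip path stable during training.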
📝 Abstract
This paper proposes DroFiT (Drone Frequency lightweight Transformer for speech enhancement), a single-microphone speech enhancement network for severe drone self-noise. DroFiT integrates a frequency-wise Transformer with a full-/sub-band hybrid encoder-decoder and a TCN back-end for memory-efficient streaming. A learnable skip-and-gate fusion with a combined spectral-temporal loss further refines reconstruction. The model is trained on VoiceBank-DEMAND mixed with recorded drone noise (-5 to -25 dB SNR) and evaluated using standard speech enhancement metrics and computational efficiency measures. Experimental results show that DroFiT achieves competitive enhancement performance while significantly reducing computational and memory demands, paving the way for real-time processing on resource-constrained UAV platforms. Audio demo samples are available on our demo page.
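The combined spectral-temporal loss can be sketched as a weighted sum of an STFT-magnitude L1 term and a waveform L1 term. The weight `alpha`, the Hann-windowed STFT setup, and the frame sizes below are illustrative assumptions, not the paper's reported training configuration.

```python
import numpy as np

def spectral_temporal_loss(est, ref, alpha=0.5, n_fft=256, hop=128):
    """Hypothetical joint loss: alpha * spectral-magnitude L1 + (1 - alpha) * waveform L1.

    est, ref: 1-D waveforms of equal length.
    alpha, n_fft, hop are assumed values for illustration.
    """
    def stft_mag(x):
        # Simple framed magnitude STFT with a Hann window.
        win = np.hanning(n_fft)
        frames = [x[i:i + n_fft] * win
                  for i in range(0, len(x) - n_fft + 1, hop)]
        return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

    spec_term = np.mean(np.abs(stft_mag(est) - stft_mag(ref)))
    time_term = np.mean(np.abs(est - ref))
    return alpha * spec_term + (1.0 - alpha) * time_term

rng = np.random.default_rng(1)
ref = rng.standard_normal(4000)
est = ref + 0.1 * rng.standard_normal(4000)
print(spectral_temporal_loss(ref, ref))  # 0.0 for identical signals
loss = spectral_temporal_loss(est, ref)
```

Coupling a frequency-domain term with a time-domain term is a common way to penalize both magnitude-spectrum errors and waveform-level artifacts at once.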