Combolutional Neural Networks

📅 2025-07-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing audio frontends struggle with harmonic feature extraction due to mismatched inductive biases and poor interpretability. Method: This paper introduces the *combolutional layer*, a time-domain harmonic analysis module that combines a learnable-delay IIR comb filter with envelope detection. It is implemented entirely in real-valued arithmetic, giving a low parameter count (<1K), high interpretability, and CPU-efficient inference. It serves as a drop-in replacement for standard convolutional layers and is trained end-to-end. Contribution/Results: Evaluated on piano transcription, speaker classification, and key detection, the combolutional layer outperforms mainstream spectrogram-based frontends (e.g., Log-Mel, CQT) in harmonic structure modeling while significantly reducing computational overhead. Its core innovation is embedding physically grounded comb filtering, rooted in harmonic resonance principles, into a differentiable neural architecture, which brings together efficiency, interpretability, and task-specific adaptability.

📝 Abstract

Selecting appropriate inductive biases is an essential step in the design of machine learning models, especially when working with audio, where even short clips may contain millions of samples. To this end, we propose the combolutional layer: a learned-delay IIR comb filter and fused envelope detector, which extracts harmonic features in the time domain. We demonstrate the efficacy of the combolutional layer on three information retrieval tasks, evaluate its computational cost relative to other audio frontends, and provide efficient implementations for training. We find that the combolutional layer is an effective replacement for convolutional layers in audio tasks where precise harmonic analysis is important, e.g., piano transcription, speaker classification, and key detection. Additionally, the combolutional layer has several other key benefits over existing frontends, namely: low parameter count, efficient CPU inference, strictly real-valued computations, and improved interpretability.
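
To make the idea of a comb filter with a fused envelope detector concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the filter equations, so the feedback form `y[n] = x[n] + gain * y[n - delay]`, the linear interpolation used for fractional delays, the mean-absolute-value envelope, and the names `comb_filter`, `envelope`, and `comb_feature` are all assumptions made for illustration. The delay is a fixed argument here; in a trainable layer it would be a learned parameter.

```python
import numpy as np


def comb_filter(x, delay, gain=0.9):
    """Feedback (IIR) comb filter: y[n] = x[n] + gain * y[n - delay].

    Assumption: a fractional delay is approximated by linearly
    interpolating between the two neighbouring past outputs.
    """
    x = np.asarray(x, dtype=np.float64)
    if delay < 1.0:
        raise ValueError("delay must be at least one sample")
    if not abs(gain) < 1.0:
        raise ValueError("|gain| must be below 1 for a stable filter")

    d0 = int(np.floor(delay))  # integer part of the delay
    frac = delay - d0          # fractional part, in [0, 1)
    y = np.zeros_like(x)
    for n in range(len(x)):
        a = y[n - d0] if n >= d0 else 0.0          # y[n - d0]
        b = y[n - d0 - 1] if n >= d0 + 1 else 0.0  # y[n - d0 - 1]
        y[n] = x[n] + gain * ((1.0 - frac) * a + frac * b)
    return y


def envelope(y):
    """Assumed envelope detector: mean absolute value over the clip."""
    return float(np.mean(np.abs(y)))


def comb_feature(x, delay, gain=0.9):
    """One scalar harmonic feature: envelope of the comb-filtered signal."""
    return envelope(comb_filter(x, delay, gain))


# Toy input: five harmonics of a 200 Hz fundamental, one second at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
f0 = 200.0
x = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, 6))

matched = comb_feature(x, delay=fs / f0)  # delay equals one period (80 samples)
mismatched = comb_feature(x, delay=63.0)  # delay unrelated to the period
print(matched, mismatched)
```

When the delay equals one period of the fundamental, every harmonic is reinforced by the feedback path, so the matched feature comes out clearly larger than the mismatched one. All arithmetic is real-valued and the only per-filter parameters in this sketch are the delay and the gain, which is in the spirit of the low parameter count the abstract describes.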

Problem

Research questions and friction points this paper is trying to address.

Proposing a combolutional layer for harmonic feature extraction

Evaluating the computational cost of audio frontends

Replacing convolutional layers in tasks that require precise harmonic analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Learned-delay IIR comb filter

Fused envelope detector

Time-domain harmonic feature extraction
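
The reason a comb filter suits harmonic analysis follows from its frequency response. The derivation below uses the textbook feedback comb filter with feedback gain $g$ ($0 < g < 1$), integer delay $D$ in samples, and sample rate $f_s$. These symbols are introduced here for illustration, and the paper's exact parameterization may differ.

```latex
\begin{align}
  y[n] &= x[n] + g\, y[n - D] \\
  H(z) &= \frac{1}{1 - g\, z^{-D}} \\
  \left| H\!\left(e^{j\omega}\right) \right|
       &= \frac{1}{\sqrt{1 + g^{2} - 2 g \cos(\omega D)}} \\
  \text{peaks: } \omega_k &= \frac{2\pi k}{D}
       \;\Longleftrightarrow\; f_k = \frac{k\, f_s}{D},
       \qquad k = 0, 1, 2, \dots \\
  \left| H \right|_{\max} &= \frac{1}{1 - g},
       \qquad
  \left| H \right|_{\min} = \frac{1}{1 + g}
\end{align}
```

The peaks fall on integer multiples of $f_s / D$, so a single delay value selects an entire harmonic series at once. For example, $D = 80$ at $f_s = 16\,\text{kHz}$ places peaks at 200 Hz, 400 Hz, 600 Hz, and so on, which is why learning the delay amounts to learning which fundamental frequency a filter responds to.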
