Combolutional Neural Networks

📅 2025-07-28
🤖 AI Summary
Problem: Existing audio frontends struggle with harmonic feature extraction because their inductive biases are poorly matched to harmonic signals and their features are hard to interpret. Method: This paper introduces the *combolutional layer*, a time-domain harmonic analysis module that fuses a learnable-delay IIR comb filter with an envelope detector. It uses only real-valued arithmetic, has a low parameter count (<1K), is interpretable, and runs efficiently on CPU. It is trained end-to-end and can replace standard convolutional layers in audio tasks where precise harmonic analysis matters. Contribution/Results: Evaluated on piano transcription, speaker classification, and key detection, the combolutional layer outperforms mainstream spectrogram-based frontends (e.g., Log-Mel, CQT) in harmonic structure modeling while significantly reducing computational overhead. Its core innovation is embedding comb filtering, which is grounded in the physics of harmonic resonance, in a differentiable neural architecture, combining efficiency, interpretability, and task-specific adaptability.

📝 Abstract
Selecting appropriate inductive biases is an essential step in the design of machine learning models, especially when working with audio, where even short clips may contain millions of samples. To this end, we propose the combolutional layer: a learned-delay IIR comb filter and fused envelope detector, which extracts harmonic features in the time domain. We demonstrate the efficacy of the combolutional layer on three information retrieval tasks, evaluate its computational cost relative to other audio frontends, and provide efficient implementations for training. We find that the combolutional layer is an effective replacement for convolutional layers in audio tasks where precise harmonic analysis is important, e.g., piano transcription, speaker classification, and key detection. Additionally, the combolutional layer has several other key benefits over existing frontends, namely: low parameter count, efficient CPU inference, strictly real-valued computations, and improved interpretability.
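The abstract does not give the layer's equations, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes a feedback (IIR) comb filter of the form y[n] = x[n] + g·y[n − d], a fractional delay d realized by linear interpolation between the two neighboring integer delays (one common way to make a delay continuous, and an assumption here), and an envelope computed in the same loop by rectification plus one-pole smoothing. The function name `comb_envelope` and the parameters `gain` and `smooth` are hypothetical.

```python
import numpy as np

def comb_envelope(x, delay, gain=0.9, smooth=0.99):
    """Feedback comb filter with a fractional delay, followed by an
    envelope follower computed in the same pass (illustrative only)."""
    k = int(np.floor(delay))      # integer part of the delay, in samples
    frac = delay - k              # fractional part, in [0, 1)
    if k < 1:
        raise ValueError("delay must be at least 1 sample")
    y = np.zeros(len(x))
    env = np.zeros(len(x))
    e = 0.0
    for n in range(len(x)):
        # Past outputs at the two integer delays surrounding `delay`.
        y0 = y[n - k] if n >= k else 0.0
        y1 = y[n - k - 1] if n >= k + 1 else 0.0
        # Linear interpolation approximates y[n - delay] (assumption).
        y[n] = x[n] + gain * ((1.0 - frac) * y0 + frac * y1)
        # Envelope: rectify, then smooth with a one-pole lowpass.
        e = smooth * e + (1.0 - smooth) * abs(y[n])
        env[n] = e
    return env

# A 100 Hz tone sampled at 8 kHz has a period of 80 samples.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 100.0 * t)

matched = comb_envelope(tone, delay=sr / 100.0)[-1]   # delay equals the period
mismatched = comb_envelope(tone, delay=120.0)[-1]     # tone falls between comb peaks
```

In this sketch the envelope is much larger when the delay matches the period of the input than when it does not, which is the sense in which a comb filter's output energy can serve as a harmonic feature. In a trainable layer the delay would be a learned parameter rather than a fixed float.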
Problem

Research questions and friction points this paper is trying to address.

Proposing combolutional layer for harmonic feature extraction
Evaluating computational cost of audio frontends
Replacing convolutional layers in precise harmonic tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learned-delay IIR comb filter
Fused envelope detector
Time-domain harmonic feature extraction
Cameron Churchwell
University of Illinois at Urbana-Champaign, Siebel School of Computing and Data Science, Urbana IL, USA, 61801
Minje Kim
University of Illinois at Urbana-Champaign, Siebel School of Computing and Data Science, Urbana IL, USA, 61801
Paris Smaragdis
Professor, Massachusetts Institute of Technology
Audio Signal Processing, Computational Audition, Machine Learning, Machine Listening