Learning Perceptually Relevant Temporal Envelope Morphing

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the perceptually unnatural interpolation of audio temporal envelopes, specifically the difficulty current methods face in generating natural-sounding intermediate envelopes between signals with large duration or rhythmic disparities. To tackle this, we introduce a workflow that integrates psychoacoustic experimentation with machine learning: human listening experiments yield perceptual morphing principles, which guide the synthesis of large-scale envelope datasets. We present a supervised framework for learning these principles, an autoencoder that compresses temporal envelope structure into latent representations, and benchmarks for evaluating envelope morphs on both synthetic and naturalistic data. Our approach outperforms existing methods at producing perceptually intermediate envelopes, i.e., those judged to lie between the two endpoints. All code, models, and datasets will be made publicly available upon publication.

📝 Abstract
Temporal envelope morphing, the process of interpolating between the amplitude dynamics of two audio signals, is an emerging problem in generative audio systems that lacks sufficient perceptual grounding. Morphing temporal envelopes in a perceptually intuitive manner should enable new methods for sound blending in creative media and for probing perceptual organization in psychoacoustics. However, existing audio morphing techniques often fail to produce intermediate temporal envelopes when the input sounds have distinct temporal structures; many morphers effectively overlay both temporal structures, leading to perceptually unnatural results. In this paper, we introduce a novel workflow for learning envelope morphing with perceptual guidance: we first derive perceptually grounded morphing principles through human listening studies, then synthesize large-scale datasets encoding these principles, and finally train machine learning models to create perceptually intermediate morphs. Specifically, we present: (1) perceptual principles that guide envelope morphing, derived from our listening studies; (2) a supervised framework to learn these principles; (3) an autoencoder that learns to compress temporal envelope structures into latent representations; and (4) benchmarks for evaluating audio envelope morphs, using both synthetic and naturalistic data. We show that our approach outperforms existing methods in producing temporally intermediate morphs. All code, models, and datasets will be made publicly available upon publication.
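The overlay problem described in the abstract can be illustrated with a small sketch. This is not the paper's method: the Gaussian-pulse envelopes, the `crossfade` baseline, and the `count_peaks` helper are all hypothetical stand-ins chosen for illustration. The sketch shows that a weighted sum of two envelopes with different onset times keeps both amplitude bumps, whereas an envelope one might consider temporally intermediate has a single bump located between them.

```python
import numpy as np

def gaussian_pulse_envelope(n_samples, center, width):
    """Hypothetical toy envelope: a single Gaussian amplitude bump."""
    t = np.arange(n_samples)
    return np.exp(-0.5 * ((t - center) / width) ** 2)

def crossfade(env_a, env_b, alpha):
    """Naive morph: weighted sum of the two envelopes (overlays both)."""
    return (1.0 - alpha) * env_a + alpha * env_b

def count_peaks(env, threshold=0.25):
    """Count strict local maxima whose amplitude exceeds the threshold."""
    interior = env[1:-1]
    is_peak = (interior > env[:-2]) & (interior > env[2:]) & (interior > threshold)
    return int(np.sum(is_peak))

n = 1000
env_a = gaussian_pulse_envelope(n, center=200, width=30)  # early pulse
env_b = gaussian_pulse_envelope(n, center=800, width=30)  # late pulse

# Midpoint of a naive crossfade: both pulses survive at half amplitude.
overlay = crossfade(env_a, env_b, alpha=0.5)

# One possible "temporally intermediate" target (an assumption for this toy
# case only): a single pulse halfway between the two onset times.
shifted = gaussian_pulse_envelope(n, center=500, width=30)

print("peaks in crossfade midpoint:", count_peaks(overlay))
print("peaks in shifted midpoint:  ", count_peaks(shifted))
```

In this toy setting the crossfade midpoint has two peaks and the shifted pulse has one, which is the distinction between overlaying temporal structures and interpolating them.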
Problem

Research questions and friction points this paper is trying to address.

Developing perceptually intuitive temporal envelope morphing for audio signals
Addressing unnatural results in existing audio morphing techniques
Learning perceptual principles for intermediate temporal envelope generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Perceptual principles guide envelope morphing
Autoencoder compresses temporal envelope structures
Supervised framework learns morphing principles
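The encode, interpolate, decode pattern behind an envelope autoencoder can be sketched as follows. This is a minimal stand-in under stated assumptions, not the paper's architecture: it uses a linear autoencoder fitted in closed form with a truncated SVD (equivalent to PCA), and the training data, `make_envelope`, `encode`, `decode`, and the latent size of 8 are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_envelope(n_samples, center, width):
    """Hypothetical toy envelope: a single Gaussian amplitude bump."""
    t = np.arange(n_samples)
    return np.exp(-0.5 * ((t - center) / width) ** 2)

# Toy training set (assumption): single-pulse envelopes with random onsets.
n_samples, n_train, latent_dim = 256, 200, 8
centers = rng.uniform(40, 216, size=n_train)
X = np.stack([make_envelope(n_samples, c, 12.0) for c in centers])

# Closed-form linear autoencoder: keep the top principal directions
# of the mean-centered training envelopes.
mean = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
W = vt[:latent_dim]  # shape (latent_dim, n_samples), orthonormal rows

def encode(env):
    """Project an envelope onto the latent directions."""
    return W @ (env - mean)

def decode(z):
    """Map a latent vector back to an envelope."""
    return mean + W.T @ z

# Morph by interpolating in latent space, then decoding.
z_a = encode(make_envelope(n_samples, 60, 12.0))
z_b = encode(make_envelope(n_samples, 200, 12.0))
z_mid = 0.5 * (z_a + z_b)
morph = decode(z_mid)

print("latent size:", z_a.shape, "morph length:", morph.shape)
```

Because this decoder is linear, decoding the latent midpoint is identical to averaging the two reconstructions, so the toy version cannot by itself avoid the overlay behavior. It only illustrates the compress-then-interpolate pattern.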