Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models

📅 2025-05-12
🤖 AI Summary
Generative text-to-audio diffusion models suffer from high inference energy consumption, making it challenging to simultaneously achieve high audio fidelity and energy efficiency. Method: We introduce the first systematic energy quantification framework for such models, empirically evaluating seven state-of-the-art architectures. We analyze nonlinear relationships between energy consumption and key inference parameters—including sampling steps and audio resolution—via sensitivity analysis, multi-objective Pareto frontier modeling, and cross-model energy-efficiency normalization. Contribution/Results: We propose a novel “Pareto energy-efficiency–quality co-optimization” paradigm for green AI. Experiments identify three low-energy, high-fidelity configurations that reduce energy consumption by up to 47% while degrading Mean Opinion Score (MOS) by less than 0.3. Our approach provides a reproducible, generalizable pathway for sustainable audio generation, enabling principled trade-offs between perceptual quality and computational sustainability.

📝 Abstract
Text-to-audio models have recently emerged as a powerful technology for generating sound from textual descriptions. However, their high computational demands raise concerns about energy consumption and environmental impact. In this paper, we conduct an analysis of the energy usage of 7 state-of-the-art text-to-audio diffusion-based generative models, evaluating to what extent variations in generation parameters affect energy consumption at inference time. We also aim to identify an optimal balance between audio quality and energy consumption by considering Pareto-optimal solutions across all selected models. Our findings provide insights into the trade-offs between performance and environmental impact, contributing to the development of more efficient generative audio models.
Problem

Research questions and friction points this paper is trying to address.

Analyzing energy consumption of text-to-audio diffusion models
Balancing audio quality and energy efficiency in generative models
Evaluating parameter impact on inference-time energy usage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzing energy consumption of text-to-audio diffusion models
Evaluating parameter impacts on inference energy usage
Identifying Pareto-optimal audio quality-energy balance
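
The Pareto-optimal selection mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Config` type, the configuration names, and all numbers are hypothetical, and it assumes a quality score where higher is better and an energy figure where lower is better. A configuration is kept only if no other configuration is at least as good on both axes and strictly better on one.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str         # hypothetical label (model + sampling steps)
    energy_wh: float  # energy per generated clip, lower is better
    quality: float    # quality score, higher is better (assumption)


def pareto_front(configs: List[Config]) -> List[Config]:
    """Return the configurations that no other configuration dominates."""
    front = []
    for c in configs:
        dominated = any(
            o.energy_wh <= c.energy_wh
            and o.quality >= c.quality
            and (o.energy_wh < c.energy_wh or o.quality > c.quality)
            for o in configs
        )
        if not dominated:
            front.append(c)
    # Sort by energy so the front reads from cheapest to most expensive.
    return sorted(front, key=lambda c: c.energy_wh)


# Made-up measurements, for illustration only.
configs = [
    Config("model_a_steps_25", 0.10, 3.0),
    Config("model_a_steps_100", 0.40, 3.6),
    Config("model_b_steps_50", 0.30, 3.1),
    Config("model_b_steps_200", 0.90, 3.5),
]

front = pareto_front(configs)
for c in front:
    print(f"{c.name}: {c.energy_wh} Wh, quality {c.quality}")
```

In this toy data, `model_b_steps_200` drops out because `model_a_steps_100` uses less energy and scores higher. If quality is instead measured with a metric where lower is better, the quality comparisons need to be flipped.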
Riccardo Passoni
Dipartimento di Elettronica, Informazione e Bioingegneria - Politecnico di Milano, Milan, Italy
Francesca Ronchini
Politecnico di Milano
Luca Comanducci
Politecnico di Milano
Music Informatics, Generative Models, Machine Learning, Spatial Audio
Romain Serizel
Université de Lorraine, CNRS, Inria, Loria, Nancy, France
Fabio Antonacci
Politecnico di Milano
computer science