🤖 AI Summary
Generative text-to-audio diffusion models incur high energy costs at inference time, making it challenging to achieve high audio fidelity and energy efficiency simultaneously.
Method: We introduce the first systematic energy quantification framework for such models, empirically evaluating seven state-of-the-art architectures. We analyze nonlinear relationships between energy consumption and key inference parameters, including sampling steps and audio resolution, via sensitivity analysis, multi-objective Pareto frontier modeling, and cross-model energy-efficiency normalization.
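Cross-model energy-efficiency normalization can be illustrated with a minimal sketch: energy per run divided by seconds of audio produced, so models generating clips of different lengths are comparable. All model names and numbers below are hypothetical placeholders, not measurements from the paper.

```python
# Minimal sketch of cross-model energy-efficiency normalization.
# Assumption: efficiency is expressed as Wh consumed per second of
# generated audio; the paper's exact metric may differ.

def energy_per_audio_second(energy_wh: float, audio_seconds: float) -> float:
    """Normalized efficiency: Wh consumed per second of generated audio."""
    return energy_wh / audio_seconds

# Hypothetical per-inference measurements:
# (model name, Wh per run, seconds of audio generated)
runs = [
    ("model_a", 12.0, 10.0),
    ("model_b", 4.5, 5.0),
    ("model_c", 30.0, 10.0),
]

normalized = {name: energy_per_audio_second(e, s) for name, e, s in runs}
most_efficient = min(normalized, key=normalized.get)
```

With these placeholder numbers, `model_b` comes out most efficient at 0.9 Wh per audio second, even though `model_a` has a lower per-second runtime cost in absolute Wh per run.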
Contribution/Results: We propose a novel “Pareto energy-efficiency–quality co-optimization” paradigm for green AI. Experiments identify three low-energy, high-fidelity configurations that reduce energy consumption by up to 47% while degrading Mean Opinion Score (MOS) by less than 0.3. Our approach provides a reproducible, generalizable pathway for sustainable audio generation, enabling principled trade-offs between perceptual quality and computational sustainability.
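The co-optimization idea above can be sketched as a standard Pareto-dominance filter over (energy, MOS) pairs: keep only configurations for which no other configuration is at least as good on both axes and strictly better on one. The configuration names and scores below are hypothetical, chosen only to illustrate the selection logic.

```python
# Sketch of Pareto-optimal selection over (energy, quality) trade-offs.
# Assumption: lower energy is better, higher MOS is better; data are
# illustrative placeholders, not results from the paper.

def pareto_front(configs):
    """Return configurations not dominated by any other configuration."""
    front = []
    for c in configs:
        dominated = any(
            other["energy"] <= c["energy"]
            and other["mos"] >= c["mos"]
            and (other["energy"] < c["energy"] or other["mos"] > c["mos"])
            for other in configs
        )
        if not dominated:
            front.append(c)
    return front

# Hypothetical sampling-step configurations with measured energy (Wh)
# and Mean Opinion Score (1-5 scale).
configs = [
    {"name": "steps=100", "energy": 10.0, "mos": 4.2},
    {"name": "steps=50",  "energy": 5.3,  "mos": 4.0},
    {"name": "steps=25",  "energy": 2.8,  "mos": 3.2},
    {"name": "steps=50b", "energy": 6.0,  "mos": 3.9},  # dominated by steps=50
]

front = pareto_front(configs)
```

Here `steps=50b` is removed because `steps=50` uses less energy and scores a higher MOS; the remaining three configurations form the frontier from which low-energy, high-fidelity operating points are chosen.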
📝 Abstract
Text-to-audio models have recently emerged as a powerful technology for generating sound from textual descriptions. However, their high computational demands raise concerns about energy consumption and environmental impact. In this paper, we analyze the energy usage of seven state-of-the-art diffusion-based text-to-audio generative models, evaluating the extent to which variations in generation parameters affect energy consumption at inference time. We also aim to identify an optimal balance between audio quality and energy consumption by considering Pareto-optimal solutions across all selected models. Our findings provide insights into the trade-offs between performance and environmental impact, contributing to the development of more efficient generative audio models.