🤖 AI Summary
Evaluation metrics currently applied to deep generative models (e.g., VAEs, GANs, diffusion models, Transformers) in engineering design are largely borrowed from statistical, likelihood-based machine learning practice and fail to capture design-critical properties such as constraint satisfaction, functional performance, and design value.
Method: We review the well-accepted 'classic' evaluation metrics grounded in machine learning theory, use case studies to show why they translate poorly to design problems, and curate a set of design-specific metrics proposed across different research communities. The curated metrics target four requirements unique to design: constraint satisfaction, functional performance, novelty, and conditioning. We also release open-source code for the datasets, models, and metrics to bridge machine learning theory and design practice.
Contribution/Results: The metrics are demonstrated on simple-to-visualize 2D case studies and on two real-world engineering tasks, bicycle frame design and structural topology generation, where four deep generative models are evaluated. These experiments showcase how the curated metrics quantify performance target achievement, geometric constraint satisfaction, and design novelty in ways that likelihood-based metrics cannot.
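To make two of the four requirements concrete, here is a minimal Python sketch of a constraint satisfaction rate and a performance target achievement rate. The function names and signatures are illustrative assumptions for this summary, not the API of the released toolkit.

```python
import numpy as np

# Illustrative sketches of two design-specific metrics; hypothetical
# helpers, not the paper's released API.

def constraint_satisfaction_rate(designs, constraint_fns):
    """Fraction of generated designs that satisfy every hard constraint.

    designs: iterable of design representations.
    constraint_fns: callables mapping a design to True/False.
    """
    satisfied = [all(fn(d) for fn in constraint_fns) for d in designs]
    return float(np.mean(satisfied))

def target_achievement_rate(performance, targets):
    """Fraction of designs meeting or exceeding every performance target.

    performance: (n_designs, n_objectives) array of simulated values.
    targets: (n_objectives,) array of required values.
    """
    performance = np.asarray(performance)
    targets = np.asarray(targets)
    return float(np.mean(np.all(performance >= targets, axis=1)))
```

Both metrics require a design representation on which constraints and performance can actually be checked (e.g., a simulator or geometric test), which is precisely what likelihood-based metrics sidestep.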
📝 Abstract
Deep generative models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), diffusion models, and Transformers have shown great promise in a variety of applications, including image and speech synthesis, natural language processing, and drug discovery. However, when applied to engineering design problems, evaluating the performance of these models can be challenging, as traditional statistical metrics based on likelihood may not fully capture the requirements of engineering applications. This paper doubles as a review and practical guide to evaluation metrics for deep generative models (DGMs) in engineering design. We first summarize the well-accepted 'classic' evaluation metrics for deep generative models grounded in machine learning theory. Using case studies, we then highlight why these metrics seldom translate well to design problems, yet see frequent use due to the lack of established alternatives. Next, we curate a set of design-specific metrics which have been proposed across different research communities and can be used to evaluate deep generative models. These metrics focus on requirements unique to design and engineering, such as constraint satisfaction, functional performance, novelty, and conditioning. Throughout our discussion, we apply the metrics to models trained on simple-to-visualize, two-dimensional example problems. Finally, we evaluate four deep generative models on a bicycle frame design problem and a structural topology generation problem. In particular, we showcase the use of the proposed metrics to quantify performance target achievement, design novelty, and geometric constraint satisfaction. We publicly release the code for the datasets, models, and metrics used throughout the paper at https://decode.mit.edu/projects/metrics/.
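As one example from the novelty dimension mentioned above, a common choice is to score each generated design by its distance to the nearest design in the training set. The sketch below is a minimal illustration under that assumption; `nearest_neighbor_novelty` is a hypothetical helper, not the exact formulation or API used in the released code.

```python
import numpy as np
from scipy.spatial.distance import cdist

def nearest_neighbor_novelty(generated, training):
    """Novelty of each generated design, scored as the Euclidean distance
    to its nearest neighbor in the training set (larger = more novel).

    generated: (n_gen, d) array of generated designs.
    training: (n_train, d) array of training designs.
    """
    # cdist yields an (n_gen, n_train) distance matrix; the row-wise
    # minimum is each generated design's closest training example.
    return cdist(np.asarray(generated), np.asarray(training)).min(axis=1)
```

Larger scores indicate designs farther from anything seen during training; the appropriate distance function and design representation are problem-specific choices.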