AI Summary
In multi-objective machine learning, heterogeneous metrics such as accuracy and carbon emissions exhibit incompatible units and scales, impeding direct comparison, trade-off analysis, and Pareto frontier navigation. To address this, we propose a dimensionless normalization method grounded in empirical cumulative distribution functions (CDFs) and relative ranking, enabling cross-scale objective comparability and user-preference-driven personalized aggregation, overcoming key limitations of conventional min-max normalization and weighted-sum scalarization. Our framework comprises four core components: relative ranking estimation, CDF-based objective modeling, Pareto frontier search, and preference-embedded aggregation. Extensive evaluation across three benchmark scenarios (LLM deployment, domain generalization, and AutoML) demonstrates substantial improvements in recommendation rationality and robustness, particularly where classical scalarization methods fail. The approach enables interpretable, preference-aware optimization without requiring prior knowledge of objective ranges or manual weight tuning.
Abstract
In machine learning (ML), it is common to account for multiple objectives when, e.g., selecting a model to deploy. However, it is often unclear how one should compare, aggregate, and, ultimately, trade off these objectives, as they might be measured in different units or scales. For example, when deploying large language models (LLMs), we might not only care about their performance, but also their CO2 emissions. In this work, we investigate how objectives can be sensibly compared and aggregated to navigate their Pareto front. To do so, we propose to make incomparable objectives comparable via their CDFs, approximated by their relative rankings. This allows us to aggregate them while matching user-specific preferences, so that practitioners can meaningfully navigate the Pareto front and search it for models. We demonstrate the potential impact of our methodology in diverse areas such as LLM selection, domain generalization, and AutoML benchmarking, where classical ways to aggregate and normalize objectives fail.
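The core idea of the abstract, making objectives on different scales comparable via rank-based empirical CDFs and then aggregating them under user preferences, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the model values, preference weights, and the simple rank-based CDF estimate are all assumptions for the example.

```python
import numpy as np

def cdf_scores(values, higher_is_better=True):
    """Map raw metric values to [0, 1] via their empirical CDF,
    approximated by each value's relative rank among the candidates."""
    v = np.asarray(values, dtype=float)
    if not higher_is_better:       # e.g. CO2: a lower raw value should score higher
        v = -v
    ranks = v.argsort().argsort() + 1   # 1 = worst candidate, n = best
    return ranks / len(v)               # rank-based CDF estimate in (0, 1]

# Hypothetical example: 4 candidate models with accuracy (higher is better)
# and CO2 in kg (lower is better). The units differ, but ranks are comparable.
acc = [0.91, 0.88, 0.95, 0.80]
co2 = [120.0, 40.0, 300.0, 25.0]

scores = np.vstack([cdf_scores(acc, True), cdf_scores(co2, False)])

# Preference-weighted aggregation: the weights now act on a common [0, 1]
# scale, so no manual tuning for the objectives' raw ranges is needed.
prefs = np.array([0.7, 0.3])            # this user cares more about accuracy
agg = prefs @ scores
best = int(np.argmax(agg))              # index of the recommended model
```

With these illustrative numbers, the accuracy-heavy preference selects the most accurate model despite its higher CO2 cost; shifting weight toward CO2 moves the recommendation along the Pareto front.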