AllMetrics: A Unified Python Library for Standardized Metric Evaluation and Robust Data Validation in Machine Learning

Date: 2025-05-21
AI Summary
Existing ML evaluation metric libraries suffer from fragmentation, inconsistent implementations, and weak data validation, hindering reliable cross-framework comparisons. To address this, we propose AllMetrics, the first task-aware, standardized metric framework supporting diverse tasks including regression, classification, clustering, segmentation, and image-to-image translation. AllMetrics systematically eliminates implementation discrepancies (IDs) and reporting discrepancies (RDs) via a modular API, strongly typed input validation, multi-task adapters, and cross-language (Python/MATLAB/R) consistency verification. Empirical evaluation across healthcare, finance, and real estate domains demonstrates that AllMetrics significantly reduces evaluation errors, improves reproducibility, and enhances trustworthiness in ML workflows.

๐Ÿ“ Abstract
Evaluating and comparing machine learning (ML) models relies heavily on consistent and accurate performance metrics. However, existing libraries often suffer from fragmentation, inconsistent implementations, and insufficient data validation protocols, leading to unreliable results. These libraries have often been developed independently, without adherence to a unified standard, particularly for the specific tasks they aim to support. As a result, each library tends to adopt its own conventions for metric computation, input/output formatting, error handling, and data validation. This lack of standardization leads to both implementation differences (ID) and reporting differences (RD), making it difficult to compare results across frameworks or ensure reliable evaluations. To address these issues, we introduce AllMetrics, an open-source, unified Python library designed to standardize metric evaluation across diverse ML tasks, including regression, classification, clustering, segmentation, and image-to-image translation. The library implements class-specific reporting for multi-class tasks through configurable parameters to cover all use cases, and incorporates task-specific parameters to resolve metric computation discrepancies across implementations. We applied the library to datasets from domains including healthcare, finance, and real estate, and compared its outputs with those of Python, MATLAB, and R components to identify which yield similar results. AllMetrics combines a modular Application Programming Interface (API) with robust input validation mechanisms to ensure reproducibility and reliability in model evaluation. This paper presents the design principles, architectural components, and empirical analyses demonstrating the library's ability to mitigate evaluation errors and enhance the trustworthiness of ML workflows.
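To make the idea of validating inputs before computing a metric concrete, here is a minimal sketch in plain NumPy. The names `validate_inputs` and `mean_absolute_error` are hypothetical and chosen for illustration only; they are not the AllMetrics API, and the specific checks (shape, emptiness, finiteness) are assumptions about what a validation layer might enforce.

```python
import numpy as np


def validate_inputs(y_true, y_pred):
    """Hypothetical validator (not the AllMetrics API).

    Converts both inputs to float arrays and rejects mismatched shapes,
    empty inputs, and NaN or infinite values.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    if y_true.size == 0:
        raise ValueError("inputs are empty")
    if not (np.isfinite(y_true).all() and np.isfinite(y_pred).all()):
        raise ValueError("inputs contain NaN or infinite values")
    return y_true, y_pred


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error, computed only after the inputs pass validation."""
    y_true, y_pred = validate_inputs(y_true, y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))
```

With this layout, a call such as `mean_absolute_error([1, 2], [1, 2, 3])` raises a `ValueError` rather than failing with a less specific error deeper in the computation.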
Problem

Research questions and friction points this paper is trying to address.

Fragmented and inconsistent ML metric implementations across libraries
Lack of standardized data validation in performance evaluation
Difficulty comparing results due to varying computation methods
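A familiar illustration of how computation conventions can differ between environments (a general example, not one taken from the paper) is the standard deviation: NumPy's `np.std` divides by n by default, while MATLAB's `std` and R's `sd` divide by n - 1. The sketch below shows both conventions on the same data.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# NumPy default: ddof=0, i.e. divide the sum of squared deviations by n.
population_std = np.std(x)

# ddof=1 divides by n - 1, matching the default of MATLAB's std and R's sd.
sample_std = np.std(x, ddof=1)

print(population_std)  # 2.0
print(sample_std)      # about 2.138
```

Any metric built on variance inherits this choice, so two libraries can report different values for the same data unless the convention is stated explicitly.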
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Python library for standardized ML metrics
Modular API with robust input validation
Task-specific parameters to resolve discrepancies
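As a rough sketch of what class-specific reporting through a configurable parameter could look like, the function below computes multi-class precision either per class or as a macro average. The signature `precision(y_true, y_pred, average=None)` is hypothetical and is not the AllMetrics API; returning 0.0 for a class that is never predicted is also an assumption, and is exactly the kind of convention that differs between implementations.

```python
import numpy as np


def precision(y_true, y_pred, average=None):
    """Hypothetical multi-class precision (not the AllMetrics API).

    average=None    -> dict mapping each class label to its precision
    average="macro" -> unweighted mean of the per-class precisions
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")

    labels = np.unique(np.concatenate([y_true, y_pred]))
    per_class = {}
    for label in labels:
        predicted = y_pred == label
        n_predicted = int(predicted.sum())
        true_positives = int((predicted & (y_true == label)).sum())
        # Assumed convention: precision is 0.0 when a class is never predicted.
        per_class[label.item()] = (
            true_positives / n_predicted if n_predicted else 0.0
        )

    if average is None:
        return per_class
    if average == "macro":
        return sum(per_class.values()) / len(per_class)
    raise ValueError(f"unsupported average: {average!r}")


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
per_class_report = precision(y_true, y_pred)
macro_report = precision(y_true, y_pred, average="macro")
```

Here `per_class_report` gives one value per label (0.5, 2/3, and 1.0 for classes 0, 1, and 2), while `macro_report` collapses them into a single number, so a report that does not say which was used is ambiguous.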
Morteza Alizadeh
Department of Mathematics, University of Isfahan, Isfahan, Iran
Mehrdad Oveisi
The University of British Columbia
AI/ML, Education, Computational Biology, Biomedical Informatics, Data Science
Sonya Falahati
Electrical and Computer Engineering Department, Nooshirvani University of Technology, Babol, Iran; Technological Virtual Collaboration (TECVICO Corp.), Vancouver, BC, Canada
Ghazal Mousavi
School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
Mohsen Alambardar Meybodi
Department of Applied Mathematics and Computer Science, University of Isfahan, Isfahan, Iran
Somayeh Sadat Mehrnia
Department of Integrative Oncology, Breast Cancer Research Center, Motamed Cancer Institute, ACECR, Tehran, Iran
Ilker Hacihaliloglu
Department of Radiology, Department of Medicine, University of British Columbia
Biomedical Engineering, Medical Image Processing, Ultrasound Image Processing, Image Guided Surgery and Therapy, Deep Learning f
Arman Rahmim
Professor of Radiology, Physics and Biomedical Engineering, University of British Columbia
computational imaging, molecular imaging, personalized cancer therapy, AI, theranostics
Mohammad R. Salmanpour
Technological Virtual Collaboration (TECVICO Corp.), Vancouver, BC, Canada; Department of Radiology, University of British Columbia, Vancouver, BC, Canada; Department of Integrative Oncology, BC Cancer Research Institute, Vancouver, BC, Canada