Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing black-box optimization benchmarking frameworks suffer from dependency conflicts, tight environment coupling, and poor scalability of benchmark integration, severely compromising experimental reproducibility. To address these challenges, we propose Bencher—a modular benchmarking framework introducing the novel concept of *benchmark isolation*: it decouples benchmark execution from optimization logic via lightweight, version-agnostic RPC abstractions and isolated virtual environments. Bencher supports cross-platform deployment across local machines, Docker containers, and HPC systems (via Singularity). It unifies the modeling of heterogeneous search spaces—including continuous, categorical, and binary domains—enabling reproducible benchmark integration across environments, software versions, and application domains. The framework currently supports one-click integration of 80 real-world benchmarks through a zero-configuration, lightweight client. Empirical evaluation demonstrates substantial improvements in reliability, reproducibility, and engineering compatibility for black-box optimization algorithm assessment.

📝 Abstract
We present Bencher, a modular benchmarking framework for black-box optimization that fundamentally decouples benchmark execution from optimization logic. Unlike prior suites that focus on combining many benchmarks in a single project, Bencher introduces a clean abstraction boundary: each benchmark is isolated in its own virtual Python environment and accessed via a unified, version-agnostic remote procedure call (RPC) interface. This design eliminates dependency conflicts and simplifies the integration of diverse, real-world benchmarks, which often have complex and conflicting software requirements. Bencher can be deployed locally or remotely via Docker or on high-performance computing (HPC) clusters via Singularity, providing a containerized, reproducible runtime for any benchmark. Its lightweight client requires minimal setup and supports drop-in evaluation of 80 benchmarks across continuous, categorical, and binary domains.
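The core design described above is that only plain, serializable data crosses the boundary between the optimizer's process and each benchmark's isolated environment, so the two sides can run different Python and library versions. Below is a minimal, self-contained sketch of that idea using a JSON wire format; the function names and payload fields are illustrative assumptions, not Bencher's actual API.

```python
import json

def make_request(benchmark, params):
    """Client side: serialize an evaluation request. Only plain JSON
    types cross the process boundary, which is what makes the RPC
    interface version-agnostic."""
    return json.dumps({"benchmark": benchmark, "params": params})

def handle_request(raw, registry):
    """Server side: look up the benchmark (which would live in its own
    isolated virtual environment) and return the objective value."""
    req = json.loads(raw)
    value = registry[req["benchmark"]](req["params"])
    return json.dumps({"value": value})

# Toy objective standing in for a real black-box benchmark.
registry = {"sphere": lambda p: sum(x * x for x in p["x"])}

reply = handle_request(make_request("sphere", {"x": [1.0, 2.0]}), registry)
print(json.loads(reply)["value"])  # 5.0
```

In the real framework the two halves would run in separate processes (or containers), with the registry populated from the installed benchmark suite rather than defined inline.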
Problem

Research questions and friction points this paper is trying to address.

Decouples benchmark execution from optimization logic
Eliminates dependency conflicts in diverse benchmarks
Provides containerized reproducible runtime for benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular benchmarking framework decouples execution and logic
Isolates benchmarks in virtual Python environments via RPC
Containerized runtime supports Docker and HPC deployment
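The summary also mentions unified modeling of continuous, categorical, and binary search spaces. A hedged sketch of what such a unified space description and a sampler over it might look like (field names and schema are assumptions for illustration, not Bencher's actual schema):

```python
import random

# A single declarative description covering all three domain types.
space = [
    {"name": "lr", "type": "continuous", "bounds": [1e-4, 1e-1]},
    {"name": "optimizer", "type": "categorical", "choices": ["adam", "sgd"]},
    {"name": "use_bn", "type": "binary"},
]

def sample(space, rng):
    """Draw one configuration from the unified space, dispatching on
    the declared parameter type."""
    cfg = {}
    for p in space:
        if p["type"] == "continuous":
            lo, hi = p["bounds"]
            cfg[p["name"]] = rng.uniform(lo, hi)
        elif p["type"] == "categorical":
            cfg[p["name"]] = rng.choice(p["choices"])
        else:  # binary
            cfg[p["name"]] = rng.random() < 0.5
    return cfg

cfg = sample(space, random.Random(0))
print(sorted(cfg))  # ['lr', 'optimizer', 'use_bn']
```

Because every benchmark exposes its domain through one schema like this, an optimizer can be pointed at any of the 80 benchmarks without per-benchmark glue code.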
Leonard Papenmeier
Department of Computer Science, Lund University, Sweden
Luigi Nardi
Associate Professor in Machine Learning at Lund University
Machine Learning · Bayesian Optimization