Bencher: Simple and Reproducible Benchmarking for Black-Box Optimization

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing black-box optimization benchmarking frameworks suffer from dependency conflicts, tight environment coupling, and poor scalability of benchmark integration, severely compromising experimental reproducibility. To address these challenges, we propose Bencher—a modular benchmarking framework introducing the novel concept of *benchmark isolation*: it decouples benchmark execution from optimization logic via lightweight, version-agnostic RPC abstractions and isolated virtual environments. Bencher supports cross-platform deployment across local machines, Docker containers, and HPC systems (via Singularity). It unifies the modeling of heterogeneous search spaces—including continuous, categorical, and binary domains—enabling reproducible benchmark integration across environments, software versions, and application domains. The framework currently supports one-click integration of 80 real-world benchmarks through a zero-configuration, lightweight client. Empirical evaluation demonstrates substantial improvements in reliability, reproducibility, and engineering compatibility for black-box optimization algorithm assessment.

📝 Abstract
We present Bencher, a modular benchmarking framework for black-box optimization that fundamentally decouples benchmark execution from optimization logic. Unlike prior suites that focus on combining many benchmarks in a single project, Bencher introduces a clean abstraction boundary: each benchmark is isolated in its own virtual Python environment and accessed via a unified, version-agnostic remote procedure call (RPC) interface. This design eliminates dependency conflicts and simplifies the integration of diverse, real-world benchmarks, which often have complex and conflicting software requirements. Bencher can be deployed locally or remotely via Docker or on high-performance computing (HPC) clusters via Singularity, providing a containerized, reproducible runtime for any benchmark. Its lightweight client requires minimal setup and supports drop-in evaluation of 80 benchmarks across continuous, categorical, and binary domains.
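The core design described above is that only plain, serializable data crosses the boundary between the optimizer's process and each benchmark's isolated environment, so the two sides can run different Python and library versions. Below is a minimal, self-contained sketch of that idea using a JSON wire format; the function names and payload fields are illustrative assumptions, not Bencher's actual API.

```python
import json

def make_request(benchmark, params):
    """Client side: serialize an evaluation request. Only plain JSON
    types cross the process boundary, which is what makes the RPC
    interface version-agnostic."""
    return json.dumps({"benchmark": benchmark, "params": params})

def handle_request(raw, registry):
    """Server side: look up the benchmark (which would live in its own
    isolated virtual environment) and return the objective value."""
    req = json.loads(raw)
    value = registry[req["benchmark"]](req["params"])
    return json.dumps({"value": value})

# Toy objective standing in for a real black-box benchmark.
registry = {"sphere": lambda p: sum(x * x for x in p["x"])}

reply = handle_request(make_request("sphere", {"x": [1.0, 2.0]}), registry)
print(json.loads(reply)["value"])  # 5.0
```

In the real framework the two halves would run in separate processes (or containers), with the registry populated from the installed benchmark suite rather than defined inline.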
Problem

Research questions and friction points this paper is trying to address.

Decouples benchmark execution from optimization logic
Eliminates dependency conflicts in diverse benchmarks
Provides containerized reproducible runtime for benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular benchmarking framework decouples execution and logic
Isolates benchmarks in virtual Python environments via RPC
Containerized runtime supports Docker and HPC deployment
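The summary also mentions unified modeling of continuous, categorical, and binary search spaces. A hedged sketch of what such a unified space description and a sampler over it might look like (field names and schema are assumptions for illustration, not Bencher's actual schema):

```python
import random

# A single declarative description covering all three domain types.
space = [
    {"name": "lr", "type": "continuous", "bounds": [1e-4, 1e-1]},
    {"name": "optimizer", "type": "categorical", "choices": ["adam", "sgd"]},
    {"name": "use_bn", "type": "binary"},
]

def sample(space, rng):
    """Draw one configuration from the unified space, dispatching on
    the declared parameter type."""
    cfg = {}
    for p in space:
        if p["type"] == "continuous":
            lo, hi = p["bounds"]
            cfg[p["name"]] = rng.uniform(lo, hi)
        elif p["type"] == "categorical":
            cfg[p["name"]] = rng.choice(p["choices"])
        else:  # binary
            cfg[p["name"]] = rng.random() < 0.5
    return cfg

cfg = sample(space, random.Random(0))
print(sorted(cfg))  # ['lr', 'optimizer', 'use_bn']
```

Because every benchmark exposes its domain through one schema like this, an optimizer can be pointed at any of the 80 benchmarks without per-benchmark glue code.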
Leonard Papenmeier
Department of Computer Science, Lund University, Sweden
Luigi Nardi
Associate Professor in Machine Learning at Lund University
Machine Learning · Bayesian Optimization