Efficiently Ranking Software Variants with Minimal Benchmarks

📅 2025-09-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost and redundancy of benchmarking for software variant ranking, this paper proposes BISection Sampling (BISS), a correlation-aware test-suite reduction method that combines critical-test preservation with a divide-and-conquer sampling strategy. BISS adaptively samples tests across subsets of variants while preserving ranking stability. Evaluated on real-world datasets, including LLM leaderboards, SAT solver competitions, and configurable systems, BISS reduces benchmarking cost to 44% of the original on average; on more than half of the benchmarks, it cuts the number of required tests by up to 99% without degrading Top-k ranking accuracy. The method thus provides a scalable, robust, and lightweight solution for large-scale variant assessment under resource constraints.

📝 Abstract
Benchmarking is a common practice in software engineering to assess the qualities and performance of software variants, coming from multiple competing systems or from configurations of the same system. Benchmarks are used notably to compare and understand variant performance, fine-tune software, detect regressions, or design new software systems. The execution of benchmarks to get a complete picture of software variants is highly costly in terms of computational resources and time. In this paper, we propose a novel approach for reducing benchmarks while maintaining stable rankings, using test suite optimization techniques. That is, we remove instances from the benchmarks while trying to keep the same rankings of the variants on all tests. Our method, BISection Sampling, BISS, strategically retains the most critical tests and applies a novel divide-and-conquer approach to efficiently sample among relevant remaining tests. We experiment with datasets and use cases from LLM leaderboards, SAT competitions, and configurable systems for performance modeling. Our results show that our method outperforms baselines even when operating on a subset of variants. Using BISS, we reduce the computational cost of the benchmarks on average to 44% and on more than half the benchmarks by up to 99% without loss in ranking stability.
Problem

Research questions and friction points this paper is trying to address.

Reducing benchmark costs for software variants
Maintaining stable rankings with fewer tests
Optimizing test suites to minimize computational resources
Innovation

Methods, ideas, or system contributions that make the work stand out.

Test suite optimization reduces benchmark instances
BISS method strategically retains critical tests
Divide-and-conquer approach efficiently samples relevant tests
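The divide-and-conquer idea above can be illustrated with a toy bisection loop: repeatedly halve the test set and keep a half only while it still reproduces the full variant ranking. This is a hypothetical sketch, not the paper's BISS implementation; the score matrix, the mean-score ranking, and the exact-match stability check are all assumptions made for illustration.

```python
# Hypothetical sketch of bisection-style test-suite reduction.
# Not the paper's BISS algorithm; ranking criterion and stopping
# rule are illustrative assumptions.

def ranking(scores, tests):
    """Rank variant indices by mean score over the given test indices."""
    means = [sum(row[t] for t in tests) / len(tests) for row in scores]
    return sorted(range(len(scores)), key=lambda v: -means[v])

def bisect_reduce(scores, tests=None):
    """Recursively halve the test set, descending into a half only
    if it reproduces the ranking induced by the current set."""
    if tests is None:
        tests = list(range(len(scores[0])))
    full = ranking(scores, tests)
    if len(tests) == 1:
        return tests
    mid = len(tests) // 2
    for half in (tests[:mid], tests[mid:]):
        if ranking(scores, half) == full:
            return bisect_reduce(scores, half)
    return tests  # neither half preserves the ranking: stop reducing

# Example: 3 variants x 4 tests (rows = variants, columns = tests)
scores = [
    [0.9, 0.8, 0.7, 0.9],  # variant 0 (best)
    [0.6, 0.7, 0.5, 0.6],  # variant 1
    [0.3, 0.2, 0.4, 0.1],  # variant 2 (worst)
]
kept = bisect_reduce(scores)
print(kept)
```

In this toy data the ranking is stable on every subset, so the reduction bottoms out at a single test; real benchmarks are noisier, which is where correlation-aware selection of critical tests matters.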