🤖 AI Summary
This work addresses the lack of dedicated benchmarks for the Java Stream API, which hinders performance optimization and efficient programming guidance. It presents the first automatically generated benchmark suite tailored to the Stream API by systematically translating SQL queries into functionally equivalent declarative (Stream-based) and imperative Java implementations. This approach enables a comprehensive evaluation of diverse coding styles and parallelization strategies, using imperative code as a performance baseline. The methodology supports a wide range of Stream usage patterns and reveals inefficient code constructs alongside their optimized alternatives. The empirical findings not only offer actionable best practices for developers leveraging the Stream API but also provide evidence-based insights to inform future enhancements to the Java standard library’s performance characteristics.
📝 Abstract
The Java Stream API aims to increase developer productivity through an easy-to-read declarative syntax for expressing computations. It also simplifies parallel computing by providing a high-level abstraction over common parallelization aspects. Unfortunately, there is a lack of benchmarks specifically targeting stream-based applications. This lack makes it difficult for researchers and developers of the Java class library to optimize the Stream API. Moreover, in the absence of dedicated benchmarks, it is difficult to analyze the performance of streams and advise developers on how to write efficient code using the API.
In this work we present JEDI, a benchmark suite that targets the Stream API. JEDI is automatically generated by converting SQL benchmarks into Java benchmarks. Our code generator targets different implementations (both stream-based and imperative) of the same query. The ultimate goal of our benchmark suite, and the main contribution of this work, is to analyze the performance of the different implementations to spot inefficient code structures and better alternatives, suggesting best practices to Java developers. Among the multiple implementations we generate, we focus on different parallelization strategies and explain which strategies are most efficient based on characteristics of the processed data. Finally, the generated imperative code defines a baseline that can guide researchers and Java implementers in optimizing the Stream API.
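To make the idea of several implementations of one query concrete, the following is a minimal sketch rather than JEDI's actual generated code. The SQL query, the `Sale` record, and all class and method names are hypothetical and chosen only for illustration. It assumes Java 16 or later (for records) and shows one aggregation query written as a sequential stream, a parallel stream, and an imperative loop.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical query, used for illustration only (not taken from JEDI):
//   SELECT category, SUM(amount) FROM sale WHERE amount > 100 GROUP BY category
final class QueryVariants {

    // Hypothetical row type standing in for one table row.
    record Sale(String category, long amount) {}

    // Declarative variant: sequential stream pipeline.
    static Map<String, Long> streamSequential(List<Sale> sales) {
        return sales.stream()
                .filter(s -> s.amount() > 100)                       // WHERE
                .collect(Collectors.groupingBy(Sale::category,       // GROUP BY
                        TreeMap::new,
                        Collectors.summingLong(Sale::amount)));      // SUM
    }

    // Declarative variant: the same pipeline on a parallel stream.
    static Map<String, Long> streamParallel(List<Sale> sales) {
        return sales.parallelStream()
                .filter(s -> s.amount() > 100)
                .collect(Collectors.groupingBy(Sale::category,
                        TreeMap::new,
                        Collectors.summingLong(Sale::amount)));
    }

    // Imperative variant: plain loop, the kind of code that can serve as a baseline.
    static Map<String, Long> imperative(List<Sale> sales) {
        Map<String, Long> result = new TreeMap<>();
        for (Sale s : sales) {
            if (s.amount() > 100) {
                result.merge(s.category(), s.amount(), Long::sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("books", 150), new Sale("books", 50),
                new Sale("games", 200), new Sale("books", 120));
        // All three variants should produce the same grouping.
        System.out.println(streamSequential(sales));
        System.out.println(streamParallel(sales));
        System.out.println(imperative(sales));
    }
}
```

The three methods are functionally equivalent, so any difference measured between them (for example with a harness such as JMH) reflects the cost of the coding style or parallelization choice rather than of the query itself.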