🤖 AI Summary
When modern parallel libraries are composed concurrently within one process, they contend for resources such as CPU and memory because no mechanism coordinates resource use across library boundaries; the result is degraded performance and, for thread-unsafe libraries, incorrect behavior. Existing solutions require intrusive modifications to library source code or to the OS kernel, which limits their practicality. This paper introduces Virtual Library Contexts (VLCs), a lightweight in-process runtime abstraction that provides library-level resource isolation and fine-grained concurrency control. VLCs require no changes to library implementations or to the operating system, and they allow multiple instances of the same library, including thread-unsafe ones, to coexist safely within a single process. Prototype implementations for C++ and Python work with mainstream libraries such as OpenMP, OpenBLAS, and LibTorch. Across several benchmarks, VLCs achieve up to a 2.85× speedup over conventional concurrency approaches, improving both performance and safety. VLCs thus offer a lightweight, non-intrusive way to compose heterogeneous parallel libraries in complex applications.
📝 Abstract
As the complexity and scale of modern parallel machines continue to grow, programmers increasingly rely on composition of software libraries to encapsulate and exploit parallelism. However, many libraries are not designed with composition in mind and assume they have exclusive access to all resources. Using such libraries concurrently can result in contention and degraded performance. Prior solutions involve modifying the libraries or the OS, which is often infeasible.
We propose Virtual Library Contexts (VLCs), which are process subunits that encapsulate sets of libraries and associated resource allocations. VLCs control the resource utilization of these libraries without modifying library code. This enables the user to partition resources between libraries to prevent contention, or to load multiple copies of the same library to allow parallel execution of otherwise thread-unsafe code within the same process.
In this paper, we describe and evaluate C++ and Python prototypes of VLCs. Experiments show that VLCs enable speedups of up to 2.85x on benchmarks including applications that use OpenMP, OpenBLAS, and LibTorch.
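The abstract's point about running multiple copies of a thread-unsafe library in one process can be illustrated at the Python level. Note this is only an analogy: VLCs isolate native libraries and their resource allocations, whereas the sketch below isolates a Python module's global state. The `unsafe_lib` module and `load_fresh_copy` helper are illustrative inventions, not the paper's API.

```python
import importlib.util
import os
import tempfile
import textwrap

# A toy "thread-unsafe library": it keeps mutable global state, so two
# concurrent users of a single shared copy would interfere with each other.
LIB_SOURCE = textwrap.dedent("""
    counter = 0
    def bump():
        global counter
        counter += 1
        return counter
""")

def load_fresh_copy(path, name):
    """Load an independent copy of the module at `path` under `name`.

    Each call executes the module source into a brand-new module object,
    so the copies share no global state (loosely analogous to how a VLC
    lets each context hold its own instance of a library).
    """
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "unsafe_lib.py")
    with open(path, "w") as f:
        f.write(LIB_SOURCE)

    # Two independent copies of the "library" in the same process.
    a = load_fresh_copy(path, "unsafe_lib_a")
    b = load_fresh_copy(path, "unsafe_lib_b")

    a.bump()
    a.bump()
    b.bump()
    print(a.counter, b.counter)  # each copy has its own state: 2 1
```

For native code, achieving the same per-copy isolation is much harder (global symbols and static state are shared process-wide by default), which is the gap the paper's VLC abstraction targets.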