OASIS: Object-based Analytics Storage for Intelligent SQL Query Offloading in Scientific Tabular Workloads

📅 2025-09-02
🤖 AI Summary
Existing Computation-Enabled Object Storage (COS) systems face three key bottlenecks when running SQL analytics over large-scale scientific tabular data in HPC environments: rigid output formats, limited operator pushdown, and no offloading strategy adapted to deep storage hierarchies. To address these, the paper proposes OASIS, an object-based COS system for near-data SQL analytics in HPC. It features: (1) flexible output formats, including the Arrow columnar layout; (2) offloading of complex operators (e.g., aggregate, sort) and array-aware expressions; and (3) dynamic selection of execution paths across internal storage layers, guided by operator characteristics and data movement costs. A prototype of OASIS is integrated into the Spark analytics framework. Evaluated on real-world scientific queries from HPC workflows, it achieves up to a 32.7% performance improvement over Spark configured with existing COS-based storage systems.

📝 Abstract
Computation-Enabled Object Storage (COS) systems, such as MinIO and Ceph, have recently emerged as promising storage solutions for post hoc, SQL-based analysis on large-scale datasets in High-Performance Computing (HPC) environments. By supporting object-granular layouts, COS facilitates column-oriented access and enables in-storage execution of data reduction operators, such as filters, close to where the data resides. Despite growing interest and adoption, existing COS systems exhibit several fundamental limitations that hinder their effectiveness. First, they impose rigid constraints on output data formats, limiting flexibility and interoperability. Second, they support offloading for only a narrow set of operators and expressions, restricting their applicability to more complex analytical tasks. Third, and perhaps most critically, they fail to incorporate design strategies that enable compute offloading optimized for the characteristics of deep storage hierarchies. To address these challenges, this paper proposes OASIS, a novel COS system that features: (i) flexible and interoperable output delivery through diverse formats, including columnar layouts such as Arrow; (ii) broad support for complex operators (e.g., aggregate, sort) and array-aware expressions, including element-wise predicates over array structures; and (iii) dynamic selection of optimal execution paths across internal storage layers, guided by operator characteristics and data movement costs. We implemented a prototype of OASIS and integrated it into the Spark analytics framework. Through extensive evaluation using real-world scientific queries from HPC workflows, OASIS achieves up to a 32.7% performance improvement over Spark configured with existing COS-based storage systems.
Problem

Research questions and friction points this paper is trying to address.

Overcomes rigid output format constraints in computation-enabled object storage
Expands operator support for complex analytical tasks in storage systems
Optimizes execution paths across deep storage hierarchies for efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flexible output formats including columnar layouts
Broad support for complex operators and expressions
Dynamic execution path selection across storage layers
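
The third point, choosing where an operator runs based on operator characteristics and data movement costs, can be illustrated with a toy cost model. The sketch below is not the OASIS algorithm; the layer names, bandwidths, processing rates, and the selectivity-based cost formula are all hypothetical assumptions chosen only to show the trade-off between reducing data early and running on a faster layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One level of a storage hierarchy (all figures are hypothetical)."""
    name: str
    compute_rate: float  # input bytes the layer can process per second
    uplink_bw: float     # bytes/s on the link toward the next layer up


@dataclass(frozen=True)
class Operator:
    name: str
    selectivity: float   # output bytes divided by input bytes


def estimate_cost(layers, op, input_bytes, exec_index):
    """Estimated seconds if `op` runs at layers[exec_index].

    Layers are ordered from closest to the data (index 0) to the consumer
    (last index). Links below the execution layer carry the full input;
    links at or above it carry the reduced output.
    """
    cost = input_bytes / layers[exec_index].compute_rate
    for i, layer in enumerate(layers[:-1]):  # the last layer has no uplink
        size = input_bytes if i < exec_index else input_bytes * op.selectivity
        cost += size / layer.uplink_bw
    return cost


def choose_layer(layers, op, input_bytes):
    """Return the layer with the lowest estimated cost, plus all costs."""
    costs = [estimate_cost(layers, op, input_bytes, i)
             for i in range(len(layers))]
    best = min(range(len(layers)), key=costs.__getitem__)
    return layers[best], costs


# Hypothetical three-level hierarchy: device -> storage server -> compute node.
layers = [
    Layer("storage_device", compute_rate=1e9, uplink_bw=2e9),
    Layer("storage_server", compute_rate=4e9, uplink_bw=1e9),
    Layer("compute_node", compute_rate=16e9, uplink_bw=float("inf")),
]
selective_filter = Operator("filter", selectivity=0.01)
full_sort = Operator("sort", selectivity=1.0)

for op in (selective_filter, full_sort):
    layer, _ = choose_layer(layers, op, input_bytes=10e9)
    print(f"{op.name}: run at {layer.name}")
```

Under these made-up numbers, the highly selective filter is placed on the storage server, where shrinking the data before the slow link outweighs the slower processor, while the sort, which reduces nothing, is placed on the compute node. A real system would need measured costs and per-operator models rather than a single selectivity figure.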
Soon Hwang
Sogang University, Seoul, Republic of Korea
Junhyeok Park
Sogang University, Seoul, Republic of Korea
Junghyun Ryu
Sogang University, Seoul, Republic of Korea
Seonghoon Ahn
Sogang University, Seoul, Republic of Korea
Jeoungahn Park
Memory Systems Research, SK hynix Inc.
Jeongjin Lee
The Ohio State University
Statistics, Biostatistics
Soonyeal Yang
Memory Systems Research, SK hynix Inc.
Jungki Noh
Memory Systems Research, SK hynix Inc.
Woosuk Chung
Memory Systems Research, SK hynix Inc.
Hoshik Kim
Memory Systems Research, SK hynix Inc.
Youngjae Kim
Professor, Department of Computer Science and Engineering, Sogang University
Operating System, File and Storage System, Distributed System