A Hybrid Heuristic Framework for Resource-Efficient Querying of Scientific Experiments Data

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address uneven resource allocation, the high overhead of loading full datasets, and the inefficiency of repeated parsing when querying scientific experiment data, this paper proposes RAW-HF, a lightweight Resource Availability & Workload aware Hybrid Framework built on heuristics rather than on full-data ingestion. RAW-HF combines modules that optimize the resources required to execute a given workload with modules that maximize the utilization of existing resources. It uses a technique called MUAR, which the paper compares against machine-learning-based resource allocation techniques such as PCC. Evaluated on workloads from the SDSS and LOD datasets, RAW-HF reduces workload execution time (WET) by over 90% and 85%, respectively, compared to PostgreSQL. Compared to the workload-aware partial loading technique (WA), it reduces CPU utilization, I/O utilization, and WET by 26%, 25%, and 26%, respectively, while improving memory utilization by 33%.

📝 Abstract
Scientific experiments and modern applications generate large amounts of data every day. Most organizations use in-house servers or cloud resources to manage application data and workloads. Traditional database management systems (DBMS) and HTAP systems spend significant time and resources loading the entire dataset into the DBMS before query execution can start. In-situ engines, on the other hand, may reparse the required data multiple times, increasing resource utilization and data processing costs. Over- or under-allocation of resources further increases application running costs. This paper proposes a lightweight Resource Availability & Workload aware Hybrid Framework (RAW-HF) that optimizes querying of raw data by using existing finite resources efficiently. RAW-HF includes modules that help optimize the resources required to execute a given workload and maximize the utilization of existing resources. Applying RAW-HF to real-world scientific dataset workloads, Sloan Digital Sky Survey (SDSS) and Linked Observation Data (LOD), reduced workload execution time (WET) by over 90% and 85%, respectively, compared to the widely used traditional DBMS PostgreSQL. Compared to the state-of-the-art workload-aware partial loading technique (WA) proposed for hybrid systems, overall CPU utilization, IO utilization, and WET were reduced by 26%, 25%, and 26%, respectively, while memory utilization improved by 33%. A comparison of the MUAR technique used by RAW-HF with machine-learning-based resource allocation techniques such as PCC is also presented.
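
The trade-off the abstract describes (full loading pays a large up-front cost, in-situ querying pays repeatedly for parsing, partial loading sits in between) can be illustrated with a toy cost model. This is only a sketch and not the RAW-HF or WA algorithm: the unit costs, the `min_uses` threshold, and all function names are hypothetical assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical unit costs (assumptions, not measurements from the paper).
PARSE_COST = 5    # parse one column from the raw file for one query
LOAD_COST = 8     # parse and load one column into the DBMS, paid once
DB_READ_COST = 1  # read one already-loaded column for one query


def full_load_cost(columns, workload):
    """Load every column up front, then answer all queries from the DBMS."""
    load = LOAD_COST * len(columns)
    reads = sum(DB_READ_COST * len(query) for query in workload)
    return load + reads


def in_situ_cost(workload):
    """Re-parse each needed column from the raw file for every query."""
    return sum(PARSE_COST * len(query) for query in workload)


def partial_load_cost(workload, min_uses=2):
    """Load only columns used at least min_uses times; parse the rest in place."""
    uses = Counter(col for query in workload for col in query)
    hot = {col for col, n in uses.items() if n >= min_uses}
    cost = LOAD_COST * len(hot)
    for query in workload:
        for col in query:
            cost += DB_READ_COST if col in hot else PARSE_COST
    return cost, hot


# A 10-column raw table and a workload where each query is a set of columns.
columns = [f"c{i}" for i in range(10)]
workload = [{"c0", "c1"}, {"c0", "c2"}, {"c0", "c1"}, {"c3"}]

print("full load:", full_load_cost(columns, workload))
print("in-situ:", in_situ_cost(workload))
print("partial load:", partial_load_cost(workload))
```

Under these assumed costs, loading only the frequently used columns is cheapest because it avoids both loading the six never-queried columns and re-parsing the two hot ones. With different costs or a different workload the ordering can change, which is the kind of decision a workload-aware framework has to make.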
Problem

Research questions and friction points this paper is trying to address.

Optimize querying raw data with finite resources efficiently
Reduce workload execution time and resource utilization
Improve memory and CPU usage in hybrid systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid framework optimizes raw data querying
Efficiently utilizes finite existing resources
Reduces workload execution time significantly
Mayank Patel
PhD Student, Purdue University
Deep Learning, Computational Design, Human-Computer Interaction
Minal Bhise
Distributed Databases Research Group, Dhirubhai Ambani University