STELLAR: Storage Tuning Engine Leveraging LLM Autonomous Reasoning for High Performance Parallel File Systems

📅 2026-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of tuning large-scale parallel file systems, a process that is typically complex, costly, and heavily reliant on expert knowledge, thereby hindering efficient use by domain scientists. The paper presents the first autonomous tuning engine that integrates large language models (LLMs) with a multi-agent architecture. By analyzing application I/O logs, the system automatically identifies tunable parameters, iteratively executes real-world benchmarks, and employs a reflection mechanism to generate reusable tuning knowledge. The approach synergistically combines retrieval-augmented generation (RAG), tool invocation, and multi-agent collaborative reasoning to effectively mitigate hallucinations and enhance stability. Experimental results demonstrate that the system can identify near-optimal configurations for new applications in fewer than five trials, substantially outperforming conventional automated tuning methods that often require hundreds of thousands of iterations.

📝 Abstract
I/O performance is crucial to efficiency in data-intensive scientific computing, but tuning large-scale storage systems is complex, costly, and notoriously labor-intensive, making it inaccessible to most domain scientists. To address this problem, we propose STELLAR, an autonomous tuner for high-performance parallel file systems. Our evaluations show that STELLAR almost always selects near-optimal parameter configurations for parallel file systems within the first five attempts, even for previously unseen applications. STELLAR differs fundamentally from traditional autotuning methods, which often require hundreds of thousands of iterations to converge. Powered by large language models (LLMs), STELLAR enables autonomous end-to-end agentic tuning by (1) accurately extracting tunable parameters from software manuals, (2) analyzing I/O trace logs generated by applications, (3) selecting initial tuning strategies, (4) rerunning applications on real systems and collecting I/O performance feedback, (5) adjusting tuning strategies and repeating the tuning cycle, and (6) reflecting on and summarizing tuning experiences into reusable knowledge for future optimizations. STELLAR integrates retrieval-augmented generation (RAG), tool execution, LLM-based reasoning, and a multi-agent design to stabilize reasoning and combat hallucinations. We evaluate the impact of each component on optimization outcomes, providing design insights for similar systems in other optimization domains. STELLAR's architecture and empirical results highlight a promising approach to complex system optimization, especially for problems with large search spaces and high exploration costs, while making I/O tuning more accessible to domain scientists with minimal added resources.
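The six-step loop in the abstract can be sketched as a small Python program. This is a minimal illustration, not STELLAR's implementation: the parameter names, the synthetic benchmark, and the random proposal strategy (standing in for the paper's LLM-based reasoning agents) are all assumptions made for the example.

```python
import random

# Hypothetical tunable parameters, as if extracted from a file-system
# manual in step (1). Names and ranges are illustrative only.
PARAMETER_SPACE = {
    "stripe_count": [1, 2, 4, 8, 16],
    "stripe_size_mb": [1, 4, 16, 64],
    "io_threads": [1, 2, 4, 8],
}

def run_benchmark(config):
    """Stand-in for step (4): rerun the application on a real system and
    collect I/O performance feedback. Here a toy scoring model is used."""
    return (
        -abs(config["stripe_count"] - 8)
        - abs(config["stripe_size_mb"] - 16) / 4
        - abs(config["io_threads"] - 4)
    )

def propose_config(history, rng):
    """Stand-in for steps (3) and (5): propose a configuration given past
    trials. A random, non-repeating proposal replaces the LLM agent."""
    while True:
        cfg = {name: rng.choice(vals) for name, vals in PARAMETER_SPACE.items()}
        if all(cfg != past for past, _ in history):
            return cfg

def tuning_loop(max_trials=5, seed=0):
    """Run the tuning cycle for a fixed trial budget, then distill the
    trial history into reusable knowledge, as in step (6)."""
    rng = random.Random(seed)
    history = []  # (config, score) pairs: the feedback the agent reflects on
    for _ in range(max_trials):
        cfg = propose_config(history, rng)
        history.append((cfg, run_benchmark(cfg)))
    best_cfg, best_score = max(history, key=lambda pair: pair[1])
    return {"best_config": best_cfg, "best_score": best_score,
            "trials": len(history)}

knowledge = tuning_loop()
print(knowledge["best_config"], knowledge["trials"])
```

The fixed `max_trials=5` budget mirrors the paper's claim of finding near-optimal configurations within the first five attempts; in STELLAR the proposal step would be driven by LLM reasoning over manuals, I/O traces, and retrieved tuning knowledge rather than random sampling.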
Problem

Research questions and friction points this paper is trying to address.

storage tuning
parallel file systems
I/O performance
autonomous optimization
large-scale systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based autotuning
parallel file systems
retrieval-augmented generation
multiagent reasoning
I/O performance optimization