AstraAI: LLMs, Retrieval, and AST-Guided Assistance for HPC Codebases

📅 2026-03-28
🤖 AI Summary
This work addresses the challenge of generating high-performance computing (HPC) code that stays consistent with both the context and the structure of a complex existing codebase. The authors propose a command-line (CLI) framework that runs in a Linux terminal and combines large language models (LLMs), retrieval-augmented generation (RAG), and abstract syntax tree (AST) analysis. The key idea is to merge code snippets retrieved via RAG with AST-derived structural context into a single high-fidelity prompt, so that the model can make scoped, structure-preserving edits. The authors demonstrate the framework on representative code generation tasks in AMReX, where it produces code that follows the project's existing structure and programming patterns.
📝 Abstract
We present AstraAI, a command-line interface (CLI) coding framework for high-performance computing (HPC) software development. AstraAI operates directly within a Linux terminal and integrates large language models (LLMs) with Retrieval-Augmented Generation (RAG) and Abstract Syntax Tree (AST)-based structural analysis to enable context-aware code generation for complex scientific codebases. The central idea is to construct a high-fidelity prompt that is passed to the LLM for inference. This prompt augments the user request with relevant code snippets retrieved from the underlying framework codebase via RAG and structural context extracted from AST analysis, providing the model with precise information about relevant functions, data structures, and overall code organization. The framework is designed to perform scoped modifications to source code while preserving structural consistency with the surrounding code. AstraAI supports both locally hosted models from Hugging Face and API-based frontier models accessible via the American Science Cloud, enabling flexible deployment across HPC environments. The system generates code that aligns with existing project structures and programming patterns. We demonstrate AstraAI on representative HPC code generation tasks within AMReX, a DOE-supported HPC software infrastructure for exascale applications.
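The prompt construction described in the abstract can be sketched as a simple assembly step: the user request, the AST-derived structural context, and the retrieved snippets are concatenated into labelled sections. This is a minimal sketch under stated assumptions, not AstraAI's actual prompt format: the section titles, the `PromptParts` and `build_prompt` names, the `max_snippets` cap, and the example AMReX-style strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    request: str                                        # what the user asked for
    snippets: list[str] = field(default_factory=list)   # code retrieved via RAG
    structure: list[str] = field(default_factory=list)  # facts from AST analysis
    target: str = ""                                    # region the edit is scoped to


def build_prompt(parts: PromptParts, max_snippets: int = 3) -> str:
    """Assemble one prompt string from the request, structure, and snippets."""
    sections = ["## Task\n" + parts.request.strip()]
    if parts.target:
        sections.append("## Edit scope\nModify only: " + parts.target)
    if parts.structure:
        outline = "\n".join("- " + item for item in parts.structure)
        sections.append("## Structural context (from AST analysis)\n" + outline)
    # Cap the number of snippets so the prompt stays within a context budget.
    for i, snippet in enumerate(parts.snippets[:max_snippets], start=1):
        sections.append(f"## Retrieved snippet {i}\n{snippet.strip()}")
    sections.append(
        "## Constraints\n"
        "Match the conventions shown above and leave code outside "
        "the edit scope unchanged."
    )
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Illustrative inputs only; the signature and snippet are made up.
    parts = PromptParts(
        request="Add a ghost-cell fill before the stencil update.",
        target="advance()",
        structure=["void advance(amrex::MultiFab& phi, amrex::Real dt)"],
        snippets=["phi.FillBoundary(geom.periodicity());"],
    )
    print(build_prompt(parts))
```

Keeping the edit scope and the structural outline in their own sections is a design choice made here so the model sees the constraints separately from the example code; the paper may organize its prompt differently.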
Problem

Research questions and friction points this paper is trying to address.

HPC code generation
context-aware code generation
codebase consistency
scientific software development
structured code modification
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM
Retrieval-Augmented Generation
Abstract Syntax Tree
HPC code generation
context-aware code assistance