A Fast, Reliable, and Secure Programming Language for LLM Agents with Code Actions

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) deployed as autonomous agents frequently rely on Python-based tool-calling code, yet suffer from performance bottlenecks, security vulnerabilities, and insufficient reliability—hindering real-world deployment. To address these challenges, we propose Quasar, a lightweight domain-specific language explicitly designed for LLM agent code actions. Quasar introduces three novel, tightly integrated capabilities: automatic parallelization, uncertainty quantification (to mitigate hallucination), and user-verifiable safety policies. It supports transparent transpilation from a safe Python subset and ensures trustworthiness via static analysis, sandboxed execution, and conformal prediction. Evaluated on the GQA benchmark using the ViperGPT agent, Quasar reduces execution time by 42%, decreases user approval interactions by 52%, and achieves the target reliability coverage—demonstrating significant gains in efficiency, safety, and dependability for LLM-driven automation.
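The summary's first capability, automatic parallelization, exploits the fact that tool calls with no data dependency between them can run concurrently. The paper's Quasar runtime does this transparently; the following is only a minimal illustration of the idea in plain Python, where the tool names (`fetch_tool`, `detect_objects`, `ocr_region`) and latencies are invented for the example.

```python
import asyncio
import time

# Hypothetical stand-in for an agent's external tool call.
async def fetch_tool(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # simulate I/O-bound tool latency
    return f"{name}:done"

async def main() -> list[str]:
    # The two calls below share no data dependency, so they can be
    # dispatched concurrently -- the kind of opportunity an automatic
    # parallelizer detects in agent-written code actions.
    return await asyncio.gather(
        fetch_tool("detect_objects", 0.1),
        fetch_tool("ocr_region", 0.1),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results)
# Concurrent execution takes ~0.1s; sequential would take ~0.2s.
```

Run sequentially, the two calls would cost the sum of their latencies; run concurrently, only the maximum, which is the source of the reported 42% reduction in execution time when such opportunities exist.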

📝 Abstract
Modern large language models (LLMs) are often deployed as agents, calling external tools adaptively to solve tasks. Rather than directly calling tools, it can be more effective for LLMs to write code to perform the tool calls, enabling them to automatically generate complex control flow such as conditionals and loops. Such code actions are typically provided as Python code, since LLMs are quite proficient at it; however, Python may not be the ideal language due to limited built-in support for performance, security, and reliability. We propose a novel programming language for code actions, called Quasar, which has several benefits: (1) automated parallelization to improve performance, (2) uncertainty quantification to improve reliability and mitigate hallucinations, and (3) security features enabling the user to validate actions. LLMs can write code in a subset of Python, which is automatically transpiled to Quasar. We evaluate our approach on the ViperGPT visual question answering agent, applied to the GQA dataset, demonstrating that LLMs with Quasar actions instead of Python actions retain strong performance, while reducing execution time when possible by 42%, improving security by reducing user approval interactions when possible by 52%, and improving reliability by applying conformal prediction to achieve a desired target coverage level.
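The abstract's security feature, letting the user validate actions, can be pictured as a static check over generated code: calls the user has already approved need no interaction, and only the rest are surfaced for confirmation. The sketch below is not Quasar's mechanism; it is a minimal Python illustration using the standard `ast` module, with an invented policy set (`APPROVED_CALLS`) and invented tool names.

```python
import ast

# Hypothetical policy: tool calls the user has pre-approved.
APPROVED_CALLS = {"find", "crop", "simple_query"}

def calls_needing_approval(code: str) -> set[str]:
    """Statically collect function calls outside the approved set."""
    tree = ast.parse(code)
    pending = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id not in APPROVED_CALLS:
                pending.add(node.func.id)
    return pending

action = "patches = find(image, 'dog')\nresult = delete_file('log.txt')"
print(calls_needing_approval(action))  # {'delete_file'}
```

Because the check is static, the user approves a policy once rather than confirming every call at runtime, which is how a scheme like this can cut approval interactions, here reported as a 52% reduction.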
Problem

Research questions and friction points this paper is trying to address.

Designing a secure, efficient language for LLM-generated code actions
Improving performance and reliability in LLM agent tool interactions
Reducing execution time and enhancing security in code execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated parallelization for performance enhancement
Uncertainty quantification for reliability improvement
Security features for user action validation
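The reliability contribution rests on conformal prediction: given held-out calibration scores, one picks a threshold so that prediction sets contain the true answer at a target coverage level. The snippet below is a generic split-conformal sketch, not the paper's implementation; the calibration scores and candidate labels are made up for illustration.

```python
import math

def conformal_threshold(cal_scores: list[float], alpha: float) -> float:
    """Split-conformal quantile: sets built with this threshold
    cover the true answer with probability >= 1 - alpha."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample correction
    return sorted(cal_scores)[min(k, n) - 1]

def prediction_set(candidate_scores: dict[str, float], threshold: float) -> set[str]:
    """Keep every candidate whose nonconformity score is within threshold."""
    return {label for label, s in candidate_scores.items() if s <= threshold}

cal = [0.1, 0.3, 0.2, 0.5, 0.4, 0.25, 0.35, 0.15, 0.45, 0.05]
t = conformal_threshold(cal, alpha=0.2)  # target 80% coverage
print(t)  # 0.45
print(sorted(prediction_set({"yes": 0.1, "no": 0.6, "maybe": 0.3}, t)))
# ['maybe', 'yes']
```

Returning a set rather than a single answer is what lets the agent hedge against hallucination: when the model is uncertain, the set grows, and the coverage guarantee still holds at the chosen level.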
Stephen Mell
University of Pennsylvania
Botong Zhang
University of Pennsylvania
David Mell
Unaffiliated
Shuo Li
University of Pennsylvania
Ramya Ramalingam
University of Pennsylvania
Nathan Yu
Unaffiliated
Steve Zdancewic
Professor of Computer and Information Science, University of Pennsylvania
Programming Languages · Type Theory · Logic · Computer Security
Osbert Bastani
University of Pennsylvania
Machine Learning · Artificial Intelligence · Programming Languages · Security