PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications

📅 2026-05-18

🤖 AI Summary
This work addresses the high end-to-end latency of compound AI applications, which is caused by frequent calls to external machine learning models. Conventional language optimization techniques, such as optimizing compilers, mitigate this poorly because execution time is dominated by those external components, and extracting parallelism from Python is complicated by language complexity, dynamic dispatch, and variable mutation. The authors propose PopPy, an automatic parallelization system that combines an ahead-of-time compiler with a runtime. It supports a highly expressive fragment of Python, preserves sequential program semantics, and requires minimal developer input to uncover parallelism. On real-world compound AI applications, it achieves up to 6.4× end-to-end speedup over standard Python execution.
📝 Abstract
Compound AI applications, which compose calls to ML models using a general-purpose programming language like Python, are widely used for a variety of user-facing tasks, from software engineering to enterprise automation, making their end-to-end latency a critical bottleneck. In contrast to traditional applications, execution time is dominated by the external components, which cannot be handled by traditional language optimization systems, like optimizing compilers. To address this problem, we develop PopPy, a system that can uncover parallelization opportunities in Python applications that invoke these heavy external components, including those used in compound AI applications. PopPy supports a very expressive fragment of Python and requires minimal developer input to uncover parallelism. It combines an ahead-of-time compiler with a runtime, addressing three key challenges in extracting parallelism from Python applications: language complexity, dynamic dispatch, and variable mutation. On a set of real-world compound AI applications, PopPy achieves up to $6.4\times$ speedups in end-to-end execution time compared to standard Python execution while preserving the sequential program semantics.
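To make the kind of opportunity described above concrete, the following is a minimal hand-written sketch, not PopPy's compiler or runtime. It assumes a hypothetical `fake_model_call` that stands in for a slow external model invocation (simulated with `time.sleep`), and it uses the standard library's `ThreadPoolExecutor` as a swapped-in, manual technique. It shows that independent external calls can overlap while results are still returned in the original sequential order.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_model_call(prompt: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for a slow external ML model call."""
    time.sleep(delay)  # simulates network and inference latency
    return f"answer({prompt})"


def run_sequential(prompts):
    # Baseline: each call waits for the previous one to finish.
    return [fake_model_call(p) for p in prompts]


def run_parallel(prompts):
    # Manual parallelization: the calls are independent, so they can overlap.
    # pool.map yields results in input order, matching the sequential output.
    with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
        return list(pool.map(fake_model_call, prompts))


def timed(fn, prompts):
    # Returns the function's result together with its wall-clock duration.
    start = time.perf_counter()
    result = fn(prompts)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    prompts = ["summarize", "classify", "translate", "extract"]
    seq_result, seq_time = timed(run_sequential, prompts)
    par_result, par_time = timed(run_parallel, prompts)
    print("same results:", seq_result == par_result)
    print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

In this sketch the developer has to know that the calls are independent and rewrite the code by hand. According to the abstract, PopPy aims to uncover such opportunities with minimal developer input, including in the presence of dynamic dispatch and variable mutation, which this example deliberately avoids.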
Problem

Research questions and friction points this paper is trying to address.

Compound AI Applications
End-to-end Latency
Parallelism
Python
External Components
Innovation

Methods, ideas, or system contributions that make the work stand out.

parallelization
compound AI applications
Python optimization
ahead-of-time compilation
dynamic dispatch