PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets the high end-to-end latency of compound AI applications, whose execution time is dominated by calls to external machine learning models. Traditional language optimization systems, such as optimizing compilers, cannot handle these external components, and extracting parallelism from Python is further complicated by language complexity, dynamic dispatch, and variable mutation. The authors propose PopPy, an automatic parallelization system that combines an ahead-of-time compiler with a runtime. It supports a very expressive fragment of Python, requires minimal developer input, and preserves the sequential program semantics. On a set of real-world compound AI applications, it achieves up to 6.4× end-to-end speedup over standard Python execution.

📝 Abstract

Compound AI applications, which compose calls to ML models using a general-purpose programming language like Python, are widely used for a variety of user-facing tasks, from software engineering to enterprise automation, making their end-to-end latency a critical bottleneck. In contrast to traditional applications, execution time is dominated by the external components, which cannot be handled by traditional language optimization systems, like optimizing compilers. To address this problem, we develop PopPy, a system that can uncover parallelization opportunities in Python applications that invoke these heavy external components, including those used in compound AI applications. PopPy supports a very expressive fragment of Python and requires minimal developer input to uncover parallelism. It combines an ahead-of-time compiler with a runtime, addressing three key challenges in extracting parallelism from Python applications: language complexity, dynamic dispatch, and variable mutation. On a set of real-world compound AI applications, PopPy achieves up to $6.4\times$ speedups in end-to-end execution time compared to standard Python execution while preserving the sequential program semantics.
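
To make the kind of opportunity described above concrete, here is a minimal sketch of independent external calls that a sequential Python program runs one after another, next to a hand-parallelized version that returns the same results in the same order. It is not PopPy's compiler or runtime: the thread pool is a manual stand-in for what the abstract says is uncovered automatically, and `fake_model_call`, `run_sequential`, `run_parallel`, and `timed` are hypothetical names, with `time.sleep` standing in for a slow model invocation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_model_call(prompt: str, delay: float = 0.2) -> str:
    """Hypothetical stand-in for a slow external ML model call."""
    time.sleep(delay)  # simulates waiting on the external component
    return f"answer({prompt})"


def run_sequential(prompts):
    """Baseline: each call blocks until the previous one has returned."""
    return [fake_model_call(p) for p in prompts]


def run_parallel(prompts):
    """Hand-written parallel version; the calls do not depend on each other."""
    with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
        # pool.map yields results in input order, so the output matches
        # what the sequential version produces.
        return list(pool.map(fake_model_call, prompts))


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call of fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    prompts = ["summarize", "classify", "translate"]
    seq_result, seq_time = timed(run_sequential, prompts)
    par_result, par_time = timed(run_parallel, prompts)
    print(seq_result == par_result)
    print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

The sketch is only safe to parallelize by hand because the three calls share no state. Once calls are reached through dynamic dispatch or touch variables that are later mutated, deciding which calls may overlap becomes much harder, which is the problem the abstract says PopPy addresses.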

Problem

Research questions and friction points this paper is trying to address.

Compound AI Applications

End-to-end Latency

Parallelism

Python

External Components

Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallelization

Compound AI Applications

Python Optimization