Scaling Tractable Probabilistic Circuits: A Systems Perspective

📅 2024-06-02
🏛️ International Conference on Machine Learning
📈 Citations: 13
Influential: 1

🤖 AI Summary
Existing probabilistic circuit (PC) implementations are inefficient in both time and GPU memory, which hinders further scaling. This paper introduces PyJuice, a general GPU implementation design for PCs. At its core is a compilation process that converts a PC into a compact representation amenable to block-based parallelization, which reduces IO and makes it possible to use the Tensor Cores in modern GPUs. PyJuice trains large-scale PCs 10–100× faster than existing systems and uses 2–5× less GPU memory, which allows larger models to be trained. It improves state-of-the-art PCs on image (e.g., ImageNet32) and language (e.g., WikiText, CommonGen) datasets, and it establishes a new set of baselines by training existing PC structures at much larger sizes and for more epochs.

📝 Abstract
Probabilistic Circuits (PCs) are a general framework for tractable deep generative models, which support exact and efficient probabilistic inference on their learned distributions. Recent modeling and training advancements have enabled their application to complex real-world tasks. However, the time and memory inefficiency of existing PC implementations hinders further scaling up. This paper proposes PyJuice, a general GPU implementation design for PCs that improves prior art in several regards. Specifically, PyJuice is 1-2 orders of magnitude faster than existing systems (including very recent ones) at training large-scale PCs. Moreover, PyJuice consumes 2-5x less GPU memory, which enables us to train larger models. At the core of our system is a compilation process that converts a PC into a compact representation amenable to efficient block-based parallelization, which significantly reduces IO and makes it possible to leverage Tensor Cores available in modern GPUs. Empirically, PyJuice can be used to improve state-of-the-art PCs trained on image (e.g., ImageNet32) and language (e.g., WikiText, CommonGen) datasets. We further establish a new set of baselines on natural image and language datasets by benchmarking existing PC structures but with much larger sizes and more training epochs, with the hope of incentivizing future research. Code is available at https://github.com/Tractables/pyjuice.
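
To make "exact and efficient probabilistic inference" concrete, here is a minimal sketch of a toy PC in plain Python. It does not use PyJuice; the circuit, the names `WEIGHTS`, `LEAF_P1`, `leaf`, and `evaluate`, and all parameter values are assumptions made up for illustration. The circuit is a single sum node over two product nodes, each multiplying one Bernoulli leaf per variable. A marginal is computed exactly with one feedforward pass by letting the leaves of the marginalized variable output 1.

```python
# Hypothetical toy circuit over two binary variables X0 and X1:
# one sum node mixing two product nodes, each a product of Bernoulli leaves.
WEIGHTS = [0.3, 0.7]                # sum-node weights, they sum to 1
LEAF_P1 = [[0.9, 0.2], [0.1, 0.6]]  # LEAF_P1[k][v] = P(X_v = 1) in component k


def leaf(k, v, value):
    """Bernoulli leaf; value=None marginalizes the variable (leaf outputs 1)."""
    if value is None:
        return 1.0
    p = LEAF_P1[k][v]
    return p if value == 1 else 1.0 - p


def evaluate(x):
    """One feedforward pass: products of leaves, then a weighted sum."""
    total = 0.0
    for k, w in enumerate(WEIGHTS):
        prod = 1.0
        for v, value in enumerate(x):
            prod *= leaf(k, v, value)
        total += w * prod
    return total


joint = evaluate((1, 0))            # P(X0 = 1, X1 = 0)
marginal_x0 = evaluate((1, None))   # P(X0 = 1), computed in a single pass
brute_force = sum(evaluate((1, b)) for b in (0, 1))  # same marginal by enumeration
```

The single-pass marginal agrees with the enumeration over `X1`, but its cost stays linear in the circuit size no matter how many variables are marginalized.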
Problem

Research questions and friction points this paper is trying to address.

Improving time efficiency of probabilistic circuit implementations
Reducing GPU memory consumption for larger PC models
Enabling scalable training on image and language datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU implementation design for probabilistic circuits
Compact representation enabling block-based parallelization
Leveraging Tensor Cores in modern GPUs
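
The block-based idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not PyJuice's implementation or API: the block size `BLOCK`, the `TILES` layout, and both function names are hypothetical. The sketch stores a sum layer's edges as dense tiles between blocks of nodes, so that each tile reduces to a small dense matrix product (the kind of operation matrix hardware such as Tensor Cores accelerates) and each input block is read once per tile rather than once per edge. It works in the linear domain for brevity, whereas practical implementations typically work in log space for numerical stability.

```python
BLOCK = 2  # hypothetical block size; hardware matrix tiles are typically larger

# Hypothetical block-sparse sum layer: (output_block, input_block) -> dense
# BLOCK x BLOCK weight tile. Block pairs that are absent have no edges at all.
TILES = {
    (0, 0): [[0.5, 0.5], [0.2, 0.8]],
    (1, 0): [[0.1, 0.2], [0.3, 0.1]],
    (1, 2): [[0.3, 0.4], [0.2, 0.4]],
}


def sum_layer_blocked(tiles, batch, n_out):
    """Process one dense tile at a time: a small matrix product per tile."""
    out = [[0.0] * n_out for _ in batch]
    for (ob, ib), tile in tiles.items():
        for s, x in enumerate(batch):
            x_block = x[ib * BLOCK:(ib + 1) * BLOCK]  # input block read once per tile
            for r in range(BLOCK):
                out[s][ob * BLOCK + r] += sum(w * v for w, v in zip(tile[r], x_block))
    return out


def sum_layer_edgewise(tiles, batch, n_out):
    """Reference: treat every edge independently, as a sparse edge list would."""
    edges = [(ob * BLOCK + r, ib * BLOCK + c, tile[r][c])
             for (ob, ib), tile in tiles.items()
             for r in range(BLOCK) for c in range(BLOCK)]
    out = [[0.0] * n_out for _ in batch]
    for s, x in enumerate(batch):
        for o, i, w in edges:
            out[s][o] += w * x[i]
    return out


# Two samples, six input-node values each (three input blocks of size BLOCK).
batch = [[0.9, 0.1, 0.4, 0.6, 0.3, 0.7], [0.2, 0.8, 0.5, 0.5, 0.6, 0.4]]
blocked = sum_layer_blocked(TILES, batch, n_out=4)
edgewise = sum_layer_edgewise(TILES, batch, n_out=4)
```

Both functions compute the same outputs. The difference is in the access pattern: the blocked version touches contiguous slices and dense tiles, which is what makes the computation map onto matrix-multiply units.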