🤖 AI Summary
Existing Snowpark Execution Environments (SEEs) lack the fine-grained, high-assurance sandbox isolation required by modern data engineering and AI/ML workloads. To address this, we propose a novel secure execution environment built upon gVisor. Our approach integrates a customized gVisor sandbox into Snowflake’s virtual warehouse nodes, combining lightweight virtualization, fine-grained resource governance, and kernel-level security hardening to enable safe, high-performance execution of runtimes for Python and other languages. Compared to the native SEE, our architecture significantly strengthens isolation guarantees while improving runtime performance, scalability, and operational maintainability. Experimental evaluation and real-world deployment cases demonstrate that the design achieves strong security assurance without compromising compatibility with existing Snowpark APIs and workloads. This provides a robust, flexible, and production-ready execution foundation for next-generation Snowpark applications involving complex, heterogeneous, and security-sensitive data and ML pipelines.
📝 Abstract
Snowpark enables Data Engineering and AI/ML workloads to run directly within Snowflake by deploying a secure sandbox on virtual warehouse nodes. This Snowpark Execution Environment (SEE) allows users to execute arbitrary workloads in Python and other languages in a secure and performant manner. As adoption has grown, the diversity of workloads has introduced increasingly sophisticated sandboxing needs. To address these evolving requirements, Snowpark transitioned its in-house sandboxing solution to gVisor, augmented with targeted optimizations. This paper describes the functional and performance objectives that guided the upgrade, outlines the new sandbox architecture, and details the challenges encountered during the transition, along with the solutions developed to resolve them. Finally, we present case studies that highlight new features enabled by the upgraded architecture, demonstrating SEE's extensibility and flexibility in supporting the next generation of Snowpark workloads.