Cortex AISQL: A Production SQL Engine for Unstructured Data

📅 2025-11-10
🤖 AI Summary
This paper addresses the challenge of efficiently supporting semantic operations, such as LLM inference and multi-label classification, within a SQL engine so that structured and unstructured data can be processed together. It presents a production-grade SQL engine that natively integrates semantic operations and introduces three techniques: (1) AI-aware query optimization, which treats AI inference cost as an explicit optimization objective during query planning; (2) adaptive model cascading, which routes most rows through a fast proxy model and escalates uncertain cases to a more powerful oracle model; and (3) semantic join query rewriting, which reformulates joins as multi-label classification to reduce their quadratic cost to linear. The system is designed using dynamic cost estimation and real-world workload characterization. Deployed in Snowflake's production environment, it reports speedups of 2–8× from query optimization, 2–6× from model cascades (while retaining 90–95% of oracle-model quality), and 15–70× from join rewriting, and it supports diverse customer workloads.

📝 Abstract
Snowflake's Cortex AISQL is a production SQL engine that integrates native semantic operations directly into SQL. This integration allows users to write declarative queries that combine relational operations with semantic reasoning, enabling them to query both structured and unstructured data effortlessly. However, making semantic operations efficient at production scale poses fundamental challenges. Semantic operations are more expensive than traditional SQL operations, possess distinct latency and throughput characteristics, and their cost and selectivity are unknown during query compilation. Furthermore, existing query engines are not designed to optimize semantic operations. The AISQL query execution engine addresses these challenges through three novel techniques informed by production deployment data from Snowflake customers. First, AI-aware query optimization treats AI inference cost as a first-class optimization objective, reasoning about large language model (LLM) cost directly during query planning to achieve 2-8$\times$ speedups. Second, adaptive model cascades reduce inference costs by routing most rows through a fast proxy model while escalating uncertain cases to a powerful oracle model, achieving 2-6$\times$ speedups while maintaining 90-95% of oracle model quality. Third, semantic join query rewriting lowers the quadratic time complexity of join operations to linear through reformulation as multi-label classification tasks, achieving 15-70$\times$ speedups with often improved prediction quality. AISQL is deployed in production at Snowflake, where it powers diverse customer workloads across analytics, search, and content understanding.
Problem

Research questions and friction points this paper is trying to address.

Efficiently integrating semantic operations into SQL for unstructured data
Accounting for expensive AI inference cost during query planning, when cost and selectivity are unknown at compile time
Reducing quadratic complexity of semantic joins to linear time
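To make the quadratic-versus-linear distinction concrete, the following minimal Python sketch counts model calls under two formulations of a semantic join. It is an illustration under stated assumptions, not the paper's implementation: the function names and the `labels_per_call` parameter are hypothetical, and the sketch assumes the right-hand rows can be passed to the model as candidate labels in fixed-size batches.

```python
def pairwise_join_calls(n_left: int, n_right: int) -> int:
    """Naive semantic join: one model call per (left row, right row) pair."""
    return n_left * n_right


def classification_join_calls(n_left: int, n_right: int, labels_per_call: int) -> int:
    """Rewritten join: each left row is classified against batches of right
    rows that serve as candidate labels (hypothetical batching scheme)."""
    batches = -(-n_right // labels_per_call)  # ceiling division
    return n_left * batches


# Example sizes chosen purely for illustration.
naive = pairwise_join_calls(1000, 50)                # 50000 calls
rewritten = classification_join_calls(1000, 50, 50)  # 1000 calls
```

When all right-hand rows fit in one call, the number of calls grows with the left table only, which is the sense in which the rewrite is linear. Each call is larger than a pairwise call, so the reduction in call count is not by itself a statement about end-to-end speedup.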
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-aware query optimization treats LLM inference cost as a first-class optimization objective
Adaptive model cascades route most rows through a fast proxy model and escalate uncertain cases to an oracle model
Semantic join rewriting reformulates quadratic-cost joins as linear-cost multi-label classification
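The cascade idea can be sketched in a few lines of Python. This is a minimal illustration, not the AISQL implementation: `cascade_filter`, the `low` and `high` thresholds, and the toy proxy and oracle are all hypothetical, and the thresholds are fixed here, whereas the paper describes the cascade as adaptive without this page specifying how thresholds are chosen.

```python
from typing import Callable, List, Sequence, Tuple


def cascade_filter(
    rows: Sequence[str],
    proxy: Callable[[str], float],
    oracle: Callable[[str], bool],
    low: float = 0.2,
    high: float = 0.8,
) -> Tuple[List[bool], int]:
    """Decide each row with the proxy when it is confident, else ask the oracle.

    Returns the per-row decisions and the number of oracle calls made.
    """
    decisions: List[bool] = []
    oracle_calls = 0
    for row in rows:
        score = proxy(row)  # proxy's estimated probability that the predicate holds
        if score >= high:
            decisions.append(True)   # confident accept, no oracle call
        elif score <= low:
            decisions.append(False)  # confident reject, no oracle call
        else:
            decisions.append(oracle(row))  # uncertain: escalate to the oracle
            oracle_calls += 1
    return decisions, oracle_calls


# Toy stand-ins for the two models (hypothetical, for illustration only).
proxy_scores = {"great product": 0.95, "terrible": 0.05, "it is fine I guess": 0.5}
decisions, oracle_calls = cascade_filter(
    list(proxy_scores),
    proxy=lambda row: proxy_scores[row],
    oracle=lambda row: "fine" in row,
)
```

In this toy run only the middle-scoring row reaches the oracle. Widening the gap between `low` and `high` sends more rows to the oracle, trading inference cost for agreement with the oracle's answers.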
Authors
Paritosh Aggarwal
Snowflake Inc.
Bowei Chen
University of Glasgow
Data Science, Machine Learning, Quantitative Marketing, Fintech, Business Analytics
Anupam Datta
Snowflake AI Research, Ex-Professor CMU
Trustworthy AI, Privacy, Security
Benjamin Han
Apple
Agentic AI, Knowledge Graphs, Reasoning, Natural Language, Machine Learning
Boxin Jiang
Snowflake Inc.
Nitish Jindal
Snowflake Inc.
Zihan Li
University of Washington
Foundation Model, AI for Healthcare, Multimodal Learning
Aaron Lin
Snowflake Inc.
Pawel Liskowski
Snowflake Inc.
Jay Tayade
Snowflake Inc.
Dimitris Tsirogiannis
Snowflake Inc.
Nathan Wiegand
Snowflake Inc.
Weicheng Zhao
Snowflake Inc.