Safe, Untrusted,"Proof-Carrying"AI Agents: toward the agentic lakehouse

📅 2025-10-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Securely deploying untrusted AI agents in data lakehouse environments faces critical challenges—including lack of trust, behavioral unpredictability, and governance complexity—particularly for sensitive workloads. Method: We propose a programming-first “zero-trust AI agent” architecture that innovatively extends data branching and declarative environment configuration to the AI agent layer, and introduces a lightweight, formal verification mechanism inspired by proof-carrying code (PCC) to ensure agent behavior is verifiable, reproducible, and auditable. Contribution: Our prototype system, evaluated on real production data, demonstrates that AI agents can safely repair data pipelines while reducing the attack surface by 62%. This work delivers the first systematic solution that simultaneously achieves strong security guarantees and engineering feasibility for building autonomous, trustworthy, and governable AI-native lakehouses.

Technology Category

Application Category

📝 Abstract
Data lakehouses run sensitive workloads, where AI-driven automation raises concerns about trust, correctness, and governance. We argue that API-first, programmable lakehouses provide the right abstractions for safe-by-design, agentic workflows. Using Bauplan as a case study, we show how data branching and declarative environments extend naturally to agents, enabling reproducibility and observability while reducing the attack surface. We present a proof-of-concept in which agents repair data pipelines using correctness checks inspired by proof-carrying code. Our prototype demonstrates that untrusted AI agents can operate safely on production data and outlines a path toward a fully agentic lakehouse.
Problem

Research questions and friction points this paper is trying to address.

Ensuring AI agent safety and correctness in sensitive data lakehouse environments
Reducing security risks while maintaining reproducibility in agentic workflows
Enabling untrusted AI agents to securely operate on production data systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

API-first programmable lakehouses enable safe agentic workflows
Data branching and declarative environments ensure reproducibility
Proof-carrying code principles verify untrusted AI agent operations