🤖 AI Summary
This work addresses inconsistent query results in agent-driven lakehouse architectures, where multiple data versions coexist across branches. The paper applies supervaluationary semantics to multi-branch querying in lakehouse systems and proposes a cross-branch query processing method that evaluates a query across branches rather than against any single snapshot. The authors design a query routing and branch management mechanism grounded in these semantics and implement an open-source prototype. Because mainstream OLAP systems lack comparable multi-branch querying capabilities, the released demo code is offered as a concrete baseline for the OLAP community.
📝 Abstract
Agentic analytics is turning the lakehouse into a multi-version system: swarms of (human or AI) producers materialize competing pipelines in data branches, while (human or AI) consumers need answers without knowing the underlying data lifecycle. We demonstrate a new system that answers questions across branches rather than at a single snapshot. Our prototype focuses on a novel query path that evaluates queries under supervaluationary semantics. In the absence of comparable multi-branch querying capabilities in mainstream OLAP systems, we open-source the demo code as a concrete baseline for the OLAP community.
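To make the idea of answering "across branches rather than at a single snapshot" concrete, the following is a minimal sketch of the textbook notion of supervaluation, not the authors' implementation. It assumes each branch is an in-memory list of rows and that a query is a plain Python function. An answer counts as determinate only when every branch agrees on it, and is otherwise reported as indeterminate. The names `supervaluate`, `branches`, and the sample `amount` column are hypothetical and chosen for illustration only.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Branch = List[Row]  # assumption: a branch is modelled as a list of row dicts


def supervaluate(
    query: Callable[[Branch], Any],
    branches: Dict[str, Branch],
) -> Tuple[str, Optional[Any], Dict[str, Any]]:
    """Evaluate `query` on every branch and combine the answers.

    Returns (status, value, per_branch):
      - ("determinate", v, ...) if all branches yield the same answer v
      - ("indeterminate", None, ...) if at least two branches disagree
    """
    if not branches:
        raise ValueError("at least one branch is required")

    # Run the same query independently against each branch.
    per_branch = {name: query(rows) for name, rows in branches.items()}

    answers = list(per_branch.values())
    if all(answer == answers[0] for answer in answers):
        return "determinate", answers[0], per_branch
    return "indeterminate", None, per_branch


# Hypothetical data: three branches holding competing versions of one table.
branches = {
    "main": [{"amount": 60}, {"amount": 70}],
    "agent_a": [{"amount": 60}, {"amount": 70}, {"amount": 5}],
    "agent_b": [{"amount": 60}, {"amount": 30}],
}


def revenue_over(threshold: int) -> Callable[[Branch], bool]:
    """Build a boolean query: does total revenue exceed `threshold`?"""
    return lambda rows: sum(r["amount"] for r in rows) > threshold


# Totals are 130, 135 and 90, so every branch exceeds 50.
print(supervaluate(revenue_over(50), branches)[:2])   # ('determinate', True)

# Only two of the three branches exceed 100, so the answer is not settled.
print(supervaluate(revenue_over(100), branches)[:2])  # ('indeterminate', None)
```

Returning the per-branch answers alongside the combined status lets a consumer see which branches disagree without having to know how each branch was produced. A real system would push the query down to the storage engine for each branch rather than iterate over in-memory rows.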