🤖 AI Summary
In graph query optimization, traditional cardinality estimation methods suffer severe inaccuracy on large queries due to many-to-many joins and complex correlations among graph patterns. This paper proposes COLOR, the first framework that leverages graph coloring and graph compression theory for subgraph cardinality estimation, constructing compact summaries that capture global topological structure. COLOR introduces scalable summary representations, heuristic pruning strategies, and approximate counting algorithms—enabling low-latency estimation, minimal memory footprint, fast summary construction, and graceful degradation under dynamic data updates. Experimental evaluation demonstrates that COLOR achieves up to 1,000× higher estimation accuracy than state-of-the-art methods, significantly enhancing the robustness and scalability of graph query optimizers.
📝 Abstract
Graph workloads pose a particularly challenging problem for query optimizers. They typically feature large queries made up of entirely many-to-many joins with complex correlations. This puts significant stress on traditional cardinality estimation methods which generally see catastrophic errors when estimating the size of queries with only a handful of joins. To overcome this, we propose COLOR, a framework for subgraph cardinality estimation which applies insights from graph compression theory to produce a compact summary that captures the global topology of the data graph. Further, we identify several key optimizations that enable tractable estimation over this summary even for large query graphs. We then evaluate several designs within this framework and find that they improve accuracy by up to 10³× over all competing methods while maintaining fast inference, a small memory footprint, efficient construction, and graceful degradation under updates.
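
To make the idea of a color-based summary concrete, the following is a minimal illustrative sketch, not the estimator described in the paper. It assumes a directed data graph given as an edge list, a vertex coloring supplied by the caller, and a simple path-shaped query counted under homomorphism semantics (directed walks). The function names `build_summary`, `estimate_path`, and `count_paths_exact` are hypothetical and introduced only for this example.

```python
from collections import defaultdict
from itertools import product


def build_summary(edges, color):
    """Collapse a directed graph into per-color statistics (hypothetical summary)."""
    size = defaultdict(int)
    for _, c in color.items():
        size[c] += 1
    edge_count = defaultdict(int)
    for u, v in edges:
        edge_count[(color[u], color[v])] += 1
    # avg_deg[(a, b)]: mean number of edges from one a-colored vertex
    # to b-colored vertices.
    avg_deg = {(a, b): n / size[a] for (a, b), n in edge_count.items()}
    return dict(size), avg_deg


def estimate_path(size, avg_deg, length):
    """Estimate the number of directed walks with `length` edges from the summary."""
    total = 0.0
    # Enumerate every assignment of colors to the query's vertices.
    for seq in product(list(size), repeat=length + 1):
        est = float(size[seq[0]])
        for a, b in zip(seq, seq[1:]):
            est *= avg_deg.get((a, b), 0.0)
        total += est
    return total


def count_paths_exact(edges, length):
    """Brute-force count of directed walks with `length` edges, for comparison."""
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    vertices = {x for e in edges for x in e}

    def walk(v, k):
        return 1 if k == 0 else sum(walk(w, k - 1) for w in out[v])

    return sum(walk(v, length) for v in vertices)


# Toy data graph; the coloring below is chosen by hand for illustration.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
fine = {0: "A", 1: "B", 2: "B", 3: "C", 4: "D"}
coarse = {v: "X" for v in range(5)}  # one color: degree information is averaged away

fine_est = estimate_path(*build_summary(edges, fine), 2)
coarse_est = estimate_path(*build_summary(edges, coarse), 2)
exact = count_paths_exact(edges, 2)
print(fine_est, coarse_est, exact)
```

In this toy graph every vertex of a given color has the same number of edges into each other color under the finer coloring, so its estimate for two-edge paths matches the exact count, while the single-color summary overestimates. Enumerating all color assignments grows exponentially with query size, which is why estimation over a summary needs further optimization for large query graphs.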