🤖 AI Summary
Standard $k^2$-trees compress sparse binary matrices using a level-by-level (breadth-first) layout, which prevents further compression of identical subtrees and is not cache friendly when individual subtrees are accessed. To address these issues, this paper introduces novel depth-first representations of the $k^2$-tree, termed DF-$k^2$-tree here. The method combines depth-first encoding, hash-based detection of identical subtrees, compact bitmap serialization, and cache-aware access patterns, and identifies duplicate subtrees in linear time. On web-graph adjacency matrices, exploiting identical subtrees improves the compression ratio over the standard $k^2$-tree, and for some matrices the depth-first representations are faster at matrix-matrix multiplication. The core innovation is the shift from a breadth-first to a depth-first layout, which targets space efficiency and cache locality without compromising query support.
📝 Abstract
The $k^2$-tree is a compact data structure designed to efficiently store sparse binary matrices by leveraging both sparsity and clustering of nonzero elements. This representation supports efficient navigational operations and complex binary operations, such as matrix-matrix multiplication, while maintaining space efficiency. The standard $k^2$-tree follows a level-by-level representation, which, while effective, prevents further compression of identical subtrees and is not cache friendly when accessing individual subtrees. In this work, we introduce some novel depth-first representations of the $k^2$-tree and propose an efficient linear-time algorithm to identify and compress identical subtrees within these structures. Our experimental results show that the use of depth-first representations is a strategy worth pursuing: for the adjacency matrices of web graphs, exploiting the presence of identical subtrees does improve the compression ratio, and for some matrices depth-first representations turn out to be faster than the standard $k^2$-tree in computing the matrix-matrix multiplication.
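To make the idea of a depth-first layout concrete, the following is a minimal sketch for $k = 2$, not the representation or algorithm proposed in the paper. It assumes a square matrix whose side is a power of two, emits each node's four child bits followed immediately by the encodings of its nonempty children, and counts repeated subtrees with a plain dictionary. The function name `encode_df` and the `seen` table are hypothetical, and the dictionary-based counting is a simple stand-in that is not linear-time.

```python
from typing import Dict, List, Optional, Tuple

Bits = Tuple[int, ...]


def encode_df(m: List[List[int]], r: int, c: int, size: int,
              seen: Optional[Dict[Bits, int]] = None) -> Bits:
    """Depth-first bits of the k=2 subtree for the size x size block at (r, c).

    Returns an empty tuple when the block contains only zeros.
    If `seen` is given, it counts how often each subtree encoding occurs.
    """
    if size == 1:
        # Single cell: nonempty marker for a 1, nothing for a 0.
        return (1,) if m[r][c] else ()
    half = size // 2
    # Children in row-major quadrant order: top-left, top-right,
    # bottom-left, bottom-right.
    children = [encode_df(m, r + dr, c + dc, half, seen)
                for dr in (0, half) for dc in (0, half)]
    if not any(children):
        return ()  # all-zero block: represented by a 0 bit in the parent
    # Four bits saying which children are nonempty.
    enc: Bits = tuple(1 if ch else 0 for ch in children)
    if half > 1:
        # Depth-first: each nonempty child's encoding follows directly.
        enc += tuple(bit for ch in children for bit in ch)
    if seen is not None:
        seen[enc] = seen.get(enc, 0) + 1
    return enc


# 4x4 identity matrix: the two diagonal 2x2 blocks are identical subtrees.
m = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
seen: Dict[Bits, int] = {}
bits = encode_df(m, 0, 0, 4, seen)
repeated = {enc: n for enc, n in seen.items() if n > 1}
```

In this example the subtree `(1, 0, 0, 1)` occurs twice, so a compressed representation could store it once and refer to it from the second position. Because each subtree occupies a contiguous run of bits in the depth-first layout, such repeats are contiguous slices; in a level-by-level layout the bits of one subtree are scattered across levels.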