🤖 AI Summary
To address the scarcity of large-scale real-world sparse tensor datasets and the high computational cost of feature extraction in sparse tensor research, this paper proposes GenTensor—the first synthetic framework that jointly optimizes structural fidelity and generation efficiency. GenTensor leverages statistical modeling and graph-structured sampling to generate sparse tensors that faithfully reproduce critical structural properties of real data, including nonzero patterns and rank distributions. We further design FeaTensor, a lightweight yet comprehensive feature extraction pipeline integrating cache-aware traversal and bitmap compression, substantially reducing computational overhead. The complete toolchain is open-sourced. Experimental results demonstrate that generated tensors closely match real data across core structural characteristics; FeaTensor accelerates feature extraction by 3.2–8.7×; and the framework enables storage format and decomposition algorithm selection with sub-1.5% error relative to ground truth.
📝 Abstract
Sparse tensor operations are increasingly important in diverse applications such as social networks, deep learning, diagnosis, crime, and review analysis. However, a major obstacle in sparse tensor research is the lack of large-scale sparse tensor datasets. Another challenge lies in analyzing sparse tensor features, which are essential not only for understanding the nonzero pattern but also for selecting the most suitable storage format, decomposition algorithm, and reordering methods. However, due to the large size of real-world tensors, even extracting these features can be computationally expensive without careful optimization. To address these limitations, we have developed a smart sparse tensor generator that replicates key characteristics of real sparse tensors. Additionally, we propose efficient methods for extracting a comprehensive set of sparse tensor features. The effectiveness of our generator is validated through the quality of extracted features and the performance of decomposition on the generated tensors. Both the sparse tensor feature extractor and the tensor generator are open source with all the artifacts available at https://github.com/sparcityeu/FeaTensor and https://github.com/sparcityeu/GenTensor, respectively.