🤖 AI Summary
The rapid proliferation of large-scale AI models has placed significant storage and distribution burdens on model repositories. This work proposes TensorHub, a tensor-centric deduplication and compression method that extracts tensor-level fingerprints and clusters them to identify cross-model redundancy without labeled data. By enabling fine-grained deduplication, the approach substantially reduces storage overhead while preserving model performance and usability. Experiments on real-world model repositories show substantial storage savings with minimal overhead.
📝 Abstract
Modern AI models are growing rapidly in both size and redundancy, creating significant storage and distribution challenges for model hubs. We present TensorHub, a tensor-centric system that reduces storage overhead through fine-grained deduplication and compression. TensorHub uses tensor-level fingerprinting and clustering to identify redundancy across models without requiring annotations. This design reduces storage efficiently while preserving model usability and performance. Experiments on real-world model repositories demonstrate substantial storage savings with minimal overhead.
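To make tensor-level fingerprinting more concrete, here is a minimal Python sketch of exact-match, content-addressed deduplication. It is not TensorHub's implementation. The `Tensor`, `fingerprint`, `TensorStore`, and `f32` names are hypothetical. The fingerprint is assumed to be a SHA-256 hash over dtype, shape, and raw bytes. The clustering step described in the abstract is omitted, so each "cluster" here is simply the set of byte-identical tensors that share one fingerprint.

```python
import hashlib
import struct
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tensor:
    """Hypothetical minimal tensor record: dtype name, shape, raw buffer."""
    dtype: str
    shape: tuple
    data: bytes


def fingerprint(t: Tensor) -> str:
    """SHA-256 over dtype, shape, and raw bytes (an assumed scheme, not TensorHub's)."""
    h = hashlib.sha256()
    h.update(f"{t.dtype}|{t.shape}|".encode("utf-8"))
    h.update(t.data)
    return h.hexdigest()


@dataclass
class TensorStore:
    """Hypothetical content-addressed store: each unique tensor is kept once."""
    blobs: dict = field(default_factory=dict)      # fingerprint -> Tensor
    manifests: dict = field(default_factory=dict)  # model -> {tensor name -> fingerprint}

    def add_model(self, model: str, tensors: dict) -> None:
        manifest = {}
        for name, t in tensors.items():
            fp = fingerprint(t)
            self.blobs.setdefault(fp, t)  # store the blob only if it is new
            manifest[name] = fp
        self.manifests[model] = manifest

    def load_model(self, model: str) -> dict:
        # Reassemble a model from its manifest of fingerprints.
        return {name: self.blobs[fp] for name, fp in self.manifests[model].items()}

    def logical_bytes(self) -> int:
        # Bytes the models would occupy if every tensor were stored separately.
        return sum(len(self.blobs[fp].data)
                   for m in self.manifests.values() for fp in m.values())

    def physical_bytes(self) -> int:
        # Bytes actually stored after deduplication.
        return sum(len(t.data) for t in self.blobs.values())


def f32(*values: float) -> bytes:
    """Pack floats as little-endian float32."""
    return struct.pack(f"<{len(values)}f", *values)


# Demo: a fine-tuned model that changes only its "head" tensor.
base = {
    "embed": Tensor("float32", (2,), f32(1.0, 2.0)),
    "head": Tensor("float32", (2,), f32(3.0, 4.0)),
}
finetuned = {
    "embed": Tensor("float32", (2,), f32(1.0, 2.0)),
    "head": Tensor("float32", (2,), f32(3.5, 4.0)),
}

store = TensorStore()
store.add_model("base", base)
store.add_model("finetuned", finetuned)
print(store.logical_bytes(), store.physical_bytes())  # 32 24
```

In the demo, the fine-tuned model shares its `embed` tensor with the base model, so four logical tensors (32 bytes) occupy only three stored blobs (24 bytes). Exact hashing catches only byte-identical tensors. Detecting near-duplicates, such as slightly fine-tuned weights, would presumably require a similarity-based fingerprint, which is the role a clustering step would play.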