🤖 AI Summary
Machine learning (ML) assets, including models, datasets, and metadata, suffer from fragmented documentation, siloed storage, inconsistent licensing, and a lack of unified discovery mechanisms, all of which hinder reuse and efficient management. This tutorial surveys ML-asset management across the asset lifecycle: curation, discovery, and utilization. It categorizes ML assets and the major management issues, reviews state-of-the-art techniques, and identifies emerging opportunities at each stage, with particular attention to three system-level challenges: scalability, lineage, and unified indexing. Live system demonstrations give researchers and practitioners actionable insights and practical tools for managing ML assets in real-world and domain-specific settings.
📝 Abstract
Machine learning (ML) assets, such as models, datasets, and metadata, are central to modern ML workflows. Despite their explosive growth in practice, these assets are often underutilized because of fragmented documentation, siloed storage, inconsistent licensing, and a lack of unified discovery mechanisms, making ML-asset management an urgent challenge. This tutorial offers a comprehensive overview of ML-asset management activities across the asset lifecycle, including curation, discovery, and utilization. We categorize ML assets and the major management issues, survey state-of-the-art techniques, and identify emerging opportunities at each stage. We further highlight system-level challenges related to scalability, lineage, and unified indexing. Through live system demonstrations, this tutorial equips both researchers and practitioners with actionable insights and practical tools for advancing ML-asset management in real-world and domain-specific settings.