Online hierarchical partitioning of the output space in extreme multi-label data stream

📅 2025-07-28
🤖 AI Summary
Problem: Extreme multi-label learning over data streams must cope with high-dimensional, sparse, and dynamically evolving label spaces, and concept drift additionally alters label correlations and label imbalance. Method: We propose iHOMER, a framework that incrementally builds an online hierarchical clustering of the label space without requiring a predefined hierarchy; integrates concept drift detection at both global and local granularity to adaptively reconfigure label partitions and model structure; and employs Jaccard-similarity-driven online split-and-merge clustering, coupled with a tree-based learner driven by a multivariate Bernoulli process, for label grouping and instance assignment. Contribution/Results: Experiments on 23 real-world datasets show that iHOMER achieves an average 23% improvement over five global baselines and outperforms twelve local baselines by 32%.
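To make the "multivariate Bernoulli" idea concrete, the sketch below scores a candidate split of a tree node by the drop in label entropy, treating each label as an independent Bernoulli variable estimated from counts. The independence assumption, the function names `bernoulli_entropy` and `split_gain`, and the gain formula are illustrative assumptions; the summary does not spell out the exact criterion iHOMER uses.

```python
import math


def bernoulli_entropy(label_counts, n):
    """Entropy in bits of independent Bernoulli labels estimated from counts.

    label_counts[j] is how many of the n instances carry label j.
    Assumption: labels are treated as independent, so entropies add up.
    """
    total = 0.0
    for count in label_counts:
        p = count / n
        for q in (p, 1.0 - p):
            if q > 0.0:  # 0 * log(0) is taken as 0
                total -= q * math.log2(q)
    return total


def split_gain(parent, children):
    """Entropy reduction of a split; each argument is (label_counts, n)."""
    parent_counts, parent_n = parent
    weighted = sum(
        (n / parent_n) * bernoulli_entropy(counts, n) for counts, n in children
    )
    return bernoulli_entropy(parent_counts, parent_n) - weighted


# Hypothetical node: 8 instances, two labels, each present in 4 of them.
parent = ([4, 4], 8)
# A split that sends all label-0 instances left and all label-1 instances right.
children = [([4, 0], 4), ([0, 4], 4)]
gain = split_gain(parent, children)
```

Here the parent holds two maximally uncertain labels and both children are pure, so the split removes all of the label uncertainty.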

📝 Abstract
Mining data streams with multi-label outputs poses significant challenges due to evolving distributions, high-dimensional label spaces, sparse label occurrences, and complex label dependencies. Moreover, concept drift affects not only input distributions but also label correlations and imbalance ratios over time, complicating model adaptation. To address these challenges, structured learners are categorized into local and global methods. Local methods break down the task into simpler components, while global methods adapt the algorithm to the full output space, potentially yielding better predictions by exploiting label correlations. This work introduces iHOMER (Incremental Hierarchy Of Multi-label Classifiers), an online multi-label learning framework that incrementally partitions the label space into disjoint, correlated clusters without relying on predefined hierarchies. iHOMER leverages online divisive-agglomerative clustering based on Jaccard similarity and a global tree-based learner driven by a multivariate Bernoulli process to guide instance partitioning. To address non-stationarity, it integrates drift detection mechanisms at both global and local levels, enabling dynamic restructuring of label partitions and subtrees. Experiments across 23 real-world datasets show iHOMER outperforms 5 state-of-the-art global baselines, such as MLHAT, MLHT of Pruned Sets and iSOUPT, by 23%, and 12 local baselines, such as binary relevance transformations of kNN, EFDT, ARF, and ADWIN bagging/boosting ensembles, by 32%, establishing its robustness for online multi-label classification.
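As a rough illustration of the Jaccard similarity that drives the label clustering, the sketch below keeps running occurrence and co-occurrence counts so that the similarity of any two labels can be read off after each instance. The class `OnlineJaccard` and its methods are hypothetical names, and how iHOMER actually stores these statistics or turns them into split and merge decisions is not described in the abstract.

```python
from collections import defaultdict
from itertools import combinations


class OnlineJaccard:
    """Running Jaccard similarity between label pairs (illustrative helper)."""

    def __init__(self):
        self.count = defaultdict(int)  # label -> instances carrying it
        self.both = defaultdict(int)   # (a, b) with a < b -> co-occurrences

    def update(self, labels):
        """Consume the set of active labels of one incoming instance."""
        active = sorted(set(labels))
        for label in active:
            self.count[label] += 1
        for a, b in combinations(active, 2):
            self.both[(a, b)] += 1

    def similarity(self, a, b):
        """|A and B| / |A or B| over the instances seen so far."""
        if a == b:
            return 1.0 if self.count.get(a, 0) else 0.0
        key = (a, b) if a < b else (b, a)
        inter = self.both.get(key, 0)
        union = self.count.get(a, 0) + self.count.get(b, 0) - inter
        return inter / union if union else 0.0


# Toy stream of label sets (made-up data for the example).
stats = OnlineJaccard()
for labels in [{"cat", "pet"}, {"cat", "pet"}, {"cat"}, {"car"}]:
    stats.update(labels)

cat_pet = stats.similarity("cat", "pet")  # 2 shared out of 3 instances with either
cat_car = stats.similarity("cat", "car")  # never seen together
```

Labels with high pairwise similarity are natural candidates for the same cluster, while labels that never co-occur are candidates for separation. Storing all pairs costs memory quadratic in the number of co-occurring labels, so a real extreme multi-label system would likely need a sparser summary.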
Problem

Research questions and friction points this paper is trying to address.

Handles evolving distributions in multi-label data streams
Addresses high-dimensional sparse label dependencies dynamically
Detects concept drift at global and local levels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online divisive-agglomerative clustering for label space
Global tree-based learner with multivariate Bernoulli process
Drift detection at global and local levels
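The two-level drift monitoring listed above can be sketched with one detector on the overall error and one per label cluster. The page does not name the detector iHOMER uses, so a plain Page-Hinkley test stands in here; the parameter values, the cluster names, and the simulated error streams are all assumptions made for the example.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of an error stream."""

    def __init__(self, delta=0.005, threshold=5.0, min_samples=30):
        self.delta = delta              # tolerated drift magnitude
        self.threshold = threshold      # alarm level for the cumulative statistic
        self.min_samples = min_samples  # warm-up period before alarms are allowed
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0
        self.cum_min = 0.0

    def update(self, error):
        """Feed one error value; return True when drift is signalled."""
        self.n += 1
        self.mean += (error - self.mean) / self.n
        self.cum += error - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        drift = (
            self.n >= self.min_samples
            and self.cum - self.cum_min > self.threshold
        )
        if drift:
            self.reset()  # start fresh on the new regime
        return drift


# One global detector plus one per (hypothetical) label cluster.
global_detector = PageHinkley()
local_detectors = {"cluster_a": PageHinkley(), "cluster_b": PageHinkley()}
alarms = {"global": [], "cluster_a": [], "cluster_b": []}

for step in range(400):
    # Simulated errors: only cluster_b degrades, starting at step 200.
    errors = {"cluster_a": 0.0, "cluster_b": 1.0 if step >= 200 else 0.0}
    for name, detector in local_detectors.items():
        if detector.update(errors[name]):
            alarms[name].append(step)  # local alarm: only this cluster is affected
    if global_detector.update(sum(errors.values()) / len(errors)):
        alarms["global"].append(step)  # global alarm: overall error has shifted
```

In this toy run the stable cluster never raises an alarm, while the degraded cluster and the global monitor both fire shortly after the change. How an alarm at each level maps to restructuring label partitions or subtrees is left to the paper.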
🔎 Similar Papers
Lara Neves
GECAD, ISEP, Polytechnic of Porto, Rua Dr. António Bernardino de Almeida, Porto, 4249-015, Portugal
Afonso Lourenço
GECAD, ISEP, Polytechnic of Porto, Rua Dr. António Bernardino de Almeida, Porto, 4249-015, Portugal
Alberto Cano
Associate Vice President for Research Computing, Virginia Tech, USA
Machine Learning, Data Stream Mining, Concept Drift, Multi-label learning, GPU
Goreti Marreiros
Full Professor, GECAD/ISEP/Polytechnic of Porto