Feature Incremental Clustering with Generalization Bounds

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the dynamic clustering problem where the feature space expands incrementally over time, as commonly encountered in activity recognition. It establishes the first theoretical framework for feature-incremental clustering and proposes four k-means-based strategies—Feature Tailoring (FT), Data Reconstruction (DR), Data Adaptation (DA), and Model Reuse (MR)—tailored to different data access scenarios. By integrating distribution discrepancy measures and model transfer techniques, the study derives generalization error bounds for each strategy, elucidating the impact of key factors such as training sample size and hypothesis space complexity on clustering performance. Extensive experiments on activity recognition tasks validate the effectiveness of the proposed methods and demonstrate a strong alignment between the theoretical error bounds and empirical results.

📝 Abstract
In many learning systems, such as activity recognition systems, as new data collection methods continue to emerge in various dynamic environmental applications, the attributes of instances accumulate incrementally, with data being stored in gradually expanding feature spaces. How to design theoretically guaranteed algorithms to effectively cluster this special type of data stream, as commonly encountered in activity recognition, remains unexplored. Compared to traditional scenarios, we face at least two fundamental questions in this feature incremental scenario. (i) How can we design preliminary yet effective algorithms to address the feature incremental clustering problem? (ii) How can we analyze the generalization bounds for the proposed algorithms, and under what conditions do these algorithms provide a strong generalization guarantee? To address these problems, taking the most common clustering algorithm, i.e., $k$-means, as an example, we propose four types of Feature Incremental Clustering (FIC) algorithms corresponding to different situations of data access: Feature Tailoring (FT), Data Reconstruction (DR), Data Adaptation (DA), and Model Reuse (MR), abbreviated as FIC-FT, FIC-DR, FIC-DA, and FIC-MR. Subsequently, we offer a detailed analysis of the generalization error bounds for these four algorithms and highlight the critical factors influencing these bounds, such as the amount of training data, the complexity of the hypothesis space, the quality of pre-trained models, and the discrepancy of the reconstructed feature distribution. The numerical experiments show the effectiveness of the proposed algorithms, particularly in their application to activity recognition clustering tasks.
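To make the simplest of the four strategies concrete, here is a minimal sketch of the Feature Tailoring (FIC-FT) idea under plain Lloyd's k-means: a model trained on the original $d_{\text{old}}$ features is reused on incoming samples by truncating their newly added features. This is an illustrative reading of the abstract, not the paper's actual implementation; the function names (`kmeans`, `fic_ft_assign`) are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # recompute centroids (keep the old one if a cluster empties)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # final assignment under the converged centroids
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dist.argmin(axis=1)

def fic_ft_assign(X_new, centroids_old):
    """Feature Tailoring: drop the newly arrived features so samples
    match the original d_old-dimensional space, then reuse the
    pre-trained centroids for assignment."""
    d_old = centroids_old.shape[1]
    X_t = X_new[:, :d_old]  # tailor away the incremental features
    dist = np.linalg.norm(X_t[:, None, :] - centroids_old[None, :, :], axis=2)
    return dist.argmin(axis=1)
```

For example, after training on 2-D data, samples that later arrive with two extra features can still be clustered by the old model via `fic_ft_assign`; the trade-off, which the paper's bounds quantify, is that the discarded features carry information the tailored model can never use.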
Problem

Research questions and friction points this paper is trying to address.

feature incremental clustering
activity recognition
data stream
generalization bounds
incremental learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature Incremental Clustering
Generalization Bounds
k-means
Activity Recognition
Incremental Learning
Jing Zhang
Key Laboratory of Computing and Stochastic Mathematics (Ministry of Education), School of Mathematics and Statistics, Hunan Normal University, Changsha, Hunan 410081, P.R. China
Chenping Hou
National University of Defense Technology
Statistical data analysis · Data Mining · Machine learning