Delayed Assignments in Online Non-Centroid Clustering with Stochastic Arrivals

📅 2026-01-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses online non-centroid clustering with delayed assignments, where points arrive sequentially and must be assigned to clusters, with the option to defer each decision at a cost. Traditional online approaches commit immediately upon arrival, which can sacrifice clustering quality; this study instead introduces a model that trades off intra-cluster distance cost against delay cost. In the worst-case arrival model, no algorithm achieves a competitive ratio better than sublogarithmic in the number of points. To get around this lower bound, the authors consider a stochastic arrival model in which point locations are drawn independently from an unknown, fixed distribution over a finite metric space, and they present an online algorithm that is constant competitive in this setting: as the number of points grows, the ratio between the algorithm's expected total cost and that of an optimal offline clustering is bounded by a constant.
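One way to write the constant-competitiveness guarantee described above is sketched below. The notation is an assumption introduced here for illustration and is not taken from the paper: $\mathcal{D}$ is the unknown distribution over the finite metric space, $\sigma_n$ is a sequence of $n$ points drawn independently from $\mathcal{D}$, $\mathrm{ALG}(\sigma_n)$ and $\mathrm{OPT}(\sigma_n)$ are the total costs (distance plus delay) of the online algorithm and of an optimal offline clustering, and $C$ is a constant independent of $n$.

```latex
\limsup_{n \to \infty}
\frac{\mathbb{E}_{\sigma_n \sim \mathcal{D}^n}\!\left[\mathrm{ALG}(\sigma_n)\right]}
     {\mathbb{E}_{\sigma_n \sim \mathcal{D}^n}\!\left[\mathrm{OPT}(\sigma_n)\right]}
\;\le\; C
```

This is a ratio of expectations taken in the limit of many arrivals, which matches the wording "as the number of points grows"; the paper's formal definition may differ in its details.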

📝 Abstract

Clustering is a fundamental problem, aiming to partition a set of elements, like agents or data points, into clusters such that elements in the same cluster are closer to each other than to those in other clusters. In this paper, we present a new framework for studying online non-centroid clustering with delays, where elements that arrive one at a time as points in a finite metric space should be assigned to clusters, but assignments need not be immediate. Specifically, upon arrival, each point's location is revealed, and an online algorithm has to irrevocably assign it to an existing cluster or create a new one containing, at this moment, only this point. However, we allow decisions to be postponed at a delay cost, instead of following the more common assumption of immediate decisions upon arrival. This poses a critical challenge: the goal is to minimize both the total distance costs between points in each cluster and the overall delay costs incurred by postponing assignments. In the classic worst-case arrival model, where points arrive in an arbitrary order, no algorithm has a competitive ratio better than sublogarithmic in the number of points. To overcome this strong impossibility, we focus on a stochastic arrival model, where points' locations are drawn independently across time from an unknown and fixed probability distribution over the finite metric space. We offer hope for beyond worst-case adversaries: we devise an algorithm that is constant competitive in the sense that, as the number of points grows, the ratio between the expected overall costs of the output clustering and an optimal offline clustering is bounded by a constant.
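The objective described in the abstract, distance cost within clusters plus delay cost from postponed assignments, can be illustrated with a minimal sketch. Several details here are assumptions made for illustration and are not confirmed by the abstract: the distance cost of a cluster is taken to be the sum of pairwise distances between its members, the delay cost of a point is taken to be linear in its waiting time (assignment time minus arrival time), and the function name `total_cost` and its parameters are hypothetical.

```python
from itertools import combinations


def total_cost(locations, arrival, assigned, cluster_of, dist):
    """Total cost of a clustering with delayed assignments (illustrative only).

    Assumptions (not from the paper): pairwise-sum distance cost per cluster,
    and a linear delay cost equal to assignment time minus arrival time.
    """
    n = len(locations)

    # Delay cost: how long each point waited before being assigned.
    delay_cost = sum(assigned[i] - arrival[i] for i in range(n))

    # Group point indices by their cluster label.
    clusters = {}
    for i in range(n):
        clusters.setdefault(cluster_of[i], []).append(i)

    # Distance cost: sum of pairwise distances inside each cluster.
    distance_cost = sum(
        dist(locations[i], locations[j])
        for members in clusters.values()
        for i, j in combinations(members, 2)
    )
    return distance_cost + delay_cost


# Three points on a line, arriving at times 0, 1, 2.
locations = [0.0, 1.0, 5.0]
arrival = [0, 1, 2]
line_metric = lambda a, b: abs(a - b)

# Deferring the first point by one step lets it join its near neighbour.
delayed = total_cost(locations, arrival, [1, 1, 2], [0, 0, 1], line_metric)

# Committing immediately and placing everything in one cluster.
immediate = total_cost(locations, arrival, [0, 1, 2], [0, 0, 0], line_metric)
```

In this toy instance the deferred clustering pays one unit of delay and one unit of distance, while the immediate single-cluster assignment pays no delay but a much larger distance cost, which is the trade-off the model is meant to capture.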

Problem

Research questions and friction points this paper is trying to address.

online clustering

delayed assignments

stochastic arrivals

non-centroid clustering

competitive ratio

Innovation

Methods, ideas, or system contributions that make the work stand out.

delayed assignment

online non-centroid clustering

stochastic arrivals