Dynamic Algorithm for Explainable k-medians Clustering under lp Norm

📅 2025-11-30
🤖 AI Summary
This paper studies explainable k-medians clustering under arbitrary ℓₚ norms (finite p ≥ 1). It proposes the first algorithm for this general setting: a threshold decision tree built from univariate threshold splits partitions the data into k clusters, so the path that assigns each point to its cluster can be read off directly. The method achieves an O(p(log k)^{1+1/p−1/p²}) approximation ratio, the first such guarantee for general ℓₚ norms, and improves on the previous O(log^{3/2} k) bound for p = 2. It also supports dynamic point insertions and deletions with O(d log³ k) amortized update time and O(log k) recourse. The core contribution is combining the interpretability enforced by the tree structure with ℓₚ-norm k-medians optimization, which gives provable guarantees for large-scale, evolving datasets.
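The idea of tracing a point through univariate threshold splits can be illustrated with a minimal sketch. This is not the paper's algorithm: it only shows how an already-built threshold tree assigns a point to a cluster and records the decisions along the way. The names `Node` and `assign`, and the example tree, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    """A node of a threshold tree (hypothetical structure for illustration)."""
    feature: Optional[int] = None      # index of the single feature tested
    threshold: Optional[float] = None  # split value for that feature
    left: Optional["Node"] = None      # subtree for x[feature] <= threshold
    right: Optional["Node"] = None     # subtree for x[feature] > threshold
    cluster: Optional[int] = None      # set only on leaves


def assign(node: Node, x: Sequence[float]) -> Tuple[int, List[Tuple[int, float, str]]]:
    """Return the cluster of x and the list of threshold decisions taken."""
    path = []
    while node.cluster is None:
        go_left = x[node.feature] <= node.threshold
        path.append((node.feature, node.threshold, "<=" if go_left else ">"))
        node = node.left if go_left else node.right
    return node.cluster, path


# A small hand-built tree with k = 3 leaves (assumed, not learned from data).
tree = Node(
    feature=0, threshold=2.0,
    left=Node(cluster=0),
    right=Node(feature=1, threshold=5.0,
               left=Node(cluster=1),
               right=Node(cluster=2)),
)

cluster, path = assign(tree, [3.0, 7.5])
print(cluster, path)
```

Each entry of `path` is a (feature, threshold, direction) triple, which is the human-readable explanation of why the point landed in its cluster.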

📝 Abstract
We study the problem of explainable k-medians clustering introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (2020). In this problem, the goal is to construct a threshold decision tree that partitions data into k clusters while minimizing the k-medians objective. These trees are interpretable because each internal node makes a simple decision by thresholding a single feature, allowing users to trace and understand how each point is assigned to a cluster. We present the first algorithm for explainable k-medians under the ℓₚ norm for every finite p ≥ 1. Our algorithm achieves an O(p(log k)^{1 + 1/p − 1/p²}) approximation to the optimal k-medians cost for any p ≥ 1. Previously, algorithms were known only for p = 1 and p = 2. For p = 2, our algorithm improves upon the existing bound of O(log^{3/2} k), and for p = 1, it matches the tight bound of log k + O(1) up to a multiplicative O(log log k) factor. We show how to implement our algorithm in a dynamic setting. The dynamic algorithm maintains an explainable clustering under a sequence of insertions and deletions, with amortized update time O(d log³ k) and O(log k) recourse, making it suitable for large-scale and evolving datasets.
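To make the comparison with the earlier p = 2 bound concrete, the exponent of log k in the stated ratio can be evaluated directly. The symbol e(p) below is introduced here only for illustration, and the calculation ignores constant factors and any lower-order terms.

```latex
\[
  e(p) \;=\; 1 + \frac{1}{p} - \frac{1}{p^{2}}, \qquad
  e(1) = 1, \qquad
  e(2) = 1 + \tfrac{1}{2} - \tfrac{1}{4} = \tfrac{5}{4} \;<\; \tfrac{3}{2}.
\]
\[
  e'(p) \;=\; -\frac{1}{p^{2}} + \frac{2}{p^{3}} = 0
  \;\Longleftrightarrow\; p = 2,
  \qquad\text{so}\quad e(p) \le \tfrac{5}{4} \ \text{for all } p \ge 1 .
\]
```

So at p = 2 the exponent drops from 3/2 to 5/4, and for p = 1 it equals 1, the same order as the tight log k + O(1) bound. The exponent tends to 1 as p grows, while the leading factor p in the ratio grows linearly.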
Problem

Research questions and friction points this paper is trying to address.

Develops an algorithm for explainable k-medians clustering under ℓₚ norms for every finite p ≥ 1.
Maintains the explainable clustering dynamically under point insertions and deletions.
Improves the known approximation bound for p = 2 and extends guarantees beyond p = 1 and p = 2.
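For reference, the objective these points refer to can be written out in a few lines. The sketch below assumes the usual reading of the k-medians cost under an ℓₚ norm, namely the sum over points of the ℓₚ distance to the assigned center. The function names and the toy data are hypothetical, and the code evaluates a given clustering rather than computing one.

```python
from typing import Sequence


def lp_distance(x: Sequence[float], c: Sequence[float], p: float) -> float:
    """l_p distance between two points, for finite p >= 1."""
    return sum(abs(a - b) ** p for a, b in zip(x, c)) ** (1.0 / p)


def k_medians_cost(points, centers, labels, p: float) -> float:
    """Sum of l_p distances from each point to the center of its cluster."""
    return sum(lp_distance(x, centers[j], p) for x, j in zip(points, labels))


# Toy data (assumed): two points, one center, both points in cluster 0.
points = [(0.0, 0.0), (3.0, 4.0)]
centers = [(0.0, 0.0)]
labels = [0, 0]

cost_l1 = k_medians_cost(points, centers, labels, p=1)
cost_l2 = k_medians_cost(points, centers, labels, p=2)
print(cost_l1, cost_l2)
```

An approximation ratio compares this cost, evaluated on the clusters produced by the threshold tree, with the cost of the best unconstrained choice of k centers.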
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic algorithm for explainable k-medians clustering, with O(d log³ k) amortized update time and O(log k) recourse
Threshold decision tree for interpretable cluster assignments
O(p(log k)^{1+1/p−1/p²}) approximation under the ℓₚ norm for any finite p ≥ 1