Clustering in Varying Metrics

📅 2025-10-09

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This paper studies the aggregated clustering problem: given a set of $n$ points represented under $T$ distinct metrics, select $k$ centers to minimize a norm $\Psi$ (e.g., $\ell_1$, $\ell_\infty$) of their $k$-center/median/means costs across all metrics. The authors first establish that for $T \geq 3$ no polynomial-time algorithm achieves any finite approximation factor. For $T = 2$, they design constant-factor approximation algorithms. Parameterizing by both $k$ and $T$ yields a 3-approximation in $f(k,T)\,\mathrm{poly}(n)$ time. A $(1+\varepsilon)$-approximation is achieved when all metrics have bounded $\varepsilon$-scatter dimension, or when $\Psi$ is the sum and the metrics are induced by edge weights on a common graph of bounded treewidth. Under the (randomized) Exponential Time Hypothesis (ETH), no finite-factor approximation is possible when parameterized by $T$ alone. Together, these results characterize the computational boundaries and the tractable structures of multi-metric clustering.


📝 Abstract

We introduce the aggregated clustering problem, where one is given $T$ instances of a center-based clustering task over the same $n$ points, but under different metrics. The goal is to open $k$ centers to minimize an aggregate of the clustering costs -- e.g., the average or maximum -- where the cost is measured via $k$-center/median/means objectives. More generally, we minimize a norm $\Psi$ over the $T$ cost values. We show that for $T \geq 3$, the problem is inapproximable to any finite factor in polynomial time. For $T = 2$, we give constant-factor approximations. We also show W[2]-hardness when parameterized by $k$, but obtain $f(k,T)\,\mathrm{poly}(n)$-time 3-approximations when parameterized by both $k$ and $T$. When the metrics have structure, we obtain efficient parameterized approximation schemes (EPAS). If all $T$ metrics have bounded $\varepsilon$-scatter dimension, we achieve a $(1+\varepsilon)$-approximation in $f(k,T,\varepsilon)\,\mathrm{poly}(n)$ time. If the metrics are induced by edge weights on a common graph $G$ of bounded treewidth $\mathsf{tw}$, and $\Psi$ is the sum function, we get an EPAS in $f(T,\varepsilon,\mathsf{tw})\,\mathrm{poly}(n,k)$ time. Conversely, unless (randomized) ETH is false, any finite factor approximation is impossible if parametrized by only $T$, even when the treewidth is $\mathsf{tw} = \Omega(\mathrm{poly}\log n)$.
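To make the objective concrete, here is a minimal Python sketch that evaluates the aggregated cost of a candidate center set and finds the best set by exhaustive search. It is not one of the paper's algorithms: the function names (`clustering_cost`, `aggregated_cost`, `brute_force`), the distance-matrix input format, and the restriction of centers to the input points are all assumptions made for illustration, and the search takes time exponential in $k$.

```python
from itertools import combinations

def clustering_cost(dist, centers, objective="median"):
    """Cost of `centers` under one metric, given as an n x n distance matrix."""
    # Distance from every point to its nearest open center.
    nearest = [min(row[c] for c in centers) for row in dist]
    if objective == "center":
        return max(nearest)
    if objective == "median":
        return sum(nearest)
    if objective == "means":
        return sum(d * d for d in nearest)
    raise ValueError(f"unknown objective: {objective}")

def aggregated_cost(metrics, centers, objective="median", norm=sum):
    """Aggregate the T per-metric costs; `norm` can be sum (l_1) or max (l_inf)."""
    return norm(clustering_cost(dist, centers, objective) for dist in metrics)

def brute_force(metrics, k, objective="median", norm=sum):
    """Try every k-subset of points as centers (illustration only)."""
    n = len(metrics[0])
    return min(
        combinations(range(n), k),
        key=lambda centers: aggregated_cost(metrics, centers, objective, norm),
    )

def line_metric(positions):
    """Hypothetical helper: metric induced by points placed on a line."""
    return [[abs(a - b) for b in positions] for a in positions]

# Four points seen under two metrics that disagree about which points are close.
metrics = [line_metric([0, 1, 10, 11]), line_metric([0, 10, 1, 11])]
best = brute_force(metrics, k=2, objective="center", norm=max)
```

In this toy instance, the centers `(0, 2)` are ideal for the first metric alone but poor for the second, which is the tension the aggregated objective is meant to capture.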

Problem

Research questions and friction points this paper is trying to address.

Aggregated clustering of the same point set under multiple metrics

Minimizing a norm of the costs across $T$ different clustering instances

Developing approximation algorithms for varying metrics under structural constraints
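One way to write the objective described in the abstract is sketched below. The notation ($P$ for the point set, $d_t$ for the $t$-th metric, $\mathrm{cost}_t$ for the per-metric cost) is an assumption introduced here, as is the choice to draw centers from $P$ itself; the paper's own formulation may differ.

$$
\min_{S \subseteq P,\ |S| = k} \ \Psi\bigl(\mathrm{cost}_1(S), \dots, \mathrm{cost}_T(S)\bigr),
\qquad
\mathrm{cost}_t(S) =
\begin{cases}
\max_{p \in P} d_t(p, S) & \text{($k$-center)} \\
\sum_{p \in P} d_t(p, S) & \text{($k$-median)} \\
\sum_{p \in P} d_t(p, S)^2 & \text{($k$-means)}
\end{cases}
$$

Here $d_t(p, S) = \min_{c \in S} d_t(p, c)$ is the distance from $p$ to its nearest center under metric $t$. Taking $\Psi$ to be the sum corresponds, up to a factor of $T$, to the average cost, and taking $\Psi$ to be the maximum corresponds to the worst-case cost.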

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aggregated clustering across multiple metric instances

Parameterized approximation schemes for metrics of bounded $\varepsilon$-scatter dimension

Efficient parameterized approximation schemes for graph metrics of bounded treewidth
