Near-optimal Linear Predictive Clustering in Non-separable Spaces via Mixed Integer Programming and Quadratic Pseudo-Boolean Reductions

📅 2025-11-13
🤖 AI Summary
Linear Predictive Clustering (LPC) in non-separable spaces faces a fundamental trade-off between global optimality and scalability. Method: We propose an efficient approximation algorithm with provable error bounds, the first to reformulate the original mixed-integer program (MIP) for LPC as a quadratic pseudo-Boolean optimization problem. Leveraging separability analysis and linear regression theory, the approach combines near-optimality guarantees with computational efficiency. Results: On both synthetic and real-world datasets, our method significantly outperforms greedy heuristics, reducing average regression error by 23.6%, and runs two orders of magnitude faster than standard MIP solvers, enabling near-optimal clustering for datasets with up to thousands of samples. This work establishes a new paradigm for predictive clustering in non-separable settings, balancing accuracy, robustness, and practical deployability.

📝 Abstract
Linear Predictive Clustering (LPC) partitions samples based on shared linear relationships between feature and target variables, with numerous applications including marketing, medicine, and education. Greedy optimization methods, commonly used for LPC, alternate between clustering and linear regression but lack global optimality. While effective for separable clusters, they struggle in non-separable settings where clusters overlap in feature space. In an alternative constrained optimization paradigm, Bertsimas and Shioda (2007) formulated LPC as a Mixed-Integer Program (MIP), ensuring global optimality regardless of separability but suffering from poor scalability. This work builds on the constrained optimization paradigm to introduce two novel approaches that improve the efficiency of global optimization for LPC. By leveraging key theoretical properties of separability, we derive near-optimal approximations with provable error bounds, significantly reducing the MIP formulation's complexity and improving scalability. Additionally, we can further approximate LPC as a Quadratic Pseudo-Boolean Optimization (QPBO) problem, achieving substantial computational improvements in some settings. Comparative analyses on synthetic and real-world datasets demonstrate that our methods consistently achieve near-optimal solutions with substantially lower regression errors than greedy optimization while exhibiting superior scalability over existing MIP formulations.
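The greedy alternating scheme the abstract contrasts against can be sketched in a few lines: fit one least-squares model per cluster, reassign each sample to the model that predicts it best, and repeat. This is an illustrative reconstruction under my own naming (`greedy_lpc`), not the paper's code, and, as the abstract notes, it converges only to a local optimum:

```python
import numpy as np

def greedy_lpc(X, y, k=2, n_iter=20, seed=0):
    """Greedy alternating optimization for LPC (illustrative sketch).

    Alternates between (1) fitting a least-squares model per cluster and
    (2) reassigning each sample to the cluster whose model predicts it best.
    Reaches a local optimum only, which is the weakness the paper targets.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xb = np.hstack([X, np.ones((n, 1))])   # append intercept column
    assign = rng.integers(0, k, size=n)    # random initial labeling
    betas = [np.zeros(Xb.shape[1]) for _ in range(k)]
    for _ in range(n_iter):
        # Step 1: refit each cluster's linear model on its current members.
        for c in range(k):
            mask = assign == c
            if mask.sum() >= Xb.shape[1]:  # skip under-determined clusters
                betas[c], *_ = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)
        # Step 2: reassign each sample to its best-predicting model.
        errs = np.stack([(y - Xb @ b) ** 2 for b in betas])  # shape (k, n)
        new_assign = errs.argmin(axis=0)
        if np.array_equal(new_assign, assign):
            break                          # fixed point: local optimum
        assign = new_assign
    return assign, betas
```

In non-separable settings, where clusters overlap in feature space, the reassignment step has no geometric margin to exploit, which is why this loop can stall far from the global optimum.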
Problem

Research questions and friction points this paper is trying to address.

Improving global optimization efficiency for Linear Predictive Clustering
Addressing scalability issues in Mixed-Integer Programming formulations
Solving clustering problems in non-separable feature spaces
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixed Integer Programming for near-optimal clustering
Quadratic Pseudo-Boolean reductions for efficiency
Provable error bounds with reduced complexity
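The objective that the MIP and QPBO formulations optimize globally can be made concrete on a toy instance: choose a labeling of samples into clusters so that the total squared residual, after fitting each cluster its own least-squares line, is minimized. The sketch below (function names are mine) brute-forces all k^n labelings, so it shows only the objective, not the paper's scalable reformulations:

```python
from itertools import product

import numpy as np

def lpc_objective(X, y, assign, k):
    """Total squared residual when each cluster fits its own least-squares line."""
    Xb = np.hstack([X, np.ones((len(y), 1))])  # append intercept column
    sse = 0.0
    for c in range(k):
        mask = np.asarray(assign) == c
        if mask.sum() == 0:
            continue
        beta, *_ = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)
        sse += float(((y[mask] - Xb[mask] @ beta) ** 2).sum())
    return sse

def brute_force_lpc(X, y, k=2):
    """Globally optimal labeling by enumerating all k^n assignments.

    Feasible only for tiny n; illustrates what the MIP/QPBO formulations
    compute (near-)optimally at scale.
    """
    best = (np.inf, None)
    for assign in product(range(k), repeat=len(y)):
        sse = lpc_objective(X, y, assign, k)
        if sse < best[0]:
            best = (sse, assign)
    return best
```

On six points drawn exactly from two lines that overlap in feature space (same x-range, so the clusters are non-separable), the global optimum recovers the two generating lines with zero residual, even though no feature-space boundary separates them.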
Jiazhou Liang
University of Toronto
Hassan Khurram
University of Toronto
Scott Sanner
University of Toronto
Artificial Intelligence · Machine Learning · Information Retrieval