Unveiling Location-Specific Price Drivers: A Two-Stage Cluster Analysis for Interpretable House Price Predictions

📅 2025-08-05
🤖 AI Summary
Addressing the challenge of house price valuation under local market heterogeneity, and the trade-off between interpretability and predictive accuracy in existing approaches (black-box machine learning models versus global linear regression), this paper proposes a clustering-based, interpretable modeling framework. First, properties are grouped with K-means clustering in two stages: initially on a minimal set of location-based features, then with additional property features incorporated, to identify submarkets with distinct price drivers. Second, a linear regression (LR) and a generalized additive model (GAM) are fitted within each cluster. Evaluated on 43,309 German house listings from 2023, the clustered GAM and LR reduce mean absolute error by 36% and 58%, respectively, compared with their unclustered counterparts. Visual diagnostics further reveal cluster-specific, nonlinear responses to key drivers such as commute time and school quality, highlighting heterogeneous patterns across submarkets. The framework thus combines regional specificity with model transparency, supporting more granular property valuation for buyers, sellers, and real estate analysts, and potentially evidence-based urban policy.
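The central idea of fitting one model per cluster instead of a single global model can be illustrated with a minimal numpy sketch on synthetic data. This is not the authors' code: the cluster labels are assumed to be given (for example, by the clustering step), the helpers `fit_ols`, `predict_ols`, `mae`, and `global_vs_clustered_mae` are hypothetical, and MAE is computed in-sample for brevity, whereas a real evaluation would use held-out listings. The printed reduction uses one plausible reading of "X% lower MAE", namely (MAE_global - MAE_clustered) / MAE_global.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return beta[0] + X @ beta[1:]

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def global_vs_clustered_mae(X, y, labels):
    """In-sample MAE of one global LR versus one LR fitted per cluster label."""
    global_pred = predict_ols(fit_ols(X, y), X)
    clustered_pred = np.empty_like(y, dtype=float)
    for k in np.unique(labels):
        mask = labels == k
        clustered_pred[mask] = predict_ols(fit_ols(X[mask], y[mask]), X[mask])
    return mae(y, global_pred), mae(y, clustered_pred)

# Hypothetical data: two submarkets that price living area very differently.
rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1], n)
area = rng.uniform(50, 200, size=2 * n)                  # living area in m^2
price_per_m2 = np.where(labels == 0, 2000.0, 5000.0)
y = 50_000 + price_per_m2 * area + rng.normal(0, 10_000, size=2 * n)
X = area.reshape(-1, 1)

g, c = global_vs_clustered_mae(X, y, labels)
print(f"global MAE {g:,.0f}, clustered MAE {c:,.0f}, reduction {(g - c) / g:.0%}")
```

Because the global model must average the two slopes, its errors grow with living area, while each per-cluster model only has to absorb the noise of its own submarket.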

📝 Abstract
House price valuation remains challenging due to localized market variations. Existing approaches often rely either on black-box machine learning models, which lack interpretability, or on simple methods such as linear regression (LR), which fail to capture market heterogeneity. To address this, we propose a machine learning approach based on two-stage clustering: properties are first grouped using a minimal set of location-based features, and additional features are then incorporated. Each cluster is modeled with either LR or a generalized additive model (GAM), balancing predictive performance with interpretability. Building and evaluating our models on 43,309 German house listings from 2023, we reduce mean absolute error by 36% for the GAM and by 58% for LR compared with models without clustering. Additionally, graphical analyses reveal how patterns shift between clusters. These findings emphasize the importance of cluster-specific insights, which enhance interpretability and offer practical value for buyers, sellers, and real estate analysts seeking more reliable property valuations.
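The two-stage clustering step can be sketched as follows. The abstract does not specify the algorithm, the number of clusters, or exactly how the second stage uses the added features, so this sketch assumes plain k-means and a hierarchical reading: each location cluster is split again using location plus the additional features. The function names (`kmeans`, `standardize`, `two_stage_clusters`), the coordinates, and the living-area data are all hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random initial centers; returns labels of shape (n,)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def standardize(X):
    """Zero-mean, unit-variance columns so no feature dominates the distances."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def two_stage_clusters(location, extra, k_location, k_extra, seed=0):
    """Stage 1: cluster on location features only.
    Stage 2 (assumed hierarchical): split each location cluster again using
    location plus the additional features. Returns (stage1_labels, final_labels)."""
    stage1 = kmeans(standardize(location), k_location, seed=seed)
    full = standardize(np.column_stack([location, extra]))
    final = np.empty(len(location), dtype=int)
    next_label = 0
    for c in np.unique(stage1):
        idx = np.flatnonzero(stage1 == c)
        k = min(k_extra, len(idx))
        final[idx] = next_label + kmeans(full[idx], k, seed=seed)
        next_label += k  # keep sub-cluster ids unique across location clusters
    return stage1, final

# Hypothetical data: two towns far apart, each with small and large homes.
rng = np.random.default_rng(1)
location = np.vstack([
    rng.normal([52.52, 13.40], 0.02, size=(100, 2)),  # town A (lat, lon)
    rng.normal([48.14, 11.58], 0.02, size=(100, 2)),  # town B (lat, lon)
])
living_area = rng.choice([70.0, 160.0], size=200) + rng.normal(0.0, 5.0, 200)

stage1, final = two_stage_clusters(location, living_area, k_location=2, k_extra=2)
print(np.bincount(stage1), np.bincount(final))
```

Clustering on location first guarantees that every final cluster is spatially coherent; the added features can then only refine a location cluster, never merge properties from different areas.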

Problem

Research questions and friction points this paper is trying to address.

Addressing localized market variations in house price valuation
Improving interpretability and accuracy of house price prediction models
Balancing predictive performance with interpretability using clustering techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage clustering for localized market grouping
Combining LR and GAM for interpretable predictions
Cluster-specific insights enhance valuation accuracy
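The interpretability of the GAM side comes from its per-feature shape functions, which can be plotted and read off directly. The sketch below is a simplified stand-in rather than a full GAM: it fits unpenalized piecewise-linear (hinge) regression splines by least squares, whereas GAM software such as R's mgcv or Python's pyGAM uses penalized smoothing splines with automatic smoothness selection. The helpers `hinge_basis`, `fit_additive`, and `shape_function`, the knot placement at quantiles, and the synthetic drivers are hypothetical.

```python
import numpy as np

def hinge_basis(x, knots):
    """Piecewise-linear basis for one feature: x itself plus max(0, x - knot) per knot."""
    return np.column_stack([x] + [np.maximum(0.0, x - k) for k in knots])

def fit_additive(X, y, n_knots=5):
    """Fit y ≈ b0 + sum_j f_j(x_j) by least squares; each f_j is piecewise linear
    with knots at interior quantiles of feature j (no smoothness penalty)."""
    probs = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]
    knots = [np.quantile(X[:, j], probs) for j in range(X.shape[1])]
    blocks = [hinge_basis(X[:, j], knots[j]) for j in range(X.shape[1])]
    A = np.column_stack([np.ones(len(y))] + blocks)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return knots, beta

def shape_function(knots, beta, j, grid):
    """Evaluate component f_j on a grid (defined up to an additive constant).
    Assumes every feature uses the same number of knots, as fit_additive does."""
    width = 1 + len(knots[j])   # 1 linear column + one hinge per knot
    start = 1 + j * width       # skip the intercept and earlier features' columns
    return hinge_basis(grid, knots[j]) @ beta[start:start + width]

# Hypothetical drivers: x0 acts linearly, x1 only matters above 5.
rng = np.random.default_rng(0)
n = 1000
X = rng.uniform(0.0, 10.0, size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * np.maximum(0.0, X[:, 1] - 5.0) + rng.normal(0.0, 0.1, n)

knots, beta = fit_additive(X, y)
grid = np.array([2.0, 8.0])
f1 = shape_function(knots, beta, 1, grid)
print(f1[1] - f1[0])   # approximately 3 * (8 - 5) = 9
```

In the clustered setting, one would fit such a model separately within each cluster and compare the shape functions across clusters, which is the kind of cluster-specific pattern shift the summary describes.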