🤖 AI Summary
This study addresses the limitation of traditional stock clustering methods that rely on Euclidean distance and fail to capture stochastic dominance relationships reflecting investors’ risk preferences. The authors propose a novel framework that incorporates first- to third-order stochastic dominance test statistics to construct a “stochastic dominance coefficient matrix,” which is then integrated into K-means and hierarchical clustering algorithms, yielding twelve clustering variants tailored to investors with heterogeneous risk attitudes. To evaluate clustering validity, the paper introduces two new metrics: the SD-SC coefficient and the SD-DBI index. Empirical validation on NASDAQ and CSI 100 constituent stocks demonstrates the robustness of the approach, showing significant improvements in portfolio optimization performance for both risk-averse and risk-seeking investors.
📝 Abstract
Stochastic Dominance (SD) theory provides a rigorous framework for selecting superior assets that match the asset allocation needs of investors with different risk preferences (i.e., risk-averse, risk-seeking, and risk-neutral). However, traditional stock clustering methods typically rely on geometric metrics such as Euclidean distance, which often fail to capture the risk dominance relationships among assets. To address this limitation, this paper proposes a clustering framework based on SD test statistics. Methodologically, the study combines SD theory with machine learning algorithms: instead of relying on geometric distance, we use test statistics from first-, second-, and third-order SD to construct a "Stochastic Dominance Coefficient Matrix." Building on this matrix, we modify the classic K-means and hierarchical clustering algorithms, deriving 12 distinct algorithm variants tailored to different orders of SD relationships. We also construct two specialized validity indices, the SD-SC coefficient and the SD-DBI index, to evaluate clustering performance. Empirically, we analyze constituent stock data from a representative developed market (the US NASDAQ Index) and an emerging market (China's CSI 100 Index). The results verify the effectiveness and robustness of the proposed method. Furthermore, we apply the clustering results to modify the Single Index Model and to construct Global Minimum Variance Portfolios (GMVP). The findings demonstrate that the proposed method facilitates customized asset allocation for investors, and is therefore of both theoretical and practical value.
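To make the idea of clustering on SD-based dissimilarities rather than Euclidean distance concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's method: the abstract does not specify the SD test statistics or how the coefficient matrix is built, so the hypothetical `sd_statistic` below substitutes a simple symmetric quantity (the largest gap between the empirical CDFs of two return series, integrated once or twice for orders 2 and 3). The function names, the grid size, the average-linkage choice, and the synthetic returns are all assumptions made for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def sd_statistic(x, y, order=1, grid_size=200):
    """Illustrative SD-style statistic (an assumption, not the paper's test).

    Returns the largest absolute gap between the order-s integrated
    empirical CDFs of two return series on a shared grid.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), grid_size)
    step = grid[1] - grid[0]
    # Order 1: empirical CDFs evaluated on the shared grid.
    dx = (x[:, None] <= grid[None, :]).mean(axis=0)
    dy = (y[:, None] <= grid[None, :]).mean(axis=0)
    # Orders 2 and 3: integrate the CDFs repeatedly (Riemann sums).
    for _ in range(order - 1):
        dx = np.cumsum(dx) * step
        dy = np.cumsum(dy) * step
    return float(np.max(np.abs(dx - dy)))


def sd_dissimilarity_matrix(returns, order=1):
    """Pairwise statistic for a (T, N) array of returns, one column per asset."""
    returns = np.asarray(returns, dtype=float)
    n_assets = returns.shape[1]
    d = np.zeros((n_assets, n_assets))
    for i in range(n_assets):
        for j in range(i + 1, n_assets):
            d[i, j] = d[j, i] = sd_statistic(returns[:, i], returns[:, j], order)
    return d


def sd_hierarchical_clusters(returns, n_clusters, order=1):
    """Average-linkage clustering on the SD-style dissimilarity matrix."""
    d = sd_dissimilarity_matrix(returns, order)
    tree = linkage(squareform(d), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic returns (not real market data): two groups with different means.
rng = np.random.default_rng(0)
low = rng.normal(-0.05, 0.01, size=(500, 3))
high = rng.normal(0.05, 0.01, size=(500, 3))
labels = sd_hierarchical_clusters(np.hstack([low, high]), n_clusters=2)
print(labels)
```

Because the stand-in statistic is symmetric, it behaves like a dissimilarity and can be passed directly to standard hierarchical clustering. A directional dominance test, where "A dominates B" differs from "B dominates A", would need an extra step to turn the pairwise results into a symmetric matrix, and the abstract does not say how the authors handle this.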