🤖 AI Summary
This paper addresses two key limitations in stock selection: weak generalization of single models and poor adaptability due to static weighting schemes. To overcome these limitations, we propose an ensemble learning framework that dynamically weights constituent models based on the Information Coefficient (IC). Specifically, we integrate three representative models (Random Forest, XGBoost, and LSTM) and design a rolling-IC-driven real-time weight allocation mechanism. Additionally, a factor importance screening module is embedded to optimize input features. Backtesting on CSI 300 constituents demonstrates that our framework outperforms both conventional static-weight ensembles and individual models: it achieves a 12.3% higher annualized return and attains a mean IC of 0.048 (p < 0.01), indicating significantly improved prediction stability and alpha generation capability. The core contributions are: (i) elevating IC from a mere performance evaluation metric to a principled basis for dynamic weight generation; and (ii) empirically validating that factor screening critically enhances ensemble effectiveness.
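The rolling-IC-driven weight allocation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`rank_ic`, `rolling_ic_weights`, `ensemble_score`), the use of a Spearman-style rank IC, the window length, the clipping of negative ICs to zero, and the equal-weight fallback are all hypothetical choices made for the example.

```python
import numpy as np

def _ordinal_rank(x):
    # Ordinal ranks 0..N-1; ties are not averaged, which is an
    # acceptable simplification for continuous return data.
    return np.argsort(np.argsort(np.asarray(x))).astype(float)

def rank_ic(predicted, realized):
    """Rank IC: Pearson correlation between the ranks of predicted
    and realized cross-sectional returns for one period."""
    return float(np.corrcoef(_ordinal_rank(predicted),
                             _ordinal_rank(realized))[0, 1])

def rolling_ic_weights(ic_history, window):
    """Turn a (T, M) history of per-period ICs for M models into
    M weights based on the mean IC over the last `window` periods.

    Assumption: negative mean ICs are clipped to zero, and if no
    model has a positive mean IC the weights fall back to equal.
    """
    recent = np.asarray(ic_history, dtype=float)[-window:]
    mean_ic = recent.mean(axis=0)
    clipped = np.clip(mean_ic, 0.0, None)
    total = clipped.sum()
    if total == 0.0:
        return np.full(mean_ic.shape, 1.0 / mean_ic.size)
    return clipped / total

def ensemble_score(model_predictions, weights):
    """Combine (M, N) model predictions for N stocks into N scores."""
    return np.asarray(weights, dtype=float) @ np.asarray(model_predictions, dtype=float)
```

In a backtest loop, one would append each period's per-model `rank_ic` to the history once realized returns are known, recompute the weights, and then rank stocks by `ensemble_score` for the next rebalancing date. Only ICs from already-realized periods should enter the window, to avoid look-ahead bias.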
📝 Abstract
This paper proposes a novel stock selection strategy framework based on combined machine learning algorithms. We develop two types of weighting methods that combine three representative machine learning algorithms to predict returns for the stock selection strategy: static weighting based on model evaluation metrics, and dynamic weighting based on Information Coefficients (IC). Using CSI 300 index data, we empirically evaluate the strategy's backtested performance and the models' predictive accuracy. The main results are as follows: (1) The combined machine learning strategy significantly outperforms single-model approaches in backtested returns. (2) IC-based weighting (particularly IC_Mean) is more competitive than evaluation-metric-based weighting in both backtested returns and predictive performance. (3) Factor screening substantially enhances the performance of combined machine learning strategies.
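To make the contrast between the two weighting schemes concrete, the following is one plausible formalization. The notation is assumed for illustration and is not taken from the paper: $s_m$ is a fixed evaluation score for model $m$, $\hat{r}_{m,t-1}$ is model $m$'s forecast made at $t-1$ for period-$t$ returns, $r_t$ is the realized cross-sectional return vector, $K$ is the rolling window length, and $M$ is the number of models. The IC is written in its conventional form as a cross-sectional correlation; the clipping at zero in the dynamic weights is an assumption.

```latex
% Static weighting: weights fixed from an evaluation metric s_m (assumed form)
w_m^{\text{static}} = \frac{s_m}{\sum_{j=1}^{M} s_j}

% Dynamic weighting: weights driven by the rolling mean IC (assumed form)
\mathrm{IC}_{m,t} = \operatorname{corr}\bigl(\hat{r}_{m,t-1},\, r_t\bigr),
\qquad
\overline{\mathrm{IC}}_{m,t} = \frac{1}{K}\sum_{k=0}^{K-1}\mathrm{IC}_{m,t-k},
\qquad
w_{m,t}^{\text{IC}} = \frac{\max\bigl(\overline{\mathrm{IC}}_{m,t},\,0\bigr)}
                           {\sum_{j=1}^{M}\max\bigl(\overline{\mathrm{IC}}_{j,t},\,0\bigr)}
```

Under this reading, the static weights never change after model evaluation, whereas the IC-based weights are recomputed each period from recently realized forecast quality.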