🤖 AI Summary
To address overfitting from high-dimensional features and the inflexibility of static clustering in dynamic ETF stock selection under evolving market regimes, this paper proposes Q-A3C2, a quantum-enhanced, temporally adaptive reinforcement learning framework. Methodologically, it integrates variational quantum circuits (VQCs) with asynchronous advantage actor-critic (A3C) to construct an end-to-end quantum-classical hybrid policy network; it also introduces a temporal dynamic clustering mechanism for online identification of market-state evolution and cluster-level policy transfer. Evaluated on S&P 500 constituents, the model achieves a 17.09% cumulative return versus 7.09% for the benchmark (a gap of 10.00 percentage points), while demonstrating improved robustness to high-dimensional noise and more efficient exploration. Claimed contributions include: (i) the first quantum-enhanced A3C architecture for finance, and (ii) the first market-time-series-driven dynamic clustering paradigm for adaptive decision-making.
📝 Abstract
Traditional ETF stock selection methods and reinforcement learning models such as the Asynchronous Advantage Actor-Critic (A3C) often struggle with high-dimensional feature spaces and tend to overfit when applied to complex financial markets. Moreover, static clustering algorithms fail to capture evolving market regimes, as the cluster with the highest returns in one period may not remain optimal in the next. To address these limitations, this paper proposes Q-A3C2, a quantum-enhanced A3C framework that integrates time-series dynamic clustering. By embedding Variational Quantum Circuits (VQCs) into the policy network, Q-A3C2 enhances nonlinear feature representation and enables adaptive decision-making at the cluster level. Experimental results on the S&P 500 constituents show that Q-A3C2 achieves a cumulative return of 17.09%, compared with 7.09% for the benchmark, demonstrating superior adaptability and exploration in dynamic financial environments.
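The point about static clustering can be illustrated with a small sketch that re-clusters stocks in each time window and reports which stocks fall in the best-performing cluster. This is not the paper's clustering mechanism, which the abstract does not specify. The use of k-means, the (mean return, volatility) features, the deterministic center initialization, and the names `kmeans` and `best_cluster_per_window` are all assumptions made for illustration.

```python
import numpy as np


def kmeans(features, k, iters=50):
    """Plain Lloyd's k-means; returns one integer label per row.

    Assumption: centers are initialized deterministically from rows spread
    across the sorted first feature, so results are reproducible.
    """
    n = len(features)
    order = np.argsort(features[:, 0])
    centers = features[order[np.linspace(0, n - 1, k).astype(int)]].astype(float)
    labels = np.zeros(n, dtype=int)
    for _ in range(iters):
        # Distance from every point to every center, shape (n, k).
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels


def best_cluster_per_window(returns, window, k=3):
    """returns: array of shape (T, n_stocks) holding periodic returns.

    For each non-overlapping window, cluster stocks on (mean return,
    volatility) and collect the indices of the stocks in the cluster with
    the highest average mean return.
    """
    picks = []
    for start in range(0, returns.shape[0] - window + 1, window):
        block = returns[start:start + window]
        feats = np.column_stack([block.mean(axis=0), block.std(axis=0)])
        labels = kmeans(feats, k)
        cluster_means = [
            feats[labels == j, 0].mean() if np.any(labels == j) else -np.inf
            for j in range(k)
        ]
        best = int(np.argmax(cluster_means))
        picks.append(np.flatnonzero(labels == best))
    return picks
```

On synthetic data in which one group of stocks leads in the first window and a different group leads in the second, the selected cluster membership changes between windows. A clustering fitted once on the first window would miss that change, which is the behaviour the abstract attributes to static clustering.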