🤖 AI Summary
Traditional wealth management optimizes a goals-based investment strategy separately for each investor, which is inefficient and scales poorly. This work proposes a meta-reinforcement learning approach (MetaRL) that applies concepts from zero-shot meta-learning and foundation model pre-training to goals-based wealth management (GBWM). A policy model pre-trained on thousands of simulated GBWM problems generalizes across tasks and, in inference mode, produces a near-optimal dynamic portfolio and goal-fulfillment strategy for a new problem within a few hundredths of a second. The resulting policies attain, on average, 97.8% of the optimal expected utility computed by dynamic programming, and they remain robust to capital market regime changes even when training uses only one regime. The approach may also extend to problems with larger state spaces where dynamic programming becomes computationally infeasible.
📝 Abstract
Applying concepts related to zero-shot meta-learning and pre-training of foundation models, we develop a meta-reinforcement learning approach (denoted MetaRL) that is pre-trained on thousands of goals-based wealth management (GBWM) problems. Each GBWM problem involves a multi-year scenario in which the investor chooses an investment portfolio each year and decides whether to fulfill all, some, or none of the financial goals that arise that year. These choices seek to maximize the expected total investor utility obtained from the fulfilled financial goals. Because it eliminates separate training and optimization for each new investor problem, the MetaRL model in inference mode produces near-optimal dynamic investment portfolio and goal-fulfilling strategies for a new GBWM problem within a few hundredths of a second. The resulting expected utilities are, on average, 97.8% of the optimal expected utilities (determined via Dynamic Programming). These results are remarkably robust to capital market regime changes, even when training uses only one capital market regime. Further, the MetaRL approach can enable solving problems with larger state spaces where Dynamic Programming becomes computationally infeasible.
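To make the decision structure of a single GBWM problem concrete, the following is a minimal sketch and not the authors' implementation. It rolls out one multi-year scenario in which a policy picks a portfolio and a set of goals to fulfill each year, then estimates expected total utility by Monte Carlo. Everything here is an illustrative assumption: the `Goal` fields, the three-entry `PORTFOLIOS` menu, the lognormal wealth update, and the placeholder `greedy_policy`, which stands in for where a pre-trained policy or a Dynamic Programming solution would be plugged in.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Goal:
    year: int       # year in which the goal becomes available
    cost: float     # wealth required to fulfill it (assumed > 0)
    utility: float  # utility gained if it is fulfilled


# Hypothetical portfolio menu: (expected annual return, annual volatility).
PORTFOLIOS = [(0.03, 0.04), (0.06, 0.10), (0.09, 0.18)]


def simulate_episode(wealth, goals, horizon, policy, rng):
    """Roll out one scenario and return the total utility of fulfilled goals."""
    total_utility = 0.0
    for year in range(horizon):
        available = [g for g in goals if g.year == year]
        portfolio_idx, chosen = policy(year, wealth, available)
        for g in chosen:
            if g.cost <= wealth:  # goals are paid out of current wealth
                wealth -= g.cost
                total_utility += g.utility
        mu, sigma = PORTFOLIOS[portfolio_idx]
        z = rng.gauss(0.0, 1.0)
        # Assumed lognormal (geometric Brownian motion) one-year wealth update.
        wealth *= math.exp(mu - 0.5 * sigma ** 2 + sigma * z)
    return total_utility


def greedy_policy(year, wealth, available):
    """Placeholder rule: take affordable goals by utility per unit cost
    and always hold the middle portfolio."""
    chosen, budget = [], wealth
    ranked = sorted(available, key=lambda g: g.utility / g.cost, reverse=True)
    for g in ranked:
        if g.cost <= budget:
            chosen.append(g)
            budget -= g.cost
    return 1, chosen


def expected_utility(wealth, goals, horizon, policy, n_paths=2000, seed=0):
    """Monte Carlo estimate of the expected total utility under a policy."""
    rng = random.Random(seed)
    total = sum(
        simulate_episode(wealth, goals, horizon, policy, rng)
        for _ in range(n_paths)
    )
    return total / n_paths
```

Under these assumptions, a ratio such as the 97.8% figure would correspond to `expected_utility` under the learned policy divided by `expected_utility` under the optimal policy on the same problem. The greedy rule above is only a stand-in and makes no claim about either.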