🤖 AI Summary
This paper addresses the joint optimization of portfolio allocation and consumption under continuous-time constraints at large scale. Conventional dynamic programming approaches are severely limited by the “curse of dimensionality,” typically handling no more than seven assets. To overcome this, we propose a novel end-to-end framework that integrates Pontryagin’s Maximum Principle (PMP) with neural network–based gradient optimization, ensuring strict adherence to realistic economic constraints—including short-selling limits, borrowing restrictions, and budget constraints. Our method parameterizes control policies via deep neural networks, leverages PMP for theoretical guidance, employs direct policy optimization, and exploits GPU parallelization—bypassing computationally intensive PDE or BSDE modeling and grid-based discretization. Experiments demonstrate exact recovery of analytical solutions in the unconstrained case with 1,000 assets; rapid convergence under complex constraints; and near-optimal performance achieved within just 1–2 minutes of GPU training—substantially advancing the state-of-the-art in both scalability and computational efficiency.
📝 Abstract
We present a Pontryagin-Guided Direct Policy Optimization (PG-DPO) method for constrained dynamic portfolio choice, incorporating consumption and multi-asset investment, that scales to thousands of risky assets. By combining neural-network controls with Pontryagin's Maximum Principle (PMP), it circumvents the curse of dimensionality that renders dynamic programming (DP) grids intractable beyond a handful of assets. Unlike value-based PDE or BSDE approaches, PG-DPO enforces PMP conditions at each gradient step, naturally accommodating no-short-selling and borrowing constraints as well as optional consumption bounds. A "one-shot" variant rapidly computes Pontryagin-optimal controls after a brief warm-up, achieving substantially higher accuracy than naive baselines. On modern GPUs, near-optimal solutions often emerge within just one or two minutes of training. Numerical experiments confirm that, for up to 1,000 assets, PG-DPO accurately recovers the known closed-form solution in the unconstrained case and remains tractable under constraints, far exceeding the longstanding DP-based limit of around seven assets.
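To make the direct-policy-optimization idea concrete, the following is a minimal, illustrative sketch (not the paper's actual implementation): a neural policy maps time and log-wealth to portfolio weights and a consumption rate, wealth paths are simulated forward, and the parameters are updated by gradient ascent on expected CRRA utility. The simplex constraint on the weights (softmax) encodes no short-selling and no borrowing; all dimensions, dynamics, and hyperparameters below are assumptions for the sketch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, T, steps, batch = 5, 1.0, 20, 256       # assets, horizon, time grid, sample paths (assumed)
dt = T / steps
mu = torch.full((d,), 0.08)                # assumed asset drifts
sigma = torch.full((d,), 0.2)              # assumed (diagonal) volatilities
r, gamma = 0.02, 2.0                       # risk-free rate, CRRA risk aversion

class Policy(nn.Module):
    """Maps (t, log-wealth) to simplex portfolio weights and a consumption rate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, d + 1))
    def forward(self, t, logw):
        out = self.net(torch.stack([t, logw], dim=-1))
        pi = torch.softmax(out[..., :d], dim=-1)   # weights >= 0, sum to 1: no shorting/borrowing
        c = nn.functional.softplus(out[..., d])    # nonnegative consumption rate
        return pi, c

def crra(x):
    return x.pow(1 - gamma) / (1 - gamma)

policy = Policy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
for it in range(50):
    w = torch.ones(batch)                  # initial wealth on each path
    util = torch.zeros(batch)
    for k in range(steps):
        t = torch.full((batch,), k * dt)
        pi, c = policy(t, w.log())
        z = torch.randn(batch, d)
        # log-Euler step of dW = W[(r + pi.(mu - r) - c) dt + (pi * sigma).dB]
        drift = r + (pi * (mu - r)).sum(-1) - c
        var = (pi * sigma).pow(2).sum(-1)
        noise = (pi * sigma * z).sum(-1)
        w = w * torch.exp((drift - 0.5 * var) * dt + noise * dt**0.5)
        util = util + crra(c * w + 1e-8) * dt      # running utility of consumption
    loss = -(util + crra(w)).mean()                # add bequest utility of terminal wealth
    opt.zero_grad(); loss.backward(); opt.step()
```

The full method additionally uses the PMP costate (adjoint) information obtained from backpropagation to guide and verify the controls; this sketch shows only the direct policy-gradient core and the constraint handling.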