Moira: Language-driven Hierarchical Reinforcement Learning for Pair Trading

📅 2026-05-03

📈 Citations: 0

✨ Influential: 0

career value

209K/year

🤖 AI Summary

This work proposes a language-driven hierarchical reinforcement learning framework to address the challenge of hierarchical credit assignment in pairs trading, which arises from delayed and ambiguous feedback. The approach uniquely leverages large language models (LLMs) in a purely prompt-based manner to parameterize both the high-level asset-pair selection policy and the low-level trade execution policy, explicitly decoupling abstract decision-making from concrete actions. By incorporating episode-level textual feedback and a gradient-free prompt optimization mechanism, the method effectively mitigates non-stationarity across hierarchy levels and enables targeted adaptation under delayed feedback. Experiments on real-world market data demonstrate that the proposed framework significantly outperforms both conventional and existing LLM-based baselines, confirming its effectiveness and novelty in financial sequential decision-making.

📝 Abstract

Many sequential decision-making problems exhibit hierarchical structure, where high-level semantic choices constrain downstream actions and feedback is delayed and ambiguous. Learning in such settings is challenging due to credit assignment: performance degradation may arise from flawed abstractions, suboptimal execution, or their interaction. We study this challenge through pair trading, a domain that naturally combines long-horizon semantic reasoning for asset pair selection with short-horizon execution under partial observability. We formulate pair trading as a hierarchical reinforcement learning problem and propose a language-driven optimization framework in which both high-level and low-level policies are parameterized by large language models (LLMs) and optimized exclusively through prompt updates. Our approach leverages pretrained LLMs as hierarchical policies and uses trajectory- and episode-level textual feedback to adapt abstractions and execution without gradient-based fine-tuning. By explicitly separating abstraction selection from execution, the framework reduces non-stationarity across hierarchical levels and enables targeted adaptation under delayed feedback. Experiments on real-world market data show consistent improvements over traditional and LLM-based baselines, demonstrating the effectiveness of language-driven hierarchical reinforcement learning.

Problem

Research questions and friction points this paper is trying to address.

hierarchical reinforcement learning

credit assignment

pair trading

delayed feedback

sequential decision-making

Innovation

Methods, ideas, or system contributions that make the work stand out.

language-driven reinforcement learning

hierarchical reinforcement learning

large language models