🤖 AI Summary
Cryptocurrency price prediction faces challenges including high market volatility, difficulty in fusing heterogeneous multi-source data (e.g., on-chain activity, news, and social sentiment), and scarcity of labeled training data. To address these, we propose Meta-RL-Crypto: a self-evolving trading agent that integrates instruction-tuned large language models (LLMs) with a Transformer architecture. It introduces a novel Actor-Judge-Meta-Judge closed-loop framework, unifying meta-learning, reinforcement learning, and multimodal data fusion within an unsupervised continual-optimization paradigm. Through role-switching mechanisms and internal preference-based feedback, the agent co-evolves its trading policy and evaluation criteria, eliminating reliance on human annotations. Extensive experiments across multiple real-market time horizons demonstrate that Meta-RL-Crypto significantly outperforms existing LLM-driven baselines on key financial metrics, including Sharpe ratio and maximum drawdown.
📝 Abstract
Predicting cryptocurrency returns is notoriously difficult: price movements are driven by a fast-shifting blend of on-chain activity, news flow, and social sentiment, while labeled training data are scarce and expensive. In this paper, we present Meta-RL-Crypto, a transformer-based architecture that unifies meta-learning and reinforcement learning (RL) to create a fully self-improving trading agent. Starting from a vanilla instruction-tuned LLM, the agent iteratively alternates between three roles (actor, judge, and meta-judge) in a closed-loop architecture. This learning process requires no additional human supervision: the agent leverages multimodal market inputs and internal preference feedback to continuously refine both its trading policy and its evaluation criteria. Experiments across diverse market regimes demonstrate that Meta-RL-Crypto performs strongly on real-market technical indicators and outperforms other LLM-based baselines.
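The closed loop described above can be sketched in miniature: one agent cycles through the actor, judge, and meta-judge roles, and the only learning signal is its own preference feedback. This is a toy illustration under stated assumptions, not the paper's actual method: the feature set, scoring functions, and update rules below (`policy_bias`, `criteria`, the 0.1 and 0.05 step sizes) are all hypothetical stand-ins for the LLM policy and RL updates.

```python
import random

random.seed(0)  # deterministic demo

def actor(state, policy_bias):
    """Actor role: propose a preferred/alternative pair of trade actions."""
    actions = ["buy", "sell", "hold"]
    scores = {a: policy_bias[a] + random.random() for a in actions}
    ranked = sorted(actions, key=scores.get, reverse=True)
    return ranked[0], ranked[1]

def judge(state, action, criteria):
    """Judge role: score an action by a weighted sum of market features."""
    signal = sum(criteria[k] * state[k] for k in criteria)
    return {"buy": 1.0, "sell": -1.0, "hold": 0.0}[action] * signal

def meta_judge(state, winner, criteria, lr=0.05):
    """Meta-judge role (illustrative update): shift the judge's criteria
    toward features that were active when it expressed a preference."""
    direction = {"buy": 1.0, "sell": -1.0, "hold": 0.0}[winner]
    return {k: w + lr * direction * state[k] for k, w in criteria.items()}

# Closed loop: trading policy and evaluation criteria co-evolve from
# internal preference pairs; no human annotation enters the loop.
policy_bias = {"buy": 0.0, "sell": 0.0, "hold": 0.0}
criteria = {"sentiment": 0.5, "onchain_flow": 0.5}

for _ in range(10):
    state = {"sentiment": random.uniform(-1, 1),
             "onchain_flow": random.uniform(-1, 1)}
    a, b = actor(state, policy_bias)
    winner = a if judge(state, a, criteria) >= judge(state, b, criteria) else b
    policy_bias[winner] += 0.1                       # preference-based policy update
    criteria = meta_judge(state, winner, criteria)   # evaluation criteria update

print(policy_bias)
print(criteria)
```

The point of the sketch is the control flow, not the numbers: each iteration produces a preference pair without external labels, and both the policy and the judging criteria are updated from it, mirroring the role-switching self-improvement loop the abstract describes.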