Two Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture

📅 2025-11-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Problem: Large language models (LLMs) perform well as agents for financial market making but suffer from high inference latency, and existing knowledge distillation methods lack task-specific adaptability. Method: This paper proposes the Cooperative Market Making (CMM) distillation framework, which decouples LLM features orthogonally across the layer, task, and data dimensions and introduces a normalized fluorescent probe to analyze the mechanism of the LLM's features. It further designs a Hájek-MoE ensemble that fuses the outputs of lightweight student models according to each model's contribution in a kernel-function-generated common feature space. Contribution/Results: Combining reinforcement learning, feature interpretability analysis, knowledge distillation, and a multi-expert system, CMM outperforms state-of-the-art distillation methods and RL-based market-making baselines on four real-world market datasets, preserving policy quality while reducing inference latency by over 90% and establishing a scalable paradigm for low-latency financial trading agents.
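The summary does not detail how the Hájek-MoE computes each student's contribution in the kernel-generated common feature space. The sketch below is a rough, hypothetical illustration of that general idea (PyTorch; the RBF kernel, the shared projection, and all names are assumptions, not the paper's implementation): student action logits are fused with softmax weights derived from a kernel affinity between each student's features and a shared market-state embedding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KernelGatedEnsemble(nn.Module):
    """Hypothetical kernel-weighted fusion of student outputs (not the paper's Hajek-MoE)."""

    def __init__(self, feature_dim: int, bandwidth: float = 1.0):
        super().__init__()
        self.bandwidth = bandwidth
        # Shared projection into a common feature space for all students.
        self.shared_proj = nn.Linear(feature_dim, feature_dim)

    def forward(self, student_feats, student_logits, context):
        # student_feats: (num_students, feature_dim) features from each student model
        # student_logits: (num_students, num_actions) each student's quoting-action logits
        # context: (feature_dim,) embedding of the current market state
        z = self.shared_proj(student_feats)
        c = self.shared_proj(context)
        # RBF kernel affinity between each student and the market context.
        sq_dist = ((z - c) ** 2).sum(dim=-1)
        affinity = torch.exp(-sq_dist / (2 * self.bandwidth ** 2))
        # Contribution weights sum to one across students.
        weights = F.softmax(affinity, dim=0)
        return (weights.unsqueeze(-1) * student_logits).sum(dim=0)


# Toy usage: three students, 16-dim features, five candidate quoting actions.
fuser = KernelGatedEnsemble(feature_dim=16)
fused = fuser(torch.randn(3, 16), torch.randn(3, 5), torch.randn(16))
print(fused.shape)  # torch.Size([5])
```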

📝 Abstract
Market making (MM) through Reinforcement Learning (RL) has attracted significant attention in financial trading. With the development of Large Language Models (LLMs), more and more attempts are being made to apply LLMs to financial areas. A simple, direct application of an LLM as an agent shows significant performance, but such methods are hindered by their slow inference speed, and most current research has not studied LLM distillation for this specific task. To address this, we first propose the normalized fluorescent probe to study the mechanism of the LLM's features. Based on the observations from our investigation, we propose Cooperative Market Making (CMM), a novel framework that decouples LLM features across three orthogonal dimensions: layer, task, and data. Various student models collaboratively learn simple LLM features along different dimensions, with each model responsible for a distinct feature to achieve knowledge distillation. Furthermore, CMM introduces a Hájek-MoE to integrate the outputs of the student models by investigating the contribution of different models in a kernel-function-generated common feature space. Extensive experimental results on four real-world market datasets demonstrate the superiority of CMM over current distillation methods and RL-based market-making strategies.
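As background for the distillation described above, here is a minimal, hypothetical sketch of feature-level distillation in which each lightweight student is responsible for matching one designated teacher feature (for example, one LLM layer's hidden state). It uses plain MSE feature matching and invented names; the paper's actual decomposition, probe, and losses are not specified on this page.

```python
import torch
import torch.nn as nn


def feature_distillation_loss(teacher_feats, students, projections, market_state):
    """Sum of MSE losses: each student reproduces its assigned teacher feature."""
    loss = torch.tensor(0.0)
    for name, student in students.items():
        # Project the student's feature up to the teacher's feature width.
        pred = projections[name](student(market_state))
        loss = loss + nn.functional.mse_loss(pred, teacher_feats[name])
    return loss


# Toy setup: two students, each assigned one (hypothetical) teacher feature.
state_dim, student_dim, teacher_dim = 8, 32, 64
students = {
    "layer_low": nn.Sequential(nn.Linear(state_dim, student_dim), nn.ReLU()),
    "layer_high": nn.Sequential(nn.Linear(state_dim, student_dim), nn.ReLU()),
}
projections = {name: nn.Linear(student_dim, teacher_dim) for name in students}
# Stand-ins for frozen LLM features extracted offline from the teacher.
teacher_feats = {name: torch.randn(teacher_dim) for name in students}
market_state = torch.randn(state_dim)

print(feature_distillation_loss(teacher_feats, students, projections, market_state))
```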
Problem

Research questions and friction points this paper is trying to address.

Distilling LLM features into smaller models for faster financial market making
Decoupling LLM knowledge across layer, task and data dimensions collaboratively
Integrating distilled features via kernel-based mixture for market making optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distilling LLM features via orthogonal decomposition (see the sketch after this list)
Collaborative student models learning distinct features
Integrating outputs with kernel-based MoE ensemble
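A minimal sketch of what decoupling along three orthogonal dimensions can look like in practice, using placeholder dimension values (the paper's actual layer/task/data taxonomy is not given on this page): each lightweight student is assigned one (layer, task, data) responsibility, and the set of students covers all combinations.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Responsibility:
    """Which slice of the teacher's knowledge one student is in charge of."""
    layer: str  # e.g. shallow vs. deep LLM layers
    task: str   # e.g. quote pricing vs. inventory control
    data: str   # e.g. which market or instrument subset


# Placeholder values for the three orthogonal dimensions.
layers = ["shallow", "deep"]
tasks = ["pricing", "inventory"]
datasets = ["market_A", "market_B"]

# One lightweight student per combination of the three dimensions.
assignments = {
    f"student_{i}": Responsibility(layer, task, data)
    for i, (layer, task, data) in enumerate(product(layers, tasks, datasets))
}

for name, resp in assignments.items():
    print(name, resp)
```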
👥 Authors
Tianhao Fu (Peking University)
Xinxin Xu (Peking University)
Weichen Xu (Purdue University)
Jue Chen (Peking University)
Ruilong Ren (Peking University)
Bowen Deng (Postdoc at MIT | PhD at UC Berkeley)
Xinyu Zhao (The University of North Carolina at Chapel Hill)
Jian Cao (Peking University)
Xixin Cao (Peking University)