Two Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture

📅 2025-11-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Problem: Large language models (LLMs) perform well as agents for financial market making but suffer from high inference latency, and existing knowledge distillation methods lack task-specific adaptability. Method: This paper proposes the Cooperative Market Making (CMM) distillation framework, which decouples LLM features orthogonally across the layer, task, and data dimensions and introduces a normalized fluorescent probe to analyze the mechanism of the LLM's features. It further designs a Hájek-MoE ensemble that fuses the outputs of lightweight student models according to each model's contribution in a kernel-function-generated common feature space. Contribution/Results: Combining reinforcement learning, feature interpretability analysis, knowledge distillation, and a multi-expert system, CMM outperforms state-of-the-art distillation methods and RL-based market-making baselines on four real-world market datasets, preserving policy quality while reducing inference latency by over 90% and establishing a scalable paradigm for low-latency financial trading agents.
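The summary does not detail how the Hájek-MoE computes each student's contribution in the kernel-generated common feature space. The sketch below is a rough, hypothetical illustration of that general idea (PyTorch; the RBF kernel, the shared projection, and all names are assumptions, not the paper's implementation): student action logits are fused with softmax weights derived from a kernel affinity between each student's features and a shared market-state embedding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KernelGatedEnsemble(nn.Module):
    """Hypothetical kernel-weighted fusion of student outputs (not the paper's Hajek-MoE)."""

    def __init__(self, feature_dim: int, bandwidth: float = 1.0):
        super().__init__()
        self.bandwidth = bandwidth
        # Shared projection into a common feature space for all students.
        self.shared_proj = nn.Linear(feature_dim, feature_dim)

    def forward(self, student_feats, student_logits, context):
        # student_feats: (num_students, feature_dim) features from each student model
        # student_logits: (num_students, num_actions) each student's quoting-action logits
        # context: (feature_dim,) embedding of the current market state
        z = self.shared_proj(student_feats)
        c = self.shared_proj(context)
        # RBF kernel affinity between each student and the market context.
        sq_dist = ((z - c) ** 2).sum(dim=-1)
        affinity = torch.exp(-sq_dist / (2 * self.bandwidth ** 2))
        # Contribution weights sum to one across students.
        weights = F.softmax(affinity, dim=0)
        return (weights.unsqueeze(-1) * student_logits).sum(dim=0)


# Toy usage: three students, 16-dim features, five candidate quoting actions.
fuser = KernelGatedEnsemble(feature_dim=16)
fused = fuser(torch.randn(3, 16), torch.randn(3, 5), torch.randn(16))
print(fused.shape)  # torch.Size([5])
```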

📝 Abstract
Market making (MM) through Reinforcement Learning (RL) has attracted significant attention in financial trading. With the development of Large Language Models (LLMs), more and more attempts are being made to apply LLMs to financial areas. A simple, direct application of an LLM as an agent shows significant performance, but such methods are hindered by their slow inference speed, and most current research has not studied LLM distillation for this specific task. To address this, we first propose the normalized fluorescent probe to study the mechanism of the LLM's features. Based on the observations from our investigation, we propose Cooperative Market Making (CMM), a novel framework that decouples LLM features across three orthogonal dimensions: layer, task, and data. Various student models collaboratively learn simple LLM features along different dimensions, with each model responsible for a distinct feature to achieve knowledge distillation. Furthermore, CMM introduces a Hájek-MoE to integrate the outputs of the student models by investigating the contribution of different models in a kernel-function-generated common feature space. Extensive experimental results on four real-world market datasets demonstrate the superiority of CMM over current distillation methods and RL-based market-making strategies.
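As background for the distillation described above, here is a minimal, hypothetical sketch of feature-level distillation in which each lightweight student is responsible for matching one designated teacher feature (for example, one LLM layer's hidden state). It uses plain MSE feature matching and invented names; the paper's actual decomposition, probe, and losses are not specified on this page.

```python
import torch
import torch.nn as nn


def feature_distillation_loss(teacher_feats, students, projections, market_state):
    """Sum of MSE losses: each student reproduces its assigned teacher feature."""
    loss = torch.tensor(0.0)
    for name, student in students.items():
        # Project the student's feature up to the teacher's feature width.
        pred = projections[name](student(market_state))
        loss = loss + nn.functional.mse_loss(pred, teacher_feats[name])
    return loss


# Toy setup: two students, each assigned one (hypothetical) teacher feature.
state_dim, student_dim, teacher_dim = 8, 32, 64
students = {
    "layer_low": nn.Sequential(nn.Linear(state_dim, student_dim), nn.ReLU()),
    "layer_high": nn.Sequential(nn.Linear(state_dim, student_dim), nn.ReLU()),
}
projections = {name: nn.Linear(student_dim, teacher_dim) for name in students}
# Stand-ins for frozen LLM features extracted offline from the teacher.
teacher_feats = {name: torch.randn(teacher_dim) for name in students}
market_state = torch.randn(state_dim)

print(feature_distillation_loss(teacher_feats, students, projections, market_state))
```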
Problem

Research questions and friction points this paper is trying to address.

Distilling LLM features into smaller models for faster financial market making
Decoupling LLM knowledge across layer, task and data dimensions collaboratively
Integrating distilled features via kernel-based mixture for market making optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distilling LLM features via orthogonal decomposition (see the sketch after this list)
Collaborative student models learning distinct features
Integrating outputs with kernel-based MoE ensemble
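A minimal sketch of what decoupling along three orthogonal dimensions can look like in practice, using placeholder dimension values (the paper's actual layer/task/data taxonomy is not given on this page): each lightweight student is assigned one (layer, task, data) responsibility, and the set of students covers all combinations.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Responsibility:
    """Which slice of the teacher's knowledge one student is in charge of."""
    layer: str  # e.g. shallow vs. deep LLM layers
    task: str   # e.g. quote pricing vs. inventory control
    data: str   # e.g. which market or instrument subset


# Placeholder values for the three orthogonal dimensions.
layers = ["shallow", "deep"]
tasks = ["pricing", "inventory"]
datasets = ["market_A", "market_B"]

# One lightweight student per combination of the three dimensions.
assignments = {
    f"student_{i}": Responsibility(layer, task, data)
    for i, (layer, task, data) in enumerate(product(layers, tasks, datasets))
}

for name, resp in assignments.items():
    print(name, resp)
```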
👥 Authors
Tianhao Fu (Peking University)
Xinxin Xu (Peking University)
Weichen Xu (Purdue University)
Jue Chen (Peking University)
Ruilong Ren (Peking University)
Bowen Deng (Postdoc at MIT | PhD at UC Berkeley)
Xinyu Zhao (The University of North Carolina at Chapel Hill)
Jian Cao (Peking University)
Xixin Cao (Peking University)