🤖 AI Summary
Automating communication system formulation for 6G faces three bottlenecks: general-purpose large language models (LLMs) lack domain-specific knowledge, their reasoning capability is insufficient, and high-quality training data is scarce. To address these, this paper introduces DeepForm, presented as the first reasoning LLM tailored to automated communication system formulation. We construct the Communication System Formulation Reasoning Corpus (CSFRC) and propose C-ReMax, a rule-based reinforcement learning algorithm built on ReMax that elicits reasoning patterns such as self-correction and verification. Training proceeds in two stages: supervised fine-tuning (SFT) on chain-of-thought data, followed by rule-based reinforcement learning. Experiments show that the model significantly outperforms larger proprietary LLMs across diverse communication modeling scenarios, achieving state-of-the-art performance in the domain. The related data, models, and code are to be released after the paper is accepted.
📝 Abstract
Communication system formulation is critical for advancing 6G and future wireless technologies, yet it remains a complex, expertise-intensive task. While Large Language Models (LLMs) offer potential, existing general-purpose models often lack the specialized domain knowledge and nuanced reasoning capabilities that the task requires, and the high-quality, domain-specific training data needed to adapt them is scarce. To bridge this gap, we introduce DeepForm, the first reasoning LLM specialized for automated communication system formulation. We also present the Communication System Formulation Reasoning Corpus (CSFRC), the first large-scale, open-source dataset curated for this domain. Our framework employs a two-stage training strategy: first, Supervised Fine-Tuning (SFT) with Chain-of-Thought (CoT) data to distill domain knowledge; second, C-ReMax, a novel rule-based Reinforcement Learning (RL) algorithm built on ReMax, to cultivate advanced modeling capabilities and elicit sophisticated reasoning patterns such as self-correction and verification. Extensive experiments demonstrate that our model achieves state-of-the-art performance, significantly outperforming larger proprietary LLMs across diverse scenarios. We will release the related resources after the paper is accepted to foster further research in this area.
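The abstract does not spell out how C-ReMax works, so the following is only a minimal sketch of the ReMax-style update it builds on, paired with a toy rule-based reward. ReMax is a REINFORCE variant that uses the reward of the greedy-decoded response as a baseline for the sampled response. Everything specific here is an assumption made for illustration, not the paper's method: the `<think>`/`<answer>` tag format, the reward values, and the function names `rule_reward`, `remax_advantage`, and `remax_loss` are all hypothetical.

```python
import re
from typing import List

# Hypothetical output format: reasoning in <think>, final formulation in <answer>.
_PATTERN = re.compile(r"<think>.+?</think>\s*<answer>(.+?)</answer>", re.DOTALL)


def rule_reward(response: str, reference: str) -> float:
    """Toy rule-based reward (assumed values, not from the paper).

    -1.0 if the response violates the expected format,
    +0.5 for a well-formed response,
    +1.0 more if the extracted answer matches the reference exactly.
    """
    match = _PATTERN.search(response)
    if match is None:
        return -1.0
    score = 0.5
    if match.group(1).strip() == reference.strip():
        score += 1.0
    return score


def remax_advantage(sampled_reward: float, greedy_reward: float) -> float:
    """ReMax baseline: subtract the reward of the greedy-decoded response."""
    return sampled_reward - greedy_reward


def remax_loss(token_logprobs: List[float], advantage: float) -> float:
    """REINFORCE surrogate loss for one sampled response.

    token_logprobs are the log-probabilities the policy assigned to the
    sampled tokens; minimizing this loss raises the likelihood of responses
    whose reward exceeds the greedy baseline.
    """
    return -advantage * sum(token_logprobs)


if __name__ == "__main__":
    reference = "C = B*log2(1+SNR)"
    sampled = "<think>AWGN link capacity.</think><answer>C = B*log2(1+SNR)</answer>"
    greedy = "C = B*log2(1+SNR)"  # correct content, but breaks the format rule

    adv = remax_advantage(rule_reward(sampled, reference),
                          rule_reward(greedy, reference))
    print(adv, remax_loss([-0.5, -0.25], adv))
```

In an actual training loop the log-probabilities would come from the policy model and the loss would be backpropagated through them. Plain floats are used here only to keep the arithmetic visible.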