DICE: Structured Reasoning in LLMs through SLM-Guided Chain-of-Thought Correction

📅 2025-10-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) often neglect formatting constraints under complex instructions, prioritizing reasoning over faithful instruction following. To address this, we propose DICE—a framework that decouples reasoning from format control by leveraging a lightweight small language model (SLM) to analyze and structurally reconstruct LLM-generated chain-of-thought (CoT) outputs. DICE comprises two key components: (1) a two-stage data construction pipeline—first deriving natural-language reasoning traces, then adapting them into structured CoT formats; and (2) a dual-tuning strategy involving SLM supervised fine-tuning and LLM prompt refinement. Experiments demonstrate that DICE improves format accuracy and content correctness by 35.4% and 29.4%, respectively, significantly outperforming state-of-the-art instruction-following methods. Notably, DICE achieves fine-grained, controllable LLM output generation driven by a compact SLM—marking the first such approach to enable precise structural guidance without modifying the LLM itself.

Technology Category

Application Category

📝 Abstract
When performing reasoning tasks with user-specific requirements, such as strict output formats, large language models (LLMs) often prioritize reasoning over adherence to detailed instructions. Fine-tuning LLMs on supervised datasets to address this is impractical due to high computational costs and limited parameter access. To tackle this, we propose DICE, a lightweight framework that guides small language models (SLMs) to refine LLMs' outputs through chain-of-thought (CoT) correction. DICE decouples the process by first prompting LLMs to generate natural language responses, then using trained SLMs to analyze and refine these outputs to meet structured output specifications. This framework preserves LLMs' broad knowledge and reasoning capabilities while ensuring the outputs conform to user demands. Specifically, DICE first constructs structured CoT adaptation datasets via a two-stage method and subsequently applies a dual-tuning strategy to fine-tune SLMs for generating structured outputs in an analyze-then-answer pattern. Experiments demonstrate that DICE improves the average format accuracy and content correctness of LLM outputs by 35.4% and 29.4%, respectively, achieving state-of-the-art (SOTA) performance over other competitive baselines.
Problem

Research questions and friction points this paper is trying to address.

Improves LLM adherence to structured output formats
Reduces computational costs of fine-tuning large models
Enhances reasoning accuracy while maintaining format compliance
Innovation

Methods, ideas, or system contributions that make the work stand out.

SLMs refine LLM outputs via CoT correction
Decouples reasoning from structured output generation
Uses dual-tuning strategy for analyze-then-answer pattern
🔎 Similar Papers
No similar papers found.
Yiqi Li
Yiqi Li
Shanghai Jiao Tong University
Yusheng Liao
Yusheng Liao
Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory
Large Language ModelsClinical NLPAgentReasoning
Z
Zhe Chen
Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory
Yanfeng Wang
Yanfeng Wang
Shanghai Jiao Tong University
Y
Yu Wang
Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory