🤖 AI Summary
Large language models (LLMs) often lack explicit, controllable reasoning mechanisms.
Method: We propose Think-as-you-Speak (TaS), a novel architecture that embeds trainable “thinking layers”—modular intermediate-language heads—within LLM hidden layers to model the cognitive “think-before-speaking” process as a query-driven, two-stage inference: first generating an internal chain-of-thought (CoT), then producing the final response conditioned on it. TaS introduces a data-driven pipeline for annotating and generating reasoning content, coupled with multi-stage supervised fine-tuning to enable end-to-end training and bootstrapped reasoning enhancement.
Contribution/Results: Experiments demonstrate that TaS significantly outperforms standard CoT and Tree-of-Thought (ToT) across multiple complex reasoning benchmarks. Qualitative analysis confirms the logical coherence and task relevance of its generated chains-of-thought, establishing a new paradigm for interpretable and intervention-aware LLM reasoning.
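The two-stage inference described above (internal chain-of-thought from a middle layer, then a response conditioned on it) can be sketched as a toy flow. This is purely illustrative: the function names, the `<think>` markers, and the stand-in "layers" are assumptions for exposition, not the paper's actual implementation.

```python
# Toy sketch of TaS-style two-stage inference (illustrative only; the real
# model uses transformer hidden states and trained language heads).

def lower_layers(tokens):
    # Stand-in for the LLM's lower transformer layers: map query tokens
    # to intermediate "hidden states".
    return ["h:" + t for t in tokens]

def thinking_head(hidden):
    # Extra language head attached at a middle layer: decodes an internal
    # chain-of-thought from the intermediate hidden states.
    return ["<think>"] + [h.removeprefix("h:") + "'" for h in hidden] + ["</think>"]

def upper_layers(hidden, thought):
    # Upper layers produce the final answer conditioned on both the query's
    # hidden states and the generated thought.
    return (["answer_to:" + h.removeprefix("h:") for h in hidden]
            + ["thought_len:" + str(len(thought))])

def tas_generate(query_tokens):
    hidden = lower_layers(query_tokens)       # encode the query
    thought = thinking_head(hidden)           # stage 1: generate the CoT internally
    response = upper_layers(hidden, thought)  # stage 2: respond conditioned on the CoT
    return thought, response
```

Because the thought is produced by an explicit head, it can be inspected or intervened on before the response is generated, which is what makes the mechanism controllable.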
📝 Abstract
Large language models (LLMs) can understand and generate human expressions reasonably well, but they may lack thorough thinking and reasoning mechanisms. Several recent studies enhance the thinking ability of language models, but most are neither data-driven nor training-based. In this paper, motivated by cognitive mechanisms in the natural world, we design a novel model architecture called TaS that first forms thoughts and then expresses a response based on the query. We design several pipelines to annotate or generate thought contents from prompt-response samples, then add language heads in a middle layer that serves as the thinking layer. We train the language model on the thoughts-augmented data, enabling the thinking layer to automatically generate reasonable thoughts and the model to output more reasonable responses. Both qualitative examples and quantitative results validate the effectiveness and performance of TaS. Our code is available at https://anonymous.4open.science/r/TadE.
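The annotation pipeline described above produces thoughts-augmented training samples. A minimal sketch of what such a sample and its serialized training text might look like is below; the field names and the `<query>`/`<think>` delimiters are assumptions for illustration, as the paper's actual schema is not specified here.

```python
# Hypothetical thoughts-augmented sample (illustrative schema, not the
# paper's actual annotation format).
sample = {
    "prompt": "If a train travels 60 km in 1.5 hours, what is its average speed?",
    "thought": "Average speed = distance / time = 60 / 1.5 = 40 km/h.",
    "response": "The train's average speed is 40 km/h.",
}

def to_training_text(ex):
    # During multi-stage supervised fine-tuning, the middle-layer language
    # head would be supervised on the `thought` span and the final head on
    # the `response` span.
    return (f"<query>{ex['prompt']}</query>"
            f"<think>{ex['thought']}</think>"
            f"{ex['response']}")

print(to_training_text(sample))
```

Separating the thought span from the response span is what lets the two heads be trained end-to-end while keeping the internal reasoning inspectable.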