AI Summary
This work addresses a limitation of current large language models: they neglect sentence-level structure when processing context, which constrains their linguistic understanding and reasoning capabilities. To remedy this, the study proposes explicitly inserting sentence boundary delimiters into model inputs to make the models sentence-aware. Combined with in-context learning and supervised fine-tuning, the approach yields cognitively inspired improvements in reasoning on models ranging from 7B parameters to the 600B-parameter DeepSeek-V3. Empirical results show gains of up to 7.7% on GSM8K and 12.5% on DROP. Furthermore, analyses of internal representations show that the fine-tuned models capture sentence-level semantics.
Abstract
Researchers have explored various ways to improve the capabilities of large language models (LLMs) by inserting dummy tokens into their contexts. However, existing work focuses solely on the dummy tokens themselves and fails to leverage the inherent sentence-level structure of natural language. This is a critical oversight, because LLMs acquire their linguistic capabilities through exposure to human-generated text, which is inherently structured at the sentence level. Motivated by this gap, we propose an approach that inserts delimiters at sentence boundaries in LLM inputs. This not only integrates dummy tokens into the context but also encourages LLMs to process text sentence by sentence during reasoning. We experiment with two concrete methods, (1) in-context learning and (2) supervised fine-tuning, on models ranging from 7B parameters to the 600B-parameter DeepSeek-V3. Our results demonstrate consistent improvements across various tasks, with notable gains of up to 7.7% on GSM8K and 12.5% on DROP. Furthermore, the fine-tuned LLMs exhibit sentence awareness, as evidenced by their internal representations. Our work establishes a simple yet effective technique for enhancing LLM capabilities and points to promising directions for cognitively inspired LLM enhancement.
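The core idea, marking every sentence boundary in the input with a delimiter, can be illustrated with a short Python sketch. This is not the authors' implementation. The delimiter string `<SEP>`, the regex-based sentence splitter, and the `Q:`/`A:` prompt layout used for in-context learning are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical delimiter; the abstract does not say which token the authors insert.
DELIMITER = "<SEP>"

# Simplified sentence splitter (assumption): split on whitespace that follows
# '.', '!' or '?'. Decimals such as "3.14" are not split because no whitespace
# follows the inner dot, but abbreviations like "Dr. Smith" would be mis-split.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def insert_sentence_delimiters(text: str, delimiter: str = DELIMITER) -> str:
    """Append `delimiter` after every sentence in `text`."""
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    return " ".join(f"{s} {delimiter}" for s in sentences)


def build_icl_prompt(demos: list[tuple[str, str]], question: str) -> str:
    """Hypothetical in-context-learning prompt: each demonstration's question and
    answer are delimited sentence by sentence, followed by the new question."""
    parts = [
        f"Q: {insert_sentence_delimiters(q)}\nA: {insert_sentence_delimiters(a)}"
        for q, a in demos
    ]
    parts.append(f"Q: {insert_sentence_delimiters(question)}\nA:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(insert_sentence_delimiters("Tom has 3 apples. He buys 2 more. How many does he have?"))
    # prints: Tom has 3 apples. <SEP> He buys 2 more. <SEP> How many does he have? <SEP>
```

For the supervised fine-tuning variant, the same transformation could, under these assumptions, be applied to training inputs and targets. A production version would likely need a more robust sentence segmenter and a delimiter that the tokenizer treats as a single dedicated token.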