🤖 AI Summary
In long-context question answering (LCQA), large language models (LLMs) often rely either on “fast thinking” (shallow pattern matching), which yields insufficient logical reasoning, or on “slow thinking” (exhaustive step-by-step inference), which incurs redundant computation and degraded efficiency.
Method: We propose a dynamic multi-granularity reasoning mechanism that adaptively schedules reasoning depth based on question complexity. We introduce the first explicit chain-of-thought (CoT) injection strategy guided by synthetically generated reference QA pairs, enabling synergistic fast–slow thinking. Our approach integrates synthetic data augmentation, dynamic path control, and multi-granularity scheduling.
Contribution/Results: Evaluated on seven QA benchmarks, our method significantly improves accuracy on multi-hop questions, reduces average inference overhead by 32%, and enhances scalability to long texts—overcoming limitations of fixed-depth reasoning paradigms.
📝 Abstract
Long-context question-answering (LCQA) systems have benefited greatly from the powerful reasoning capabilities of large language models (LLMs), which can be categorized into slow- and quick-thinking modes. However, both modes have limitations. Slow thinking tends to explore every possible reasoning path, which leads to heavy overthinking and wasted time. Quick thinking relies on pattern matching rather than a genuine understanding of the query logic, and therefore often misinterprets the question. To address these issues, we propose FReM: Flexible Reasoning Mechanism, a method that adjusts reasoning depth according to the complexity of each question. Specifically, FReM leverages synthetic reference QA examples to provide an explicit chain of thought, enabling efficient handling of simple queries while allowing deeper reasoning for more complex ones. In this way, FReM helps quick-thinking models move beyond superficial pattern matching and narrows the reasoning space for slow-thinking models to avoid unnecessary exploration. Experiments on seven QA datasets show that FReM improves reasoning accuracy and scalability, particularly for complex multi-hop questions, indicating its potential to advance LCQA methodologies.
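The routing idea described above — sending simple queries down a fast path and reserving bounded, deeper reasoning for complex ones — can be sketched in a few lines. Everything here is an illustrative assumption, not the paper's implementation: the function names, the keyword-counting complexity heuristic, and the string-match fast path are all placeholders for FReM's actual estimator, synthetic reference QA generation, and CoT injection.

```python
# Hypothetical sketch of a FReM-style fast/slow dispatcher.
# The complexity heuristic and matching logic are toy stand-ins.

def estimate_complexity(question):
    """Toy proxy for question complexity: count multi-hop marker words."""
    hop_markers = ("who", "which", "after", "before", "then")
    return sum(question.lower().count(m) for m in hop_markers)

def quick_answer(question, reference_qas):
    """Fast path: shallow match against synthetic reference QA pairs."""
    for ref_q, ref_a in reference_qas:
        if ref_q.lower() in question.lower() or question.lower() in ref_q.lower():
            return ref_a
    return None  # no confident shallow match

def slow_answer(question, max_steps):
    """Slow path: placeholder iterative CoT loop bounded by max_steps."""
    thoughts = []
    for step in range(max_steps):
        thoughts.append(f"step {step}: analyze {question!r}")
    return thoughts  # a real system would derive the answer from these steps

def frem_route(question, reference_qas, threshold=2):
    """Dispatch to the fast path for simple questions, else to a
    depth-bounded slow path (budget scaled with estimated complexity)."""
    depth = estimate_complexity(question)
    if depth <= threshold:
        ans = quick_answer(question, reference_qas)
        if ans is not None:
            return ("quick", ans)
    return ("slow", slow_answer(question, max_steps=depth + 1))
```

A simple question with a matching synthetic reference resolves on the quick path, while a multi-hop question (several marker words) falls through to the bounded slow path, mirroring how the paper narrows the reasoning space instead of exploring every path.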