ConPress: Learning Efficient Reasoning from Multi-Question Contextual Pressure

📅 2026-02-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of large reasoning models that generate lengthy chains of thought for complex tasks. The authors propose a self-compression method that requires no external teacher, manual pruning, or reinforcement learning. By applying multi-question contextual pressure, the model is induced to produce more concise reasoning traces, which are then used in a self-supervised fine-tuning process to internalize efficient reasoning capabilities. The approach involves constructing multi-question prompts, sampling and filtering reasoning trajectories, and performing lightweight fine-tuning with only 8,000 samples. Evaluated on MATH500 and AIME25 benchmarks, the method reduces reasoning token consumption by 59% and 33%, respectively, while maintaining competitive accuracy.
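The prompt-construction and trace-parsing steps described above might look like the following minimal Python sketch. The prompt template, the "Answer N:" labeling convention, and all function names here are illustrative assumptions, not the authors' implementation.

```python
import re

def build_multi_question_prompt(questions):
    """Pack several independent questions into one prompt to create the
    multi-question contextual pressure that induces shorter, self-compressed
    reasoning traces (assumed template)."""
    parts = [f"Question {i + 1}: {q}" for i, q in enumerate(questions)]
    parts.append("Answer every question in order. "
                 "Label each answer 'Answer <number>:'.")
    return "\n\n".join(parts)

def parse_per_question_traces(output, n_questions):
    """Split a multi-question completion back into per-question traces,
    assuming the model followed the 'Answer N:' labeling convention.
    Returns None when any question is missing, so the sample can be dropped."""
    spans = re.split(r"Answer (\d+):", output)
    # re.split yields [preamble, '1', trace1, '2', trace2, ...]
    traces = {int(spans[i]): spans[i + 1].strip()
              for i in range(1, len(spans) - 1, 2)}
    if set(traces) != set(range(1, n_questions + 1)):
        return None
    return traces
```

In this reading, sampled multi-question outputs are parsed back into per-question traces, which then feed the filtering and fine-tuning stages.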

📝 Abstract
Large reasoning models (LRMs) typically solve reasoning-intensive tasks by generating long chain-of-thought (CoT) traces, leading to substantial inference overhead. We identify a reproducible inference-time phenomenon, termed Self-Compression: when multiple independent and answerable questions are presented within a single prompt, the model spontaneously produces shorter reasoning traces for each question. This phenomenon arises from multi-question contextual pressure during generation and consistently manifests across models and benchmarks. Building on this observation, we propose ConPress (Learning from Contextual Pressure), a lightweight self-supervised fine-tuning approach. ConPress constructs multi-question prompts to induce self-compression, samples the resulting model outputs, and parses and filters per-question traces to obtain concise yet correct reasoning trajectories. These trajectories are directly used for supervised fine-tuning, internalizing compressed reasoning behavior in single-question settings without external teachers, manual pruning, or reinforcement learning. With only 8k fine-tuning examples, ConPress reduces reasoning token usage by 59% on MATH500 and 33% on AIME25, while maintaining competitive accuracy.
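The filtering step the abstract describes (keeping "concise yet correct" per-question traces for supervised fine-tuning) could be sketched as below. The correctness checker and the length criterion relative to a single-question baseline are assumptions based on the abstract's description, not the paper's actual filtering rule.

```python
def filter_trajectories(traces, reference_answers, is_correct, baseline_tokens):
    """Keep (question_id, trace) pairs whose final answer is correct and whose
    trace is shorter than the model's single-question baseline.
    `is_correct` and `baseline_tokens` are hypothetical inputs: an answer
    checker and per-question baseline token counts."""
    kept = []
    for qid, trace in traces.items():
        correct = is_correct(trace, reference_answers[qid])
        shorter = len(trace.split()) < baseline_tokens[qid]  # crude token proxy
        if correct and shorter:
            kept.append((qid, trace))
    return kept
```

Traces surviving this filter would form the (here, 8k-example) fine-tuning set used to internalize compressed reasoning in single-question settings.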
Problem

Research questions and friction points this paper is trying to address.

reasoning efficiency
chain-of-thought
inference overhead
large reasoning models
token usage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Compression
Contextual Pressure
Chain-of-Thought Compression
Self-Supervised Fine-Tuning
Efficient Reasoning