🤖 AI Summary
This work addresses a limitation of chemical large language models: they rely on discrete textual reasoning and thus struggle to capture the continuous, structural nature of chemical problem-solving. To overcome this, the authors propose LatentChem, a novel framework that decouples chemical computation from text generation, enabling multi-step reasoning within a continuous latent space and producing natural language only at the final output stage. The approach demonstrates an emergent shift from explicit chain-of-thought textual reasoning to implicit, continuous latent-space inference, moving beyond conventional language modeling paradigms. Evaluated on ChemCoTBench, LatentChem achieves a 59.88% non-tie win rate against strong baselines and delivers an average 10.84× inference speedup.
📝 Abstract
Chemical large language models (LLMs) predominantly rely on explicit Chain-of-Thought (CoT) in natural language to perform complex reasoning. However, chemical reasoning is inherently continuous and structural, and forcing it into discrete linguistic tokens introduces a fundamental representation mismatch that constrains both efficiency and performance. We introduce LatentChem, a latent reasoning interface that decouples chemical computation from textual generation, enabling models to perform multi-step reasoning directly in continuous latent space while emitting language only for final outputs. Remarkably, we observe a consistent emergent behavior: when optimized solely for task success, models spontaneously internalize reasoning, progressively abandoning verbose textual derivations in favor of implicit latent computation. This shift is not merely stylistic but computationally advantageous. Across diverse chemical reasoning benchmarks, LatentChem achieves a 59.88% non-tie win rate over strong CoT-based baselines on ChemCoTBench, while delivering a 10.84× average inference speedup. Our results provide empirical evidence that chemical reasoning is more naturally and effectively realized as continuous latent dynamics rather than discretized linguistic trajectories.
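The core idea of the abstract, reasoning as iterated updates in a continuous latent space with decoding deferred to the end, can be illustrated with a minimal sketch. This is not the LatentChem implementation; the dimensions, step count, and the `tanh` transition are placeholder assumptions chosen only to contrast latent iteration with token-by-token CoT generation.

```python
import numpy as np

# Illustrative sketch only: a stand-in for multi-step reasoning in a
# continuous latent space. All names and sizes here are assumptions,
# not details from the paper.

rng = np.random.default_rng(0)
D = 16  # assumed latent dimensionality

# A fixed nonlinear map acting as a placeholder "reasoning step".
W = rng.standard_normal((D, D)) / np.sqrt(D)

def latent_step(h):
    """One implicit reasoning step: update the latent state, emit no tokens."""
    return np.tanh(W @ h)

def reason_in_latent_space(h0, n_steps=8):
    """Iterate entirely in latent space; language would be decoded
    from the final state only, rather than after every step as in CoT."""
    h = h0
    for _ in range(n_steps):
        h = latent_step(h)
    return h

h_final = reason_in_latent_space(rng.standard_normal(D))
print(h_final.shape)  # the final latent state, ready for one decoding pass
```

The contrast with CoT is that a textual reasoner would decode and re-encode discrete tokens at every intermediate step, whereas here all intermediate computation stays continuous, which is the source of the claimed efficiency gain.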