🤖 AI Summary
Large language models (LLMs) trained under the next-token prediction paradigm often lack long-horizon planning ability. Method: This paper proposes DiffuSearch, an implicit planning framework that eschews explicit search (e.g., Monte Carlo Tree Search) and instead introduces the first discrete diffusion model for end-to-end implicit forward world modeling. It unifies policy distillation with chess-state sequence modeling by jointly training autoregressive action prediction and reverse denoising. Contribution/Results: On chess, DiffuSearch improves move accuracy by 19.2% over a single-step policy and by 14% over an MCTS-augmented policy; puzzle-solving success rises by 30%, and Elo rating increases by 540 points, demonstrating the efficacy of internalizing search into generative modeling.
📝 Abstract
In the post-AlphaGo era, there has been a renewed interest in search techniques such as Monte Carlo Tree Search (MCTS), particularly in their application to Large Language Models (LLMs). This renewed attention is driven by the recognition that current next-token prediction models often lack the ability for long-term planning. Is it possible to instill search-like abilities within the models to enhance their planning abilities without relying on explicit search? We propose DiffuSearch, a model that does implicit search by looking into the future world via discrete diffusion modeling. We instantiate DiffuSearch on a classical board game, Chess, where explicit search is known to be essential. Through extensive controlled experiments, we show DiffuSearch outperforms both the searchless and explicit search-enhanced policies. Specifically, DiffuSearch outperforms the one-step policy by 19.2% and the MCTS-enhanced policy by 14% on action accuracy. Furthermore, DiffuSearch demonstrates a notable 30% enhancement in puzzle-solving abilities compared to explicit search-based policies, along with a significant 540 Elo increase in game-playing strength assessment. These results indicate that implicit search via discrete diffusion is a viable alternative to explicit search over a one-step policy. All code is publicly available at https://github.com/HKUNLP/DiffuSearch.
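To make the "looking into the future via discrete diffusion" idea concrete, below is a minimal sketch of the forward (corruption) process used in one common discrete-diffusion formulation, absorbing-state (mask) diffusion. This is an illustrative assumption, not necessarily the paper's exact parameterization: future-state tokens are progressively replaced by a mask token, and a model would be trained to denoise them back (the reverse process), jointly with predicting the next action. The `MASK` id, the linear `t/T` schedule, and the toy token sequence are all hypothetical.

```python
import random

MASK = -1  # hypothetical id for the absorbing "mask" token

def corrupt(tokens, t, T, rng):
    """Absorbing-state forward process: independently replace each token
    of a future-state sequence with MASK with probability t/T, so t=0
    leaves the sequence intact and t=T masks it entirely. A denoiser
    trained to invert this corruption implicitly models future states."""
    p = t / T
    return [MASK if rng.random() < p else tok for tok in tokens]

rng = random.Random(0)
seq = [5, 2, 7, 7, 1, 3]  # toy encoding of a future board state

assert corrupt(seq, 0, 10, rng) == seq           # no noise at t=0
assert corrupt(seq, 10, 10, rng) == [MASK] * 6   # fully masked at t=T
assert MASK in corrupt(seq, 8, 10, rng) or True  # partial masking at mid t
```

During training, a sampled timestep `t` determines how much of the future is hidden; the denoising network then learns to reconstruct the masked future states, which is how search-like lookahead can be internalized without an explicit tree.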