Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning

๐Ÿ“… 2026-01-21
๐Ÿ“ˆ Citations: 1
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the limitations of traditional Chain-of-Thought (CoT) reasoning—namely its verbosity, high computational cost, and lack of effective supervision over intermediate steps, which hinders interpretability. The authors propose the Render-of-Thought (RoT) framework, which, for the first time, renders CoT reasoning steps into images and leverages off-the-shelf Vision Language Models (VLMs), using their vision encoders as semantic anchors to align textual and visual embeddings. This approach explicitly externalizes implicit reasoning, making it traceable and interpretable. Notably, RoT requires no additional pretraining and operates in a plug-and-play manner. Experiments demonstrate that RoT achieves 3–4× token compression and significant inference acceleration on mathematical and logical reasoning tasks while maintaining performance comparable to existing methods.

๐Ÿ“ Abstract
Chain-of-Thought (CoT) prompting has achieved remarkable success in unlocking the reasoning capabilities of Large Language Models (LLMs). Although CoT prompting enhances reasoning, its verbosity imposes substantial computational overhead. Recent works often focus exclusively on outcome alignment and lack supervision on the intermediate reasoning process. These deficiencies obscure the analyzability of the latent reasoning chain. To address these challenges, we introduce Render-of-Thought (RoT), the first framework to reify the reasoning chain by rendering textual steps into images, making the latent rationale explicit and traceable. Specifically, we leverage the vision encoders of existing Vision Language Models (VLMs) as semantic anchors to align the vision embeddings with the textual space. This design ensures plug-and-play implementation without incurring additional pre-training overhead. Extensive experiments on mathematical and logical reasoning benchmarks demonstrate that our method achieves 3-4x token compression and substantial inference acceleration compared to explicit CoT. Furthermore, it maintains competitive performance against other methods, validating the feasibility of this paradigm. Our code is available at https://github.com/TencentBAC/RoT
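The abstract's compression claim follows from a simple accounting: a rendered image is consumed as a fixed grid of patch tokens, so a long textual rationale can be packed into far fewer visual tokens than its word-level tokenization would cost. The toy sketch below illustrates only that pipeline shape, not the paper's implementation — `render_step` (a character-code grid standing in for actual text rasterization) and `encode_image` (mean-pooled patches standing in for a frozen VLM vision encoder) are hypothetical placeholders, and word splitting is a crude proxy for an LLM tokenizer.

```python
# Toy illustration of the Render-of-Thought pipeline shape (NOT the paper's code):
# text CoT -> rendered "image" -> patch tokens from a stand-in vision encoder.

def render_step(text, width=32, height=32):
    """Stand-in for text rasterization: pack character codes into a fixed
    width x height grid. Real RoT renders the text as an actual image."""
    grid = [[0] * width for _ in range(height)]
    for i, ch in enumerate(text[: width * height]):
        grid[i // width][i % width] = ord(ch) % 256
    return grid

def encode_image(grid, patch=8):
    """Stand-in for a VLM vision encoder: mean-pool non-overlapping
    patch x patch blocks, yielding one 'visual token' per block."""
    h, w = len(grid), len(grid[0])
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            block = [grid[r + dr][c + dc]
                     for dr in range(patch) for dc in range(patch)]
            tokens.append(sum(block) / len(block))
    return tokens

# A synthetic multi-step rationale: 8 steps x 8 words = 64 "text tokens".
cot = " ".join(f"step {i}: add {i} to the running total" for i in range(8))
text_tokens = cot.split()                      # crude proxy for tokenization
visual_tokens = encode_image(render_step(cot)) # 32x32 grid / 8x8 patches = 16
ratio = len(text_tokens) / len(visual_tokens)
print(f"{len(text_tokens)} text tokens -> {len(visual_tokens)} visual tokens ({ratio:.0f}x)")
```

With these placeholder sizes the whole 64-word rationale fits in 16 patch tokens, a 4× compression — the same order as the 3–4× the paper reports, though the real ratio depends on the renderer's font density and the encoder's patch size.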
Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought
reasoning overhead
latent reasoning
intermediate supervision
token compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Render-of-Thought
Chain-of-Thought
Vision Language Models
Token Compression
Latent Reasoning
๐Ÿ”Ž Similar Papers
No similar papers found.
Yifan Wang
Tsinghua University
Natural Language Processing
Shiyu Li
Tencent Inc.
Peiming Li
Tencent BAC; School of Electronic and Computer Engineering, Peking University
Xiaochen Yang
Senior Lecturer, School of Mathematics & Statistics, University of Glasgow
machine learning; medical image analysis
Yang Tang
Tencent BAC
Zheng Wei
Tencent BAC