🤖 AI Summary
This study addresses the high computational and memory costs of deploying world models, an area where the effects of post-training quantization (PTQ) remain underexplored. Using DINO-WM as a case study, the work systematically evaluates multiple PTQ methods under both weight-only and joint weight-activation quantization across varying bit-widths, quantization granularities, and visual planning tasks with planning horizons of up to 50 iterations. The analysis shows that quantization effects in world models go beyond the usual trade-off between accuracy and bit-width: group-wise weight quantization can stabilize low-bit rollouts, activation quantization granularity yields inconsistent benefits, and the encoder and predictor exhibit markedly asymmetric sensitivity to quantization. Crucially, the study finds that aggressive low-bit quantization severely degrades the alignment between the planning objective and task success, producing failures that additional optimization cannot remedy. These findings offer practical guidance for the efficient deployment of quantized world models.
📝 Abstract
World models learn an internal representation of environment dynamics, enabling agents to simulate and reason about future states within a compact latent space for tasks such as planning, prediction, and inference. However, running world models incurs heavy computational cost and a large memory footprint, making model quantization essential for efficient deployment. To date, the effects of post-training quantization (PTQ) on world models remain largely unexamined. In this work, we present a systematic empirical study of world model quantization using DINO-WM as a representative case, evaluating diverse PTQ methods under both weight-only and joint weight-activation settings. We conduct extensive experiments on different visual planning tasks across a wide range of bit-widths, quantization granularities, and planning horizons of up to 50 iterations. Our results show that quantization effects in world models extend beyond the standard trade-off between accuracy and bit-width: group-wise weight quantization can stabilize low-bit rollouts, activation quantization granularity yields inconsistent benefits, and quantization sensitivity is highly asymmetric between the encoder and predictor modules. Moreover, aggressive low-bit quantization significantly degrades the alignment between the planning objective and task success, leading to failures that cannot be remedied by additional optimization. These findings reveal distinct quantization-induced failure modes in world-model-based planning and provide practical guidance for deploying quantized world models under strict computational constraints. The code will be available at https://github.com/huawei-noah/noah-research/tree/master/QuantWM.
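To make the notion of quantization granularity concrete, the following is a minimal sketch of weight-only, symmetric round-to-nearest quantization in NumPy. It is an illustration under stated assumptions, not the QuantWM implementation: the function name `quantize_rtn`, its parameters, and the toy weight matrix are hypothetical, and round-to-nearest is used only as a simple baseline, not necessarily one of the PTQ methods evaluated in the paper. The sketch contrasts one scale per output row with one scale per group of consecutive weights.

```python
import numpy as np


def quantize_rtn(w, bits=4, group_size=None):
    """Symmetric round-to-nearest fake quantization of a 2-D weight matrix.

    Hypothetical helper for illustration only.
    group_size=None -> one scale per output row (per-channel).
    group_size=g    -> one scale per g consecutive weights within each row.
    Returns the dequantized weights, with the same shape as `w`.
    """
    out_features, in_features = w.shape
    g = in_features if group_size is None else group_size
    if in_features % g != 0:
        raise ValueError("in_features must be divisible by group_size")

    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for signed 4-bit integers
    groups = w.reshape(out_features, in_features // g, g)

    # One scale per group, chosen so the largest magnitude maps to qmax.
    max_abs = np.abs(groups).max(axis=-1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / qmax, 1.0)

    # Quantize to the signed integer grid, then map back to floats.
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    return (q * scale).reshape(w.shape)


# Toy weights with one large outlier column (an assumption for illustration).
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 64))
w[:, 0] = 50.0

err_row = np.mean((w - quantize_rtn(w, bits=4)) ** 2)
err_group = np.mean((w - quantize_rtn(w, bits=4, group_size=16)) ** 2)
print(f"per-row MSE: {err_row:.4f}, group-wise MSE: {err_group:.4f}")
```

In this toy setting, a single outlier inflates the per-row scale and pushes most other weights in that row to zero, whereas group-wise scales confine the damage to the group containing the outlier. This is one plausible intuition for why finer weight granularity might help at low bit-widths; the toy example says nothing about rollout stability itself, which the study measures empirically.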