PhyCritic: Multimodal Critic Models for Physical AI

📅 2026-02-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing critic models struggle to evaluate the perception, causal reasoning, and planning capabilities that physical AI depends on. To address this gap, this work proposes PhyCritic, the first multimodal critic model tailored to physical AI, trained with a two-stage RLVR (reinforcement learning with verifiable rewards) pipeline. The first stage is a physical skill warmup that strengthens the model's physically oriented perception and reasoning. The second stage, self-referential critic finetuning, has the critic generate its own prediction as an internal reference before judging candidate responses, which improves the stability and physical correctness of its judgments. Experiments show that PhyCritic achieves strong gains over open-source baselines on both physical and general-purpose multimodal judge benchmarks. Moreover, when used as a policy model, PhyCritic also improves perception and reasoning on physically grounded tasks.
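The two-stage RLVR recipe can be illustrated with a minimal sketch of what "verifiable" rewards might look like in each stage: an exact-match reward on the model's own answer during the physical skill warmup, and a match against a labeled pairwise preference during critic finetuning. The source does not specify the reward functions, output format, or RL algorithm, so the `<answer>`/`<preference>` tags, exact-match scoring, and the GRPO-style group-normalized advantage below are assumptions, and every function name is hypothetical.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional


def extract_tag(text: str, tag: str) -> Optional[str]:
    """Return the content of the first <tag>...</tag> span, or None if absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1).strip() if match else None


def warmup_reward(completion: str, gold_answer: str) -> float:
    """Stage 1 (hypothetical): 1.0 if the model's <answer> matches the gold answer."""
    answer = extract_tag(completion, "answer")
    return 1.0 if answer is not None and answer.lower() == gold_answer.lower() else 0.0


def critic_reward(completion: str, gold_preference: str) -> float:
    """Stage 2 (hypothetical): 1.0 if the critic's <preference> ('A' or 'B') matches the label."""
    preference = extract_tag(completion, "preference")
    if preference is None:
        return 0.0
    return 1.0 if preference.upper() == gold_preference.upper() else 0.0


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages (an assumed choice): normalize rewards within one prompt's samples."""
    mu, sd = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


# Four sampled completions for one stage-2 training item whose label is "A".
samples = ["<preference>A</preference>", "<preference>B</preference>",
           "<preference>a</preference>", "no verdict"]
rewards = [critic_reward(s, "A") for s in samples]  # [1.0, 0.0, 1.0, 0.0]
advantages = group_advantages(rewards)
```

Because both rewards are computed by checking the output against a stored label rather than by a learned reward model, they are cheap and hard to game, which is the usual motivation for RLVR-style training.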

📝 Abstract

With the rapid development of large multimodal models, reliable judge and critic models have become essential for open-ended evaluation and preference alignment, providing pairwise preferences, numerical scores, and explanatory justifications for assessing model-generated responses. However, existing critics are primarily trained in general visual domains such as captioning or image question answering, leaving physical AI tasks involving perception, causal reasoning, and planning largely underexplored. We introduce PhyCritic, a multimodal critic model optimized for physical AI through a two-stage RLVR pipeline: a physical skill warmup stage that enhances physically oriented perception and reasoning, followed by self-referential critic finetuning, where the critic generates its own prediction as an internal reference before judging candidate responses, improving judgment stability and physical correctness. Across both physical and general-purpose multimodal judge benchmarks, PhyCritic achieves strong performance gains over open-source baselines and, when applied as a policy model, further improves perception and reasoning in physically grounded tasks.
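The self-referential step described in the abstract can be sketched as a two-call procedure: the critic first answers the question on its own, then judges the two candidates with that answer in its context. This is a hedged illustration, not the authors' implementation: `self_referential_judge`, `Judgment`, the prompt wording, and the text-only `generate` callable are all hypothetical, and a real critic would be a multimodal model that also receives the image or video. `toy_generate` is a stand-in that simply prefers whichever response contains its own reference answer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Judgment:
    reference: str   # the critic's own answer, produced before seeing candidates
    preference: str  # "A" or "B"
    rationale: str   # free-text justification


def self_referential_judge(generate: Callable[[str], str], question: str,
                           response_a: str, response_b: str) -> Judgment:
    # Step 1: the critic answers the question itself, without seeing the candidates.
    reference = generate(f"Question: {question}\nAnswer the question yourself.").strip()
    # Step 2: the critic compares both candidates against its own reference answer.
    prompt = (
        f"Question: {question}\n"
        f"Your reference answer: {reference}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        "Which response is better? Reply with A or B on the first line, then explain."
    )
    verdict = generate(prompt).strip()
    first_line, _, rest = verdict.partition("\n")
    return Judgment(reference, first_line.strip().upper(), rest.strip())


def toy_generate(prompt: str) -> str:
    """Hypothetical stand-in for a model: prefers the response containing its reference."""
    if "Response A:" not in prompt:
        return "it rolls left"  # the toy critic's fixed prediction
    fields = dict(line.split(": ", 1) for line in prompt.splitlines() if ": " in line)
    better = "A" if fields["Your reference answer"] in fields["Response A"] else "B"
    return f"{better}\nResponse {better} is consistent with the reference answer."


judgment = self_referential_judge(toy_generate, "Which way does the ball roll?",
                                  "it rolls left", "it rolls right")
print(judgment.preference)  # A
```

Producing the reference in a separate first call means the critic commits to an answer before either candidate can anchor it; this is one plausible reading of why the abstract credits the step with more stable judgments.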

Problem

Research questions and friction points this paper is trying to address.

physical AI

multimodal critic

causal reasoning

perception

planning

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal critic

physical AI

self-referential critique