EVE: A Generator-Verifier System for Generative Policies

📅 2025-12-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Generative visuomotor policies (e.g., diffusion or flow-matching models) degrade under distribution shifts, recover poorly, and typically need costly fine-tuning to adapt. Method: EVE is a modular generate-verify framework for embodied control that improves a frozen, pretrained generative policy at test time by wrapping it with multiple zero-shot, VLM-based verifiers, with no additional training or fine-tuning. Each verifier proposes refinements to the base policy's candidate actions, and an action incorporator fuses the aggregated verifier output into the final executed action. Contribution/Results: EVE consistently improves task success rates across a diverse suite of manipulation tasks. Ablations isolate the contributions of verifier capabilities and action incorporator strategies, yielding practical guidelines for building scalable, modular generator-verifier systems.

📝 Abstract

Visuomotor policies based on generative architectures such as diffusion and flow matching have shown strong performance, but they degrade under distribution shifts and show limited recovery capability without costly fine-tuning. In the language modeling domain, test-time compute scaling has revolutionized the reasoning capabilities of modern LLMs by spending additional inference-time compute on refining candidate solutions. These methods typically use foundation models as zero-shot verification modules to synthesize improved candidate solutions. In this work, we hypothesize that generative policies can similarly benefit from additional inference-time compute that employs zero-shot VLM-based verifiers. Systematic analysis of how a generation-verification framework improves policy performance remains underexplored in the current literature. To this end, we introduce EVE, a modular generator-verifier interaction framework that boosts the performance of pretrained generative policies at test time with no additional training. EVE wraps a frozen base policy with multiple zero-shot, VLM-based verifier agents. Each verifier proposes refinements to the base policy's candidate actions, while an action incorporator fuses the aggregated verifier output into the base policy's action prediction to produce the final executed action. We study design choices for generator-verifier information interfacing across a system of verifiers with distinct capabilities. Across a diverse suite of manipulation tasks, EVE consistently improves task success rates without any additional policy training. Through extensive ablations, we isolate the contributions of verifier capabilities and action incorporator strategies, offering practical guidelines for building scalable, modular generator-verifier systems for embodied control.
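To make the control flow in the abstract concrete, the following is a minimal sketch of one generator-verifier step. It is an illustration under stated assumptions, not the paper's implementation: the names `eve_step` and `incorporate`, the list-of-floats action format, and the averaging-then-blending fusion rule are all hypothetical, and real verifiers would query a VLM rather than run the stub functions shown here.

```python
from typing import Callable, List, Optional, Sequence

Action = List[float]
Observation = dict

# Hypothetical interfaces: a policy maps an observation to a candidate action;
# a verifier returns a refined action, or None if it has no correction to offer.
Policy = Callable[[Observation], Action]
Verifier = Callable[[Observation, Action], Optional[Action]]


def incorporate(base: Action, proposals: Sequence[Action], weight: float) -> Action:
    """Assumed fusion rule: average the proposals, then blend with the base action."""
    if not proposals:
        return list(base)  # no verifier objected, execute the base action unchanged
    n = len(proposals)
    mean = [sum(p[i] for p in proposals) / n for i in range(len(base))]
    return [(1.0 - weight) * b + weight * m for b, m in zip(base, mean)]


def eve_step(policy: Policy, verifiers: Sequence[Verifier],
             obs: Observation, weight: float = 0.5) -> Action:
    """One test-time step: generate a candidate, collect refinements, fuse them."""
    candidate = policy(obs)  # the frozen base policy is only queried, never updated
    proposals = []
    for verify in verifiers:
        refined = verify(obs, candidate)
        if refined is not None:
            proposals.append(refined)
    return incorporate(candidate, proposals, weight)


# Stub policy and verifiers standing in for a generative policy and VLM agents.
def base_policy(obs: Observation) -> Action:
    return [1.0, 0.0]

def silent_verifier(obs: Observation, action: Action) -> Optional[Action]:
    return None  # accepts the candidate as-is

def shift_verifier(obs: Observation, action: Action) -> Optional[Action]:
    return [action[0], action[1] + 1.0]  # proposes moving the second coordinate

action = eve_step(base_policy, [silent_verifier, shift_verifier], obs={}, weight=0.5)
print(action)
```

In this sketch the base policy stays untouched and all adaptation happens in the wrapper, which is the property the abstract emphasizes. The blend weight is one simple stand-in for the incorporator strategies the paper ablates.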

Problem

Research questions and friction points this paper is trying to address.

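Generative policies degrade under distribution shifts and recover poorly without costly fine-tuning

Whether zero-shot VLM-based verifiers can improve policy performance is underexplored

Test-time compute scaling has not been systematically studied for generative policies without additional training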

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generator-verifier system boosts pretrained policies without training

Zero-shot VLM-based verifiers refine actions at inference time

Modular framework fuses multiple verifier outputs for final actions
