Mesh-Pro: Asynchronous Advantage-guided Ranking Preference Optimization for Artist-style Quadrilateral Mesh Generation

📅 2026-02-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the low training efficiency and limited generalization of existing offline preference optimization methods for 3D mesh generation by proposing the first asynchronous online reinforcement learning framework tailored to this task. It introduces Advantage-guided Ranking Preference Optimization (ARPO), a diagonal-aware mixed triangular-quadrilateral tokenization for mesh representation, and a ray-based reward for geometric integrity. The asynchronous framework trains 3.75× faster than synchronous reinforcement learning, and the resulting model, Mesh-Pro, achieves state-of-the-art performance on artist-style and dense quadrilateral meshes.

📝 Abstract

Reinforcement learning (RL) has demonstrated remarkable success in text and image generation, yet its potential in 3D generation remains largely unexplored. Existing attempts typically rely on offline direct preference optimization (DPO) methods, which suffer from low training efficiency and limited generalization. In this work, we aim to improve both the training efficiency and the generation quality of RL for 3D mesh generation. Specifically, (1) we design the first asynchronous online RL framework tailored to post-training for 3D mesh generation, which is 3.75$\times$ faster than synchronous RL. (2) We propose Advantage-guided Ranking Preference Optimization (ARPO), a novel RL algorithm that achieves a better trade-off between training efficiency and generalization than the RL algorithms currently used for 3D mesh generation, such as DPO and group relative policy optimization (GRPO). (3) Building on asynchronous ARPO, we propose Mesh-Pro, which additionally introduces a novel diagonal-aware mixed triangular-quadrilateral tokenization for mesh representation and a ray-based reward for geometric integrity. Mesh-Pro achieves state-of-the-art performance on artistic and dense meshes.
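
The abstract does not spell out the ARPO objective, so the following is only a minimal Python sketch of the two ingredients its name suggests: group-relative advantages in the style of GRPO, and a ranking of sampled outputs by those advantages. The function names, the reward values, and the choice of population standard deviation are illustrative assumptions, not the paper's method.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalization: center and scale rewards within one group.

    `rewards` holds one scalar reward per sampled output for the same prompt.
    `eps` (an assumption) guards against division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rank_by_advantage(advantages):
    """Return sample indices ordered from highest to lowest advantage."""
    return sorted(range(len(advantages)), key=lambda i: advantages[i], reverse=True)


# Hypothetical rewards for four meshes sampled from the same input.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
ranking = rank_by_advantage(advantages)
```

Here `ranking` orders the sampled meshes from most to least preferred. How such a ranking would enter a preference loss is left open, since the abstract gives no formula for it.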

Problem

Research questions and friction points this paper is trying to address.

3D mesh generation

reinforcement learning

training efficiency

generalization

preference optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

asynchronous reinforcement learning

Advantage-guided Ranking Preference Optimization

quadrilateral mesh generation