Mesh-Pro: Asynchronous Advantage-guided Ranking Preference Optimization for Artist-style Quadrilateral Mesh Generation

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency and limited generalization of existing offline preference optimization methods in 3D mesh generation by proposing the first asynchronous online reinforcement learning framework tailored to this task. The framework introduces Advantage-guided Ranking Preference Optimization (ARPO), a diagonal-aware hybrid triangle–quadrilateral tokenization, and a ray-sampling-based reward for geometric completeness. Experiments show that the asynchronous framework trains 3.75× faster than synchronous reinforcement learning and that the resulting model, Mesh-Pro, achieves state-of-the-art performance on artist-style and dense quadrilateral meshes.
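The summary does not describe how the asynchronous framework is built. As a rough illustration of why decoupling rollout generation from training can save wall-clock time, here is a minimal producer-consumer sketch. Everything in it is an assumption for illustration: the function names `run_sync` and `run_async`, the bounded queue, and the `time.sleep` calls standing in for mesh sampling and gradient updates. It is not the paper's system and does not reproduce the 3.75× figure.

```python
import queue
import threading
import time


def run_sync(num_batches, rollout_s, train_s):
    """Synchronous loop: each step waits for a rollout, then trains on it."""
    trained = []
    start = time.perf_counter()
    for step in range(num_batches):
        time.sleep(rollout_s)  # stand-in for sampling meshes from the policy
        time.sleep(train_s)    # stand-in for one gradient update
        trained.append(step)
    return trained, time.perf_counter() - start


def run_async(num_batches, rollout_s, train_s, max_pending=2):
    """Asynchronous loop: a rollout thread fills a bounded queue while the
    main thread trains on whatever batch is ready."""
    # The bound limits how far rollouts can run ahead of the trainer
    # (hypothetical staleness control, not taken from the paper).
    buffer = queue.Queue(maxsize=max_pending)
    trained = []

    def rollout_worker():
        for step in range(num_batches):
            time.sleep(rollout_s)  # stand-in for sampling meshes
            buffer.put(step)
        buffer.put(None)  # sentinel: no more batches

    worker = threading.Thread(target=rollout_worker)
    start = time.perf_counter()
    worker.start()
    while True:
        batch = buffer.get()
        if batch is None:
            break
        time.sleep(train_s)  # stand-in for one gradient update
        trained.append(batch)
    worker.join()
    return trained, time.perf_counter() - start
```

Calling both functions with the same arguments, for example `run_sync(5, 0.02, 0.02)` and `run_async(5, 0.02, 0.02)`, processes the same batches in the same order, but the asynchronous version overlaps the two sleeps and should finish sooner. A real system would also have to handle the policy lag this overlap introduces, which the sketch ignores.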

📝 Abstract
Reinforcement learning (RL) has demonstrated remarkable success in text and image generation, yet its potential in 3D generation remains largely unexplored. Existing attempts typically rely on offline direct preference optimization (DPO), which suffers from low training efficiency and limited generalization. In this work, we aim to enhance both the training efficiency and the generation quality of RL in 3D mesh generation. Specifically, (1) we design the first asynchronous online RL framework tailored to post-training for 3D mesh generation, which is 3.75× faster than synchronous RL. (2) We propose Advantage-guided Ranking Preference Optimization (ARPO), a novel RL algorithm that achieves a better trade-off between training efficiency and generalization than the RL algorithms currently used for 3D mesh generation, such as DPO and group relative policy optimization (GRPO). (3) Based on asynchronous ARPO, we propose Mesh-Pro, which additionally introduces a novel diagonal-aware mixed triangular-quadrilateral tokenization for mesh representation and a ray-based reward for geometric integrity. Mesh-Pro achieves state-of-the-art performance on artistic and dense meshes.
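The abstract names GRPO as a baseline but does not define ARPO's objective. For orientation only, the sketch below shows the standard GRPO-style group-normalized advantage (each sample's reward minus the group mean, divided by the group standard deviation) and how sorting by that advantage yields a ranking over the samples. The helper names `group_advantages` and `rank_by_advantage`, the use of the population standard deviation, and the `eps` constant are assumptions; whether ARPO builds its ranking this way is not stated in the abstract.

```python
import math


def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r_i - mean) / (std + eps).

    Uses the population standard deviation of the group (an assumption;
    implementations differ on this detail).
    """
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


def rank_by_advantage(advantages):
    """Return sample indices ordered from highest to lowest advantage."""
    return sorted(range(len(advantages)),
                  key=lambda i: advantages[i],
                  reverse=True)


# Hypothetical rewards for four meshes sampled from the same prompt.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_advantages(rewards)
ranking = rank_by_advantage(advantages)  # best sample first
```

Because the normalization is monotonic within a group, the ranking matches the ranking by raw reward; the advantage additionally says how far each sample sits from the group average, which is the kind of signal a ranking-based preference loss could weight by.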
Problem

Research questions and friction points this paper is trying to address.

3D mesh generation
reinforcement learning
training efficiency
generalization
preference optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

asynchronous reinforcement learning
Advantage-guided Ranking Preference Optimization
quadrilateral mesh generation
diagonal-aware tokenization
ray-based reward