Mesh-Pro: Asynchronous Advantage-guided Ranking Preference Optimization for Artist-style Quadrilateral Mesh Generation

📅 2026-02-28
🤖 AI Summary
This work addresses the low training efficiency and limited generalization of existing offline preference optimization methods for 3D mesh generation by proposing the first asynchronous online reinforcement learning framework tailored to this task. The framework introduces Advantage-guided Ranking Preference Optimization (ARPO), a diagonal-aware mixed triangle–quadrilateral tokenization, and a geometric-integrity reward based on ray sampling. Experiments show that the asynchronous framework trains 3.75× faster than synchronous reinforcement learning and that the method attains state-of-the-art performance on artist-style and dense quadrilateral meshes.

📝 Abstract
Reinforcement learning (RL) has demonstrated remarkable success in text and image generation, yet its potential in 3D generation remains largely unexplored. Existing attempts typically rely on offline direct preference optimization (DPO), which suffers from low training efficiency and limited generalization. In this work, we aim to improve both the training efficiency and the generation quality of RL for 3D mesh generation. Specifically, (1) we design the first asynchronous online RL framework tailored to post-training for 3D mesh generation, which is 3.75$\times$ faster than synchronous RL. (2) We propose Advantage-guided Ranking Preference Optimization (ARPO), a novel RL algorithm that achieves a better trade-off between training efficiency and generalization than the RL algorithms currently used for 3D mesh generation, such as DPO and group relative policy optimization (GRPO). (3) Building on asynchronous ARPO, we propose Mesh-Pro, which additionally introduces a novel diagonal-aware mixed triangular-quadrilateral tokenization for mesh representation and a ray-based reward for geometric integrity. Mesh-Pro achieves state-of-the-art performance on artistic and dense meshes.
Problem

Research questions and friction points this paper is trying to address.

3D mesh generation
reinforcement learning
training efficiency
generalization
preference optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

asynchronous reinforcement learning
Advantage-guided Ranking Preference Optimization
quadrilateral mesh generation
diagonal-aware tokenization
ray-based reward
Authors
Zhen Zhou
Institute of Automation, Chinese Academy of Sciences
Jian Liu
Tencent Hunyuan
Biwen Lei
Tencent
Computer Vision, Deep Learning
Jing Xu
Tencent Hunyuan
Haohan Weng
South China University of Technology
generative models, computer vision
Yiling Zhu
Tencent Hunyuan
Zhuo Chen
Tencent Hunyuan
Junfeng Fan
Open Security Research
Cryptography, Secure Chips, Side-channel Attacks
Yunkai Ma
Institute of Automation, Chinese Academy of Sciences
Dazhao Du
Hong Kong University of Science and Technology
MultiModal LLM, Video Understanding, Time Series Forecasting, Deep Learning
Song Guo
Chair Professor of CSE, HKUST
Large Language Model, Edge AI, Machine Learning Systems
Fengshui Jing
Institute of Automation, Chinese Academy of Sciences
Chunchao Guo
Tencent Hunyuan