RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination

📅 2025-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses neural rendering of triangle-mesh scenes with global illumination, without per-scene training. We propose RenderFormer, the first end-to-end, purely Transformer-based framework that represents both mesh triangles and ray bundles as token sequences. It uses a two-stage decoupled architecture: a view-independent stage that models light transport between triangles, and a view-dependent stage that maps ray bundles to pixels. Together, the two stages learn an implicit joint representation of geometry and illumination, with no explicit physical rendering equations and no scene-specific training. The method relies on triangle (face) tokenization, a ray-bundle token representation, and supervised sequence-to-sequence learning. Evaluated on scenes of varying complexity, RenderFormer produces high-fidelity global illumination at inference speeds significantly faster than path tracing. It also generalizes across scenes: a single trained model renders unseen geometry zero-shot.
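To make the tokenization idea concrete, here is a minimal sketch of how triangles and ray bundles might be turned into token feature vectors. Everything in it is an assumption for illustration: the `Triangle` class, the 12-value triangle feature layout (nine vertex coordinates plus three diffuse reflectance values), the pinhole camera looking down the negative z axis, and the patch size are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Triangle:
    """Hypothetical scene primitive: three vertices plus a diffuse color."""
    vertices: Tuple[Vec3, Vec3, Vec3]
    diffuse: Vec3  # assumed reflectance parameterization


def triangle_token(tri: Triangle) -> List[float]:
    """Flatten one triangle into 12 values: 9 coordinates + 3 reflectance."""
    feats = [c for v in tri.vertices for c in v]
    feats.extend(tri.diffuse)
    return feats


def ray_bundle_tokens(width: int, height: int, patch: int) -> List[List[float]]:
    """Build one token per patch x patch pixel block.

    Each token concatenates the unit ray directions of the pixels in its
    block, for an assumed pinhole camera at the origin looking down -z.
    Assumes width and height are multiples of patch.
    """
    tokens = []
    for py in range(0, height, patch):
        for px in range(0, width, patch):
            token: List[float] = []
            for y in range(py, py + patch):
                for x in range(px, px + patch):
                    # Map the pixel center to [-1, 1] image-plane coordinates.
                    u = 2.0 * (x + 0.5) / width - 1.0
                    v = 1.0 - 2.0 * (y + 0.5) / height
                    norm = (u * u + v * v + 1.0) ** 0.5
                    token.extend((u / norm, v / norm, -1.0 / norm))
            tokens.append(token)
    return tokens
```

With these assumptions, a 16 x 8 image split into 4 x 4 patches yields 8 ray-bundle tokens of 48 values each, and the output sequence has one entry per patch rather than one per pixel.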

📝 Abstract

We present RenderFormer, a neural rendering pipeline that directly renders an image from a triangle-based representation of a scene with full global illumination effects and that does not require per-scene training or fine-tuning. Instead of taking a physics-centric approach to rendering, we formulate rendering as a sequence-to-sequence transformation where a sequence of tokens representing triangles with reflectance properties is converted to a sequence of output tokens representing small patches of pixels. RenderFormer follows a two-stage pipeline: a view-independent stage that models triangle-to-triangle light transport, and a view-dependent stage that transforms a token representing a bundle of rays to the corresponding pixel values, guided by the triangle sequence from the view-independent stage. Both stages are based on the transformer architecture and are learned with minimal prior constraints. We demonstrate and evaluate RenderFormer on scenes with varying complexity in shape and light transport.
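The two-stage structure described in the abstract can be illustrated with a toy attention sketch. This is not the authors' implementation: it assumes single-head attention with no learned projections, no MLP blocks, no normalization, and no positional encoding, and the function names (`view_independent_stage`, `view_dependent_stage`, `render`) are hypothetical. It only shows the data flow: triangle tokens attend to each other once per scene, then ray-bundle tokens query the resulting sequence once per view.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per token


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention without learned weights (toy version)."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out


def view_independent_stage(triangle_tokens: Matrix) -> Matrix:
    """Self-attention among triangle tokens, with a residual connection.

    Stands in for triangle-to-triangle light transport; it never sees the camera.
    """
    mixed = attention(triangle_tokens, triangle_tokens, triangle_tokens)
    return [[t + m for t, m in zip(tok, mix)]
            for tok, mix in zip(triangle_tokens, mixed)]


def view_dependent_stage(ray_tokens: Matrix, scene_tokens: Matrix) -> Matrix:
    """Cross-attention: each ray-bundle token queries the scene tokens."""
    return attention(ray_tokens, scene_tokens, scene_tokens)


def render(triangle_tokens: Matrix, ray_tokens: Matrix) -> Matrix:
    """Run both stages; returns one output token per ray bundle (pixel patch)."""
    scene = view_independent_stage(triangle_tokens)
    return view_dependent_stage(ray_tokens, scene)
```

One consequence of this split, visible even in the toy version, is that the output of `view_independent_stage` does not depend on the camera, so in principle it could be computed once and reused when only the viewpoint changes.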

Problem

Research questions and friction points this paper is trying to address.

Neural rendering of triangle meshes with global illumination

Sequence-to-sequence transformation for rendering without per-scene training

Transformer-based pipeline for view-independent and view-dependent rendering stages

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based neural rendering pipeline

Sequence-to-sequence transformation for rendering

Two-stage view-independent and view-dependent processing
