🤖 AI Summary
This work addresses the challenge of reproducing Neural Radiance Fields (NeRF) papers, which typically requires extensive manual coding and is poorly served by general-purpose code generation methods. To bridge this gap, the authors propose NERFIFY, a multi-agent framework that automatically translates NeRF research papers into executable, trainable Nerfstudio plugins. The framework integrates formal syntactic constraints (Nerfstudio formalized as a context-free grammar), graph-structured coordination for multi-file code generation, automatic integration of components from cited papers, visual-feedback repair guided by PSNR, SSIM, and vision-language models (VLMs), and neural rendering architecture parsing. The study also introduces the first dedicated benchmark for evaluating such systems, covering 30 diverse NeRF papers. On papers without open-source implementations, the generated code achieves visual fidelity comparable to expert human-written implementations (within ±0.5 dB PSNR and ±0.2 SSIM) while reducing development time from weeks to minutes.
📝 Abstract
The proliferation of neural radiance field (NeRF) research demands significant effort to reimplement papers before building upon them. We introduce NERFIFY, a multi-agent framework that reliably converts NeRF research papers into trainable Nerfstudio plugins, in contrast to generic paper-to-code methods and frontier models like GPT-5, which usually fail to produce runnable code. NERFIFY achieves domain-specific executability through six key innovations: (1) Context-free grammar (CFG): LLM synthesis is constrained by Nerfstudio formalized as a CFG, ensuring that generated code satisfies architectural invariants. (2) Graph-of-Thought code synthesis: Specialized multi-file agents generate repositories in topological dependency order, validating contracts and checking for errors at each node. (3) Compositional citation recovery: Agents automatically retrieve and integrate components (samplers, encoders, proposal networks) from the citation graphs of references. (4) Visual feedback: Artifacts are diagnosed through PSNR-minima ROI analysis, cross-view geometric validation, and VLM-guided patching to iteratively improve quality. (5) Knowledge enhancement: Beyond reproduction, methods can be improved with novel optimizations. (6) Benchmarking: An evaluation framework is designed for NeRF paper-to-code synthesis across 30 diverse papers. On papers without public implementations, NERFIFY achieves visual quality matching expert human code (±0.5 dB PSNR, ±0.2 SSIM) while reducing implementation time from weeks to minutes. NERFIFY demonstrates that a domain-aware design enables code translation for complex vision papers, helping to accelerate and democratize reproducible research. Code, data and implementations will be publicly released.
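As a rough illustration of item (2), generating files in topological dependency order with a validation step at each node might look like the sketch below. This is not the authors' implementation. The file names, the `generate_file` and `validate_file` callbacks, and the dependency map are all hypothetical placeholders; only the ordering idea comes from the abstract.

```python
from graphlib import TopologicalSorter

# Hypothetical plugin layout (file names are illustrative, not from the paper).
# Each file maps to the set of files it depends on.
DEPENDENCIES = {
    "config.py": {"pipeline.py"},
    "pipeline.py": {"model.py", "datamanager.py"},
    "model.py": {"field.py"},
    "field.py": set(),
    "datamanager.py": set(),
}


def generation_order(deps):
    """Return file names ordered so that every file follows its dependencies."""
    # TopologicalSorter treats each dict value as the node's predecessors
    # and raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(deps).static_order())


def generate_repository(deps, generate_file, validate_file):
    """Generate files in dependency order, validating each before moving on.

    generate_file(name, context) -> str and validate_file(name, source) are
    assumed callbacks (for example, an LLM agent and a contract checker);
    validate_file is expected to raise an exception on failure.
    """
    done = {}
    for name in generation_order(deps):
        # Only already-generated dependencies are passed as context.
        context = {dep: done[dep] for dep in deps.get(name, ())}
        source = generate_file(name, context)
        validate_file(name, source)
        done[name] = source
    return done
```

Because each file is generated only after its dependencies have passed validation, an error surfaces at the node that introduced it and does not propagate into downstream files.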