Geo-ID: Test-Time Geometric Consensus for Cross-View Consistent Intrinsics

📅 2026-03-14

🤖 AI Summary

We address inconsistent predictions of intrinsic properties, such as albedo and roughness, across multiple views of the same scene, a problem that hinders their use in editable neural scenes and 3D reconstruction. We propose Geo-ID, a test-time framework that enforces cross-view consistency through sparse geometric correspondences, without requiring retraining or inverse rendering. Geo-ID is the first method to achieve model-agnostic consistency enhancement at test time through geometric consensus. By combining an uncertainty-aware consensus objective with a test-time optimization strategy, our approach substantially improves multi-view intrinsic consistency on both synthetic and real-world scenes while preserving per-view prediction accuracy. The resulting consistent intrinsics enable coherent downstream applications such as appearance editing and relighting.

📝 Abstract

Intrinsic image decomposition aims to estimate physically based rendering (PBR) parameters such as albedo, roughness, and metallicity from images. While recent methods achieve strong single-view predictions, applying them independently to multiple views of the same scene often yields inconsistent estimates, limiting their use in downstream applications such as editable neural scenes and 3D reconstruction. Video-based models can improve cross-frame consistency but require dense, ordered sequences and substantial compute, limiting their applicability to sparse, unordered image collections. We propose Geo-ID, a novel test-time framework that repurposes pretrained single-view intrinsic predictors to produce cross-view consistent decompositions by coupling independent per-view predictions through sparse geometric correspondences that form uncertainty-aware consensus targets. Geo-ID is model-agnostic, requires no retraining or inverse rendering, and applies directly to off-the-shelf intrinsic predictors. Experiments on synthetic benchmarks and real-world scenes demonstrate substantial improvements in cross-view intrinsic consistency as the number of views increases, while maintaining comparable single-view decomposition performance. We further show that the resulting consistent intrinsics enable coherent appearance editing and relighting in downstream neural scene representations.
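
To make the idea of an uncertainty-aware consensus target more concrete, here is a minimal sketch. It is not the paper's objective, which the abstract does not spell out. It assumes that each sparse correspondence yields a track of per-view scalar predictions (for example, albedo at one surface point) together with a per-view variance, and it uses plain inverse-variance weighting as a stand-in. The names `Track`, `consensus_target`, and `consensus_loss` are hypothetical.

```python
from typing import List, Tuple

# Assumed data layout: a track links one 3D surface point across views.
# Each entry is (view_id, predicted_value, predicted_variance).
Track = List[Tuple[int, float, float]]


def consensus_target(track: Track, eps: float = 1e-8) -> float:
    """Inverse-variance weighted mean of the per-view predictions in a track."""
    weights = [1.0 / (var + eps) for _, _, var in track]
    weighted_sum = sum(w * value for w, (_, value, _) in zip(weights, track))
    return weighted_sum / sum(weights)


def consensus_loss(track: Track, eps: float = 1e-8) -> float:
    """Uncertainty-weighted squared deviation of each view from the consensus."""
    target = consensus_target(track, eps)
    return sum((value - target) ** 2 / (var + eps) for _, value, var in track)


# Hypothetical albedo predictions for one point seen in three views.
# The third view disagrees but also reports a much higher variance.
track = [(0, 0.60, 0.01), (1, 0.64, 0.01), (2, 0.90, 0.25)]
target = consensus_target(track)
loss = consensus_loss(track)
```

In this sketch the uncertain third view pulls the target only slightly, so the consensus stays near the two confident views rather than at the unweighted mean. A test-time procedure could then adjust each per-view prediction to reduce `consensus_loss` over all tracks; how Geo-ID actually performs that optimization is not described in the abstract.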

Problem

Research questions and friction points this paper is trying to address.

intrinsic image decomposition

cross-view consistency

physically based rendering

3D reconstruction

neural scene representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-view consistency

intrinsic image decomposition

test-time optimization