🤖 AI Summary
Current 3D neural feature field methods suffer from two key limitations: they are scene-specific and cannot model uncertainty in their predictions. To address these, the authors propose UniFField, a generalizable, uncertainty-aware unified neural feature field that jointly encodes visual, geometric, and semantic modalities while estimating per-modality epistemic and aleatoric uncertainties. The method uses a voxel-based representation, integrates features from pre-trained foundation models, and incrementally fuses RGB-D data, jointly updating the neural feature representations and their uncertainty estimates. The framework generalizes zero-shot to new scenes, accurately characterizes error distributions in both scene reconstruction and semantic feature prediction, and improves perceptual robustness and decision reliability in an active object search task with a mobile manipulator robot.
📝 Abstract
Comprehensive visual, geometric, and semantic understanding of a 3D scene is crucial for successful execution of robotic tasks, especially in unstructured and complex environments. Additionally, to make robust decisions, the robot must evaluate the reliability of perceived information. While recent advances in 3D neural feature fields have enabled robots to leverage features from pretrained foundation models for tasks such as language-guided manipulation and navigation, existing methods suffer from two critical limitations: (i) they are typically scene-specific, and (ii) they lack the ability to model uncertainty in their predictions. We present UniFField, a unified uncertainty-aware neural feature field that combines visual, semantic, and geometric features in a single generalizable representation while also predicting uncertainty in each modality. Our approach, which can be applied zero-shot to any new environment, incrementally integrates RGB-D images into our voxel-based feature representation as the robot explores the scene, simultaneously updating its uncertainty estimates. We show that our uncertainty estimates accurately describe the model's prediction errors in scene reconstruction and semantic feature prediction. Furthermore, we successfully leverage our feature predictions and their respective uncertainties for an active object search task using a mobile manipulator robot, demonstrating the capability for robust decision-making.
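To make the idea of incremental, uncertainty-aware voxel fusion concrete, here is a minimal illustrative sketch. It is not the paper's implementation: the class name, grid layout, and the use of a per-voxel running variance (Welford's online algorithm) as an uncertainty proxy are all assumptions chosen for simplicity; the actual method learns features and uncertainties jointly with neural networks.

```python
import numpy as np

class UncertainVoxelGrid:
    """Hypothetical sketch: a voxel grid that incrementally fuses feature
    observations (e.g., projected foundation-model features from RGB-D
    frames) and tracks a per-voxel running variance as a simple
    uncertainty proxy, via Welford's online algorithm."""

    def __init__(self, resolution: int, feat_dim: int):
        self.count = np.zeros((resolution,) * 3)                 # observations per voxel
        self.mean = np.zeros((resolution,) * 3 + (feat_dim,))    # fused feature estimate
        self.m2 = np.zeros((resolution,) * 3 + (feat_dim,))      # sum of squared deviations

    def integrate(self, idx, feat):
        """Fuse one observed feature vector into voxel `idx` = (i, j, k)."""
        i, j, k = idx
        self.count[i, j, k] += 1
        delta = feat - self.mean[i, j, k]
        self.mean[i, j, k] += delta / self.count[i, j, k]
        self.m2[i, j, k] += delta * (feat - self.mean[i, j, k])

    def uncertainty(self, idx):
        """Per-voxel feature variance: high for voxels observed rarely
        or with inconsistent features, infinite if effectively unobserved."""
        i, j, k = idx
        n = self.count[i, j, k]
        if n < 2:
            return np.full(self.mean.shape[-1], np.inf)
        return self.m2[i, j, k] / (n - 1)
```

A downstream task such as active object search could then prefer actions that observe voxels whose `uncertainty` is high, mirroring how the paper uses uncertainty for robust decision-making.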