🤖 AI Summary
To address privacy risks in NeRF-based visual localization, specifically the unintended leakage of scene-sensitive details, we propose ppNeSF, the first privacy-preserving neural radiance field method. Our approach abandons RGB supervision and instead trains on self-supervised semantic segmentation labels. It employs semantic-space distillation and geometry-appearance disentanglement to preserve the geometric structure essential for localization while suppressing sensitive appearance information. We also introduce a quantitative privacy evaluation protocol and validate ppNeSF on standard benchmarks: it matches state-of-the-art localization accuracy while robustly resisting diverse privacy attacks, effectively obscuring fine-grained sensitive content such as building logos, text, and human faces. To our knowledge, this is the first work to systematically identify and address the privacy-accuracy trade-off inherent in NeRF-based visual localization.
📝 Abstract
Visual localization (VL) is the task of estimating the camera pose in a known scene. VL methods can be distinguished, among other criteria, by how they represent the scene: explicitly, through a (sparse) point cloud or a collection of images, or implicitly, through the weights of a neural network. Recently, NeRF-based methods have become popular for VL. While NeRFs offer high-quality novel view synthesis, they inadvertently encode fine scene details, raising privacy concerns when deployed in cloud-based localization services, as sensitive information could be recovered. In this paper, we tackle this challenge on two fronts. First, we propose a new protocol to assess the privacy preservation of NeRF-based representations. We show that NeRFs trained with photometric losses store fine-grained details in their geometry representations, leaving them vulnerable to privacy attacks even if the head that predicts colors is removed. Second, we propose ppNeSF (Privacy-Preserving Neural Segmentation Field), a NeRF variant trained with segmentation supervision instead of RGB images. These segmentation labels are learned in a self-supervised manner, ensuring they are coarse enough to obscure identifiable scene details while remaining discriminative in 3D. The segmentation space of ppNeSF can be used for accurate visual localization, yielding state-of-the-art results.
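The central idea, swapping NeRF's photometric (RGB) loss for a segmentation loss on volume-rendered class logits, can be sketched as a minimal toy in PyTorch. This is an illustrative sketch only, not the paper's implementation: the class count, MLP size, sampling scheme, and random labels here are all assumptions standing in for the self-supervised segmentation labels the paper learns.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 8   # hypothetical number of self-supervised segmentation classes
N_SAMPLES = 32    # points sampled per ray (illustrative)

class SegField(nn.Module):
    """Tiny MLP mapping a 3D point to (density, per-class semantic logits).
    Note there is no color head at all: the field never sees RGB supervision."""
    def __init__(self, num_classes=NUM_CLASSES, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + num_classes),  # density + class logits
        )

    def forward(self, x):
        out = self.net(x)
        sigma = F.softplus(out[..., :1])   # non-negative volume density
        logits = out[..., 1:]              # semantic logits per 3D point
        return sigma, logits

def render_semantics(model, rays_o, rays_d, near=0.0, far=1.0):
    """Standard NeRF alpha compositing, but accumulating class logits
    along each ray instead of RGB values."""
    t = torch.linspace(near, far, N_SAMPLES)                           # (S,)
    pts = rays_o[:, None, :] + rays_d[:, None, :] * t[None, :, None]   # (R,S,3)
    sigma, logits = model(pts)                                         # (R,S,1), (R,S,C)
    delta = (far - near) / N_SAMPLES
    alpha = 1.0 - torch.exp(-sigma.squeeze(-1) * delta)                # (R,S)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1),
        dim=1)[:, :-1]                                                 # transmittance
    weights = alpha * trans                                            # (R,S)
    return (weights[..., None] * logits).sum(dim=1)                    # (R,C)

# One toy training step: supervise rendered logits with per-ray labels
# (random here; in the paper these come from self-supervised segmentation).
torch.manual_seed(0)
model = SegField()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
rays_o = torch.zeros(16, 3)
rays_d = F.normalize(torch.randn(16, 3), dim=-1)
labels = torch.randint(0, NUM_CLASSES, (16,))

logits = render_semantics(model, rays_o, rays_d)
loss = F.cross_entropy(logits, labels)   # replaces the usual photometric MSE
loss.backward()
opt.step()
```

Because only coarse class identities are ever supervised, the optimization pressure to memorize fine-grained appearance (text, logos, faces) is removed, while the density branch still learns the geometry needed for pose estimation.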